From sweitzen at cisco.com Sun Oct 1 00:30:37 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 1 Oct 2006 00:30:37 -0700
Subject: [openib-general] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
Message-ID: 

We are just getting started with OFED testing on SLES10, first platform
is x86_64.

IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
MVAPICH with OSU benchmarks just hang. This same hardware works fine
with OFED and RHEL4 U3.

Has anyone else seen this?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From mst at mellanox.co.il Sun Oct 1 00:50:48 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Oct 2006 09:50:48 +0200
Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD
In-Reply-To: <1159472595.21249.79.camel@flin.austin.ibm.com>
References: <1159472595.21249.79.camel@flin.austin.ibm.com>
Message-ID: <20061001075048.GC888@mellanox.co.il>

Quoting r. Tseng-Hui (Frank) Lin :
> Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD
>
> The ppc64 problem is actually in pci_64.c. Here is the patch:
>
> ============ cut here =============
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 4c4449b..490403c 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_
>  	if (hose == 0)
>  		return NULL; /* should never happen */
>
> -	/* If memory, add on the PCI bridge address offset */
>  	if (mmap_state == pci_mmap_mem) {
> -		*offset += hose->pci_mem_offset;
>  		res_bit = IORESOURCE_MEM;
>  	} else {
>  		io_offset = (unsigned long)hose->io_base_virt - pci_io_base;
> ============= end cut =============
>
> The mmap() system call on resource0 does not work on ppc64 without this
> patch.
> PowerMAC G5 got away with this because its hose->pci_mem_offset
> was set to 0.
>
> The fix was made on 8/21. It may be able to make it into 2.6.19. But it
> certainly won't get into SLES10, SLES9-SP3, or RHEL4-U4 which have
> already been released.
>
> To cover both cases with and without the fix, my patch tries to
> mmap /sys/bus/pci/..../resource0 first. If it failed it tries
> mmap /proc/bus/pci/.... If it failed again, we have no choice but fall
> back to use PCI config space.

OK, so for OFED just mmap from /proc/bus/pci/ should be sufficient
work-around - it will make things work when driver is loaded.
Correct?

-- 
MST

From aviram at dev.mellanox.co.il Sun Oct 1 02:29:09 2006
From: aviram at dev.mellanox.co.il (Aviram Gutman)
Date: Sun, 01 Oct 2006 11:29:09 +0200
Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
In-Reply-To: 
References: 
Message-ID: <451F8A65.4040507@dev.mellanox.co.il>

Can you please elaborate on MVAPICH issues, can you send command line?
We ran it here on 32 Opteron nodes each quad core and also rigorous
tests on the many other nodes?

Scott Weitzenkamp (sweitzen) wrote:
> We are just getting started with OFED testing on SLES10, first
> platform is x86_64.
>
> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> MVAPICH with OSU benchmarks just hang. This same hardware works
> fine with OFED and RHEL4 U3.
>
> Has anyone else seen this?
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>

From jackm at dev.mellanox.co.il Sun Oct 1 02:53:16 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 1 Oct 2006 11:53:16 +0200
Subject: [openib-general] Kernel Oops in user-mad, mad
Message-ID: <200610011153.16702.jackm@dev.mellanox.co.il>

We received the following kernel Oops while running regression
(see console picture attached).

This looks like a possible race condition between handling umad send completions
and ib_unregister_mad_agent.

The Oops is at the list_del line of dequeue_send (user_mad.c: 186)
Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler.

Is there a possibility that there is a double deletion from a list somewhere?

Jack
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mad_oops.jpg
Type: image/jpeg
Size: 450236 bytes
Desc: not available
URL: 

From mlakshmanan at silverstorm.com Sun Oct 1 03:09:55 2006
From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu)
Date: Sun, 1 Oct 2006 06:09:55 -0400
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <20060928130052.GB28381@mellanox.co.il>
Message-ID: 

Quoting r. Vu Pham [vuhuong at mellanox.com]:
> Subject: Re: [PATCH] IB/SRP: Enable multichannel
> What is the advantage to have multiple connections/qps on the same
>physical port to the same target? The disadvantages are wasting resources,
>instability, no fail-over on physical port error...

The advantage is if the target in question is an IOC that connects to a
FC SAN for example.
In this case, the host is physically connected to the same IOC, but can
maintain independent logical connections to specific storage devices on
the SAN that are "behind" the IOC.

Quoting r. Michael S. Tsirkin
>> Subject: Re: [PATCH] IB/SRP: Enable multichannel
>>
>> Maybe we should just use the port GUID instead of the node GUID to
>> form the initiator ID? That would solve this pretty cleanly I think.
> Sounds good.
> I think we should also stick the pkey into the identifier extension -
> I think it's nice for each partition to be able to act as a separate
>virtual network, not affecting others.
> What do you think?
> -- 
> MST

Sticking the pkey into the identifier extension may once again restrict
the ability of the host to have multiple logical connections to an SRP
IOC target. The most flexible approach appears to be:

Identifier ID = Port GUID
Identifier Extension = User specified

Ishai's IB-SRP patch of 09/27 appears to accomplish the above.

Madhu

From mst at mellanox.co.il Sun Oct 1 04:14:13 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Oct 2006 13:14:13 +0200
Subject: [openib-general] Kernel Oops in user-mad, mad
In-Reply-To: <200610011153.16702.jackm@dev.mellanox.co.il>
References: <200610011153.16702.jackm@dev.mellanox.co.il>
Message-ID: <20061001111413.GI1796@mellanox.co.il>

Quoting r. Jack Morgenstein :
> Subject: Kernel Oops in user-mad, mad
>
> We received the following kernel Oops while running regression
> (see console picture attached).
>
> This looks like a possible race condition between handling umad send completions
> and ib_unregister_mad_agent.
>
> The Oops is at the list_del line of dequeue_send (user_mad.c: 186)
> Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler.
>
> Is there a possibility that there is a double deletion from a list somewhere?
>
> Jack
>

Was this during module unload?

-- 
MST

From mst at mellanox.co.il Sun Oct 1 04:15:53 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Oct 2006 13:15:53 +0200
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: 
References: 
Message-ID: <20061001111553.GJ1796@mellanox.co.il>

Quoting r. Lakshmanan, Madhu :
> Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
>
> Quoting r. Vu Pham [vuhuong at mellanox.com]:
> > Subject: Re: [PATCH] IB/SRP: Enable multichannel
> > What is the advantage to have multiple connections/qps on the same
> >physical port to the same target? The disadvantages are wasting
> >resources, instability, no fail-over on physical port error...
>
> The advantage is if the target in question is an IOC that connects to a
> FC SAN for example. In this case, the host is physically connected to
> the same IOC, but can maintain independent logical connections to
> specific storage devices on the SAN that are "behind" the IOC.

We could just let the user specify the Id Ext when adding the device.
How does this sound?

-- 
MST

From mlakshmanan at silverstorm.com Sun Oct 1 06:23:15 2006
From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu)
Date: Sun, 1 Oct 2006 09:23:15 -0400
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <20061001111553.GJ1796@mellanox.co.il>
Message-ID: 

> Quoting r. Michael S. Tsirkin :
> Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
> >Quoting r. Lakshmanan, Madhu :
> >Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
> >
> >Quoting r.
Vu Pham [vuhuong at mellanox.com]:
> > >Subject: Re: [PATCH] IB/SRP: Enable multichannel
> > >What is the advantage to have multiple connections/qps on the same
> > >physical port to the same target? The disadvantages are wasting
> > >resources, instability, no fail-over on physical port error...
> >
> >The advantage is if the target in question is an IOC that connects to a
> > FC SAN for example. In this case, the host is physically connected to
> > the same IOC, but can maintain independent logical connections to
> > specific storage devices on the SAN that are "behind" the IOC.
>
> We could just let the user specify the Id Ext when adding the device.
> How does this sound?
> -- 
> MST

I agree. That was exactly what I had in mind. I'll work on the patch
that does that.

Madhu

From or.gerlitz at gmail.com Sun Oct 1 08:20:42 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Sun, 1 Oct 2006 17:20:42 +0200
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
In-Reply-To: <451BF3D6.7080403@ichips.intel.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com> <451ABF0C.90607@ichips.intel.com> <451B6945.1050707@voltaire.com> <451BF3D6.7080403@ichips.intel.com>
Message-ID: <15ddcffd0610010820i26aff1a9n28a47b7d007d2adc@mail.gmail.com>

On 9/28/06, Sean Hefty wrote:
> Or Gerlitz wrote:
> > My understanding is that without this patch the side that sends the DREQ
> > would do few DREQ resends as of the "firsts" DREPs being lost and no
> > DREPs sent once the id at the peer side left the timewait state, correct?
>
> This is correct. Note that the number of DREQ retries was changed to 15 now.

do you mean internally changed in the CM or somehow controlled from
the outside by uDAPL?

> > Can you please share what were the implications with intel MPI running a
> > 64 nodes (128 ranks?) job? was the issue here just making the ***job
> > termination time*** bigger?
>
> The job termination time was taking about a minute waiting for the DREQ to
> timeout. When running a series of tests, this becomes a fairly large issue.

Just something you might want to verify with the intel MPI team, does
their terminate code look like:

for (i=0,N-1)
	call dat_ep_disconnect(ep[i]...)

j=0
while (j < N) {
	dat_evd_wait(conn_evd)
	verify it's a disconnected event on EP[i] for some 0 <= i <= N-1
	j++
}

and not

for (i=0,N-1) {
	dat_ep_disconnect(ep[i]...)
	dat_evd_wait(conn_evd)
}

Or.

From sweitzen at cisco.com Sun Oct 1 21:31:11 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 1 Oct 2006 21:31:11 -0700
Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
Message-ID: 

$ uname -a
Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
$ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
svbu-qa1850-4
svbu-qa1850-3
$ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency

The last command just hangs. Can I try your binary RPMs?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il]
> Sent: Sunday, October 01, 2006 2:29 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
> OFED 1.1 rc6 with SLES10 x86_64
>
> Can you please elaborate on MVAPICH issues, can you send
> command line?
> We ran it here on 32 Opteron nodes each quad core and also rigorous
> tests on the many other nodes?
>
>
>
> Scott Weitzenkamp (sweitzen) wrote:
> > We are just getting started with OFED testing on SLES10, first
> > platform is x86_64.
> >
> > IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> > MVAPICH with OSU benchmarks just hang. This same hardware works
> > fine with OFED and RHEL4 U3.
> >
> > Has anyone else seen this?
> >
> > Scott Weitzenkamp
> > SQA and Release Manager
> > Server Virtualization Business Unit
> > Cisco Systems
> >

From sean.hefty at intel.com Sun Oct 1 21:45:54 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Sun, 1 Oct 2006 21:45:54 -0700
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
In-Reply-To: <15ddcffd0610010820i26aff1a9n28a47b7d007d2adc@mail.gmail.com>
Message-ID: <000601c6e5dd$a56683d0$42d9180a@amr.corp.intel.com>

>> This is correct. Note that the number of DREQ retries was changed to 15 now.
>
>do you mean internally changed in the CM or somehow controlled from
>the outside by uDAPL?

I meant the number of retries set by RDMA CM.

- Sean

From RAISCH at de.ibm.com Mon Oct 2 06:58:43 2006
From: RAISCH at de.ibm.com (Christoph Raisch)
Date: Mon, 2 Oct 2006 15:58:43 +0200
Subject: [openib-general] Question about ehca CQ handling
In-Reply-To: 
Message-ID: 

> While looking over the ehca driver from the perspective of adding a
> "peek CQ" operation, I noticed some code that looked funny.
>
> In hipz_set_cqx_n0() and hipz_set_cqx_n1(), what is the point of the
> calls to hipz_galpa_load_cq()? The return value is discarded. I see
> that hipz_galpa_load_cq() dereferences a volatile pointer internally,
> so I'm guessing this is some sort of ordering constraint. But would
> it be just as good to do "barrier()" there?
>
> - R.

No, barrier won't help, the I/O bus connection is theoretically allowed
to reorder and aggregate writes in a defined pattern.
The recommended way to ensure that the ehca chip actually has seen the
write is doing a read on the same address.

Gruss / Regards
. . . Christoph R

From jlentini at netapp.com Mon Oct 2 07:11:01 2006
From: jlentini at netapp.com (James Lentini)
Date: Mon, 2 Oct 2006 10:11:01 -0400 (EDT)
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159550667.17595.29.camel@sardonyx>
References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159550667.17595.29.camel@sardonyx>
Message-ID: 

On Fri, 29 Sep 2006, Bryan O'Sullivan wrote:

> On Fri, 2006-09-29 at 12:26 -0400, James Lentini wrote:
>
> > Balkanizing the OFA repository into corporate repositories would be a
> > mistake.
>
> Nobody is suggesting this. However, separating the mess that is the
> current SVN trunk into a set of well-understood branches, each of which
> sees some testing by its authors in isolation, can *only* be a good
> thing for ensuring a higher-quality OF process in general.
>
> > It is likely that companies will restrict developers at HCA
> > vendor X from contributing code to HCA vendor Y's repository.
>
> I doubt it.

Unfortunately this does happen. Sean has already said he can only
access git trees at kernel.org.

> As a practical matter, having your driver in the kernel tree means
> it's open season for anyone who wants to take a crack at it. Just
> look at the number of IB/10gbE/iWarp hardware vendors that have
> fingerprints all over each other's code in drivers/infiniband/hw for
> an example.

From rdreier at cisco.com Mon Oct 2 07:43:46 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 02 Oct 2006 07:43:46 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: (James Lentini's message of "Mon, 2 Oct 2006 10:11:01 -0400 (EDT)")
References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159550667.17595.29.camel@sardonyx>
Message-ID: 

    James> Unfortunately this does happen. Sean has already said he
    James> can only access git trees at kernel.org.

I think he just said that he can only access git trees via http://.

 - R.

From sean.hefty at intel.com Mon Oct 2 08:25:22 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 2 Oct 2006 08:25:22 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: 
Message-ID: <000001c6e636$fad6a030$19d1180a@amr.corp.intel.com>

> James> Unfortunately this does happen. Sean has already said he
> James> can only access git trees at kernel.org.
>
>I think he just said that he can only access git trees via http://.

I can access git://git.kernel.org or http://git.somewhere.else.

- Sean

From sean.hefty at intel.com Mon Oct 2 09:35:08 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 2 Oct 2006 09:35:08 -0700
Subject: [openib-general] [PATCH v2] ib_cm: fix module unload race with timewait
In-Reply-To: 
Message-ID: <000201c6e640$b987ad90$19d1180a@amr.corp.intel.com>

Updated patch based on Roland's feedback - converted a couple uses of
spinlock_irqsave to spinlock_irq, and used list manipulation routine for
cleanup.

Signed-off-by: Sean Hefty 
---
Index: cm.c
===================================================================
--- cm.c (revision 9680)
+++ cm.c (working copy)
@@ -75,6 +75,7 @@ static struct ib_cm {
 	struct rb_root remote_sidr_table;
 	struct idr local_id_table;
 	__be32 random_id_operand;
+	struct list_head timewait_list;
 	struct workqueue_struct *wq;
 } cm;
 
@@ -112,6 +113,7 @@ struct cm_work {
 
 struct cm_timewait_info {
 	struct cm_work work; /* Must be first.
*/
+	struct list_head list;
 	struct rb_node remote_qp_node;
 	struct rb_node remote_id_node;
 	__be64 remote_ca_guid;
@@ -648,13 +650,6 @@ static inline int cm_convert_to_ms(int i
 
 static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info)
 {
-	unsigned long flags;
-
-	if (!timewait_info->inserted_remote_id &&
-	    !timewait_info->inserted_remote_qp)
-		return;
-
-	spin_lock_irqsave(&cm.lock, flags);
 	if (timewait_info->inserted_remote_id) {
 		rb_erase(&timewait_info->remote_id_node, &cm.remote_id_table);
 		timewait_info->inserted_remote_id = 0;
@@ -664,7 +659,6 @@ static void cm_cleanup_timewait(struct c
 		rb_erase(&timewait_info->remote_qp_node, &cm.remote_qp_table);
 		timewait_info->inserted_remote_qp = 0;
 	}
-	spin_unlock_irqrestore(&cm.lock, flags);
 }
 
 static struct cm_timewait_info * cm_create_timewait_info(__be32 local_id)
@@ -685,8 +679,12 @@ static struct cm_timewait_info * cm_crea
 static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
 {
 	int wait_time;
+	unsigned long flags;
 
+	spin_lock_irqsave(&cm.lock, flags);
 	cm_cleanup_timewait(cm_id_priv->timewait_info);
+	list_add_tail(&cm_id_priv->timewait_info->list, &cm.timewait_list);
+	spin_unlock_irqrestore(&cm.lock, flags);
 
 	/*
 	 * The cm_id could be destroyed by the user before we exit timewait.
@@ -702,9 +700,13 @@ static void cm_enter_timewait(struct cm_
 
 static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
 {
+	unsigned long flags;
+
 	cm_id_priv->id.state = IB_CM_IDLE;
 	if (cm_id_priv->timewait_info) {
+		spin_lock_irqsave(&cm.lock, flags);
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
+		spin_unlock_irqrestore(&cm.lock, flags);
 		kfree(cm_id_priv->timewait_info);
 		cm_id_priv->timewait_info = NULL;
 	}
@@ -1308,6 +1310,7 @@ static struct cm_id_private * cm_match_r
 	if (timewait_info) {
 		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
 					   timewait_info->work.remote_id);
+		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irqrestore(&cm.lock, flags);
 		if (cur_cm_id_priv) {
 			cm_dup_req_handler(work, cur_cm_id_priv);
@@ -1316,7 +1319,8 @@ static struct cm_id_private * cm_match_r
 			cm_issue_rej(work->port, work->mad_recv_wc,
 				     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
 				     NULL, 0);
-		goto error;
+		listen_cm_id_priv = NULL;
+		goto out;
 	}
 
 	/* Find matching listen request. */
@@ -1324,21 +1328,20 @@ static struct cm_id_private * cm_match_r
 					   req_msg->service_id,
 					   req_msg->private_data);
 	if (!listen_cm_id_priv) {
+		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irqrestore(&cm.lock, flags);
 		cm_issue_rej(work->port, work->mad_recv_wc,
 			     IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ,
 			     NULL, 0);
-		goto error;
+		goto out;
 	}
 	atomic_inc(&listen_cm_id_priv->refcount);
 	atomic_inc(&cm_id_priv->refcount);
 	cm_id_priv->id.state = IB_CM_REQ_RCVD;
 	atomic_inc(&cm_id_priv->work_count);
 	spin_unlock_irqrestore(&cm.lock, flags);
+out:
 	return listen_cm_id_priv;
-
-error: cm_cleanup_timewait(cm_id_priv->timewait_info);
-	return NULL;
 }
 
 static int cm_req_handler(struct cm_work *work)
@@ -2630,28 +2633,29 @@ static int cm_timewait_handler(struct cm
 {
 	struct cm_timewait_info *timewait_info;
 	struct cm_id_private *cm_id_priv;
-	unsigned long flags;
 	int ret;
 
 	timewait_info = (struct cm_timewait_info *)work;
-	cm_cleanup_timewait(timewait_info);
+	spin_lock_irq(&cm.lock);
+
list_del(&timewait_info->list);
+	spin_unlock_irq(&cm.lock);
 
 	cm_id_priv = cm_acquire_id(timewait_info->work.local_id,
 				   timewait_info->work.remote_id);
 	if (!cm_id_priv)
 		return -EINVAL;
 
-	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	spin_lock_irq(&cm_id_priv->lock);
 	if (cm_id_priv->id.state != IB_CM_TIMEWAIT ||
 	    cm_id_priv->remote_qpn != timewait_info->remote_qpn) {
-		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		spin_unlock_irq(&cm_id_priv->lock);
 		goto out;
 	}
 	cm_id_priv->id.state = IB_CM_IDLE;
 	ret = atomic_inc_and_test(&cm_id_priv->work_count);
 	if (!ret)
 		list_add_tail(&work->list, &cm_id_priv->work_list);
-	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	spin_unlock_irq(&cm_id_priv->lock);
 
 	if (ret)
 		cm_process_work(cm_id_priv, work);
@@ -3434,6 +3438,7 @@ static int __init ib_cm_init(void)
 	idr_init(&cm.local_id_table);
 	get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand);
 	idr_pre_get(&cm.local_id_table, GFP_KERNEL);
+	INIT_LIST_HEAD(&cm.timewait_list);
 
 	cm.wq = create_workqueue("ib_cm");
 	if (!cm.wq)
@@ -3451,7 +3456,20 @@ error:
 
 static void __exit ib_cm_cleanup(void)
 {
+	struct cm_timewait_info *timewait_info, *tmp;
+
+	spin_lock_irq(&cm.lock);
+	list_for_each_entry(timewait_info, &cm.timewait_list, list)
+		cancel_delayed_work(&timewait_info->work.work);
+	spin_unlock_irq(&cm.lock);
+
 	destroy_workqueue(cm.wq);
+
+	list_for_each_entry_safe(timewait_info, tmp, &cm.timewait_list, list) {
+		list_del(&timewait_info->list);
+		kfree(timewait_info);
+	}
+
 	ib_unregister_client(&cm_client);
 	idr_destroy(&cm.local_id_table);
 }

From hnguyen at de.ibm.com Mon Oct 2 10:08:52 2006
From: hnguyen at de.ibm.com (Hoang-Nam Nguyen)
Date: Mon, 2 Oct 2006 19:08:52 +0200
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca: fix ehca_probe if module loaded after ib_ipoib
Message-ID: <200610021908.52695.hnguyen@de.ibm.com>

Hello Roland!
Below is a patch of ehca, which fixes a bug (crash) that occurred when
ib_ehca is loaded after ib_ipoib.
This patch initializes struct ehca_shca with struct device*, then creates
internal resources and finally registers the ehca IB device. And that is
the proper sequence to do.
In addition to that this patch contains a very small format improvement
in our tracing function.
Thanks!
Nam Nguyen

Signed-off-by: Hoang-Nam Nguyen 
---

 ehca_main.c  | 36 +++++++++++++++++++-----------------
 ehca_tools.h |  2 +-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 2380994..024d511 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -49,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch ");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0016");
+MODULE_VERSION("SVNEHCA_0017");
 
 int ehca_open_aqp1 = 0;
 int ehca_debug_level = 0;
@@ -239,7 +239,7 @@ init_node_guid1:
 	return ret;
 }
 
-int ehca_register_device(struct ehca_shca *shca)
+int ehca_init_device(struct ehca_shca *shca)
 {
 	int ret;
 
@@ -317,11 +317,6 @@ int ehca_register_device(struct ehca_shc
 	/* shca->ib_device.process_mad = ehca_process_mad; */
 	shca->ib_device.mmap = ehca_mmap;
 
-	ret = ib_register_device(&shca->ib_device);
-	if (ret)
-		ehca_err(&shca->ib_device,
-			 "ib_register_device() failed ret=%x", ret);
-
 	return ret;
 }
 
@@ -561,9 +556,9 @@ static int __devinit ehca_probe(struct i
 		goto probe1;
 	}
 
-	ret = ehca_register_device(shca);
+	ret = ehca_init_device(shca);
 	if (ret) {
-		ehca_gen_err("Cannot register Infiniband device");
+		ehca_gen_err("Cannot init ehca device struct");
 		goto probe1;
 	}
 
@@ -571,7 +566,7 @@ static int __devinit ehca_probe(struct i
 	ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048);
 	if (ret) {
 		ehca_err(&shca->ib_device, "Cannot create EQ.");
-		goto probe2;
+		goto probe1;
 	}
 
 	ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513);
@@ -600,6 +595,13 @@ static int __devinit
ehca_probe(struct i
 		goto probe5;
 	}
 
+	ret = ib_register_device(&shca->ib_device);
+	if (ret) {
+		ehca_err(&shca->ib_device,
+			 "ib_register_device() failed ret=%x", ret);
+		goto probe6;
+	}
+
 	/* create AQP1 for port 1 */
 	if (ehca_open_aqp1 == 1) {
 		shca->sport[0].port_state = IB_PORT_DOWN;
@@ -607,7 +609,7 @@ static int __devinit ehca_probe(struct i
 		if (ret) {
 			ehca_err(&shca->ib_device,
 				 "Cannot create AQP1 for port 1.");
-			goto probe6;
+			goto probe7;
 		}
 	}
 
@@ -618,7 +620,7 @@ static int __devinit ehca_probe(struct i
 		if (ret) {
 			ehca_err(&shca->ib_device,
 				 "Cannot create AQP1 for port 2.");
-			goto probe7;
+			goto probe8;
 		}
 	}
 
@@ -630,12 +632,15 @@ static int __devinit ehca_probe(struct i
 
 	return 0;
 
-probe7:
+probe8:
 	ret = ehca_destroy_aqp1(&shca->sport[0]);
 	if (ret)
 		ehca_err(&shca->ib_device,
 			 "Cannot destroy AQP1 for port 1. ret=%x", ret);
 
+probe7:
+	ib_unregister_device(&shca->ib_device);
+
 probe6:
 	ret = ehca_dereg_internal_maxmr(shca);
 	if (ret)
@@ -660,9 +665,6 @@ probe3:
 		ehca_err(&shca->ib_device,
 			 "Cannot destroy EQ. ret=%x", ret);
 
-probe2:
-	ib_unregister_device(&shca->ib_device);
-
 probe1:
 	ib_dealloc_device(&shca->ib_device);
 
@@ -750,7 +752,7 @@ int __init ehca_module_init(void)
 	int ret;
 
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-	       "(Rel.: SVNEHCA_0016)\n");
+	       "(Rel.: SVNEHCA_0017)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/ehca_tools.h b/drivers/infiniband/hw/ehca/ehca_tools.h
index 9f56bb8..809da3e 100644
--- a/drivers/infiniband/hw/ehca/ehca_tools.h
+++ b/drivers/infiniband/hw/ehca/ehca_tools.h
@@ -117,7 +117,7 @@ #define ehca_dmp(adr, len, format, args.
 		unsigned int l = (unsigned int)(len); \
 		unsigned char *deb = (unsigned char*)(adr); \
 		for (x = 0; x < l; x += 16) { \
-			printk("EHCA_DMP:%s" format \
+			printk("EHCA_DMP:%s " format \
 			       " adr=%p ofs=%04x %016lx %016lx\n", \
 			       __FUNCTION__, ##args, deb, x, \
 			       *((u64 *)&deb[0]), *((u64 *)&deb[8])); \

From rdreier at cisco.com Mon Oct 2 10:18:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 02 Oct 2006 10:18:29 -0700
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca: fix ehca_probe if module loaded after ib_ipoib
In-Reply-To: <200610021908.52695.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's message of "Mon, 2 Oct 2006 19:08:52 +0200")
References: <200610021908.52695.hnguyen@de.ibm.com>
Message-ID: 

Looks OK but your mailer mangled the patch. Please resend in a form
that can be applied...

Also:

 > In addition to that this patch contains a very small format improvement
 > in our tracing function.

please send unrelated changes as separate patches. So this should
come as two patches -- one to fix the device registration, and one to
change your debug formatting.

Thanks,
  Roland

From bhartner at austin.rr.com Mon Oct 2 10:26:36 2006
From: bhartner at austin.rr.com (Bill Hartner)
Date: Mon, 02 Oct 2006 12:26:36 -0500
Subject: [openib-general] RHEL 4 U3 - lost completions
Message-ID: <45214BCC.B0B78035@austin.rr.com>

I am testing an app in development on RHEL 4 U3 using uDAPL.

The app runs OK on gen1 stacks, but cannot run on any OFED based stack
I have tried on RHEL 4 U3.

The symptom is RDMAs not getting completion. A completion notification
is sent, but mthca_poll_cq() finds no completion.

I debugged the problem to this: the memory for the completion queue is
not pinned and at some point the page struct changes *after* the HCA has
been handed the address of the completion queue, so subsequent
completions are written elsewhere in memory and the app hangs waiting
for completion.
I hacked in the following to get the app running, I replaced the
allocation of the completion buffer in libmthca,

	ret = posix_memalign(memptr, alignment, size);

with,

	size = (size + (4096-1)) & ~(4096-1);
	*memptr = mmap(NULL, size, PROT_READ|PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED,0,0);

Is there a restriction on using completion queues on a RHEL 4 Update 3
kernel ? Am I missing a patch ?

Details in http://openib.org/bugzilla/show_bug.cgi?id=147

-Bill

From rdreier at cisco.com Mon Oct 2 10:47:57 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 02 Oct 2006 10:47:57 -0700
Subject: [openib-general] RHEL 4 U3 - lost completions
In-Reply-To: <45214BCC.B0B78035@austin.rr.com> (Bill Hartner's message of "Mon, 02 Oct 2006 12:26:36 -0500")
References: <45214BCC.B0B78035@austin.rr.com>
Message-ID: 

    Bill> I am testing an app in development on RHEL 4 U3 using uDAPL.
    Bill> The app runs OK on gen1 stacks, but cannot run on any OFED
    Bill> based stack I have tried on RHEL 4 U3. The symptom is RDMAs
    Bill> not getting completion. A completion notification is sent,
    Bill> but mthca_poll_cq() finds no completion. I debugged the
    Bill> problem to this: the memory for the completion queue is not
    Bill> pinned and at some point the page struct changes *after* the
    Bill> HCA has been handed the address of the completion queue, so
    Bill> subsequent completions are written elsewhere in memory and
    Bill> the app hangs waiting for completion.

The memory should be pinned by the call to __mthca_reg_mr() in
mthca_create_cq(), since the kernel will do get_user_pages() on the
memory.

By any chance, does your app do fork() or system() or something like that?

 - R.

From bos at pathscale.com Mon Oct 2 11:14:12 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Mon, 02 Oct 2006 11:14:12 -0700
Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19
In-Reply-To: 
References: 
Message-ID: <452156F4.4050004@pathscale.com>

Eric W.
Biederman wrote:

> Have you tested your driver against the -mm tree?

No.

> To the best of my knowledge the irq handling of your hypertransport card
> is a complete and total hack that works only by chance.

And a happy Monday morning to you, too :-)

> In the -mm tree I have added a first pass at proper support for the
> hypertransport interrupt capability. As this code is slated to go into
> 2.6.19 could you please test against that?

I'm on vacation for a few weeks. We'll find someone to do it.

Message-ID: <45215636.2FBBC84C@austin.rr.com>

Roland Dreier wrote:
>
>     Bill> I am testing an app in development on RHEL 4 U3 using uDAPL.
>     Bill> The app runs OK on gen1 stacks, but cannot run on any OFED
>     Bill> based stack I have tried on RHEL 4 U3. The symptom is RDMAs
>     Bill> not getting completion. A completion notification is sent,
>     Bill> but mthca_poll_cq() finds no completion. I debugged the
>     Bill> problem to this: the memory for the completion queue is not
>     Bill> pinned and at some point the page struct changes *after* the
>     Bill> HCA has been handed the address of the completion queue, so
>     Bill> subsequent completions are written elsewhere in memory and
>     Bill> the app hangs waiting for completion.
>
> The memory should be pinned by the call to __mthca_reg_mr() in
> mthca_create_cq(), since the kernel will do get_user_pages() on the
> memory.
>
> By any chance, does your app do fork() or system() or something like that?

At 1st, I thought that was the case, a fork, however, I do not think
get_user_pages(), and the increment of the ref count, will guarantee the
page struct does not change for RHEL 4 U3, I need to verify that though.

I dumped the page struct in ib_umem_get() when the completion queue
memory was 1st registered. Then my DTO event thread, on a 10 second
timeout, would go ahead and create another EVD (not used) so I could
then dump the page struct of the 1st completion queue again in
ib_umem_get(), and sure enough the page struct changed.
If I wrote some code that mapped an address to the original page struct, I would probably see the completions there. -Bill From bos at pathscale.com Mon Oct 2 11:24:19 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 02 Oct 2006 11:24:19 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159550667.17595.29.camel@sardonyx> Message-ID: <45215953.909@pathscale.com> James Lentini wrote: >>> It is likely that companies will restrict developers at HCA >>> vendor X from contributing code to HCA vendor Y's repository. >> I doubt it. > > Unfortunately this does happen. Sean has already said he can only > access git trees at kernel.org. That appears to be a matter of Intel's lamebrained firewall rules getting in his way, not a "thou shalt not poke at competitor's open code" restriction. (Bill Hartner's message of "Mon, 02 Oct 2006 13:11:02 -0500") References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> Message-ID: Bill> At 1st, I thought that was the case, a fork, however, I do Bill> not think get_user_pages(), and the increment of the ref Bill> count, will guarantee the page struct does not change for Bill> RHEL 4 U3, I need to verify that though. Are you doing a fork()? If so then, yes, you will not be able to make your app work on a RHEL4 kernel. After get_user_pages(), if you do a fork() then a copy-on-write will still happen, which will cause the physical page to move as you have discovered. This is fixed on newer kernels with libibverbs 1.1 (not yet released though). I don't think there's any real way to make it work on RHEL4's 2.6.9 kernel. - R. 
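For readers following the fork()/copy-on-write point above, here is a minimal sketch of one way user space can keep a DMA-targeted buffer from being duplicated by fork(). It assumes a kernel that supports madvise() with MADV_DONTFORK (added in 2.6.16, so not the RHEL4 2.6.9 kernel discussed here). The helper name alloc_nofork_buffer() is hypothetical and is not part of libmthca or libibverbs; the thread itself does not say which mechanism the newer library uses.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libmthca/libibverbs API): allocate a
 * page-aligned anonymous buffer and mark it MADV_DONTFORK, so a later
 * fork() does not map it into the child and the parent keeps writing to
 * the same physical pages the device was given.
 *
 * On success returns the buffer and stores the rounded-up length in
 * *len_out (needed later for munmap()). Returns NULL on any failure.
 */
void *alloc_nofork_buffer(size_t size, size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    void *buf;

    if (page <= 0 || size == 0 || len_out == NULL)
        return NULL;

    /* Round up to a whole number of pages, as in the mmap() hack above. */
    len = (size + (size_t)page - 1) & ~((size_t)page - 1);

    /* fd is -1 for anonymous mappings; this is the portable form. */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Requires kernel support for MADV_DONTFORK; fail loudly without it. */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }

    *len_out = len;
    return buf;
}
```

A caller would register the returned buffer with the HCA as usual and release it with munmap(buf, len). Note that the child of a fork() cannot touch such a region at all, which is only acceptable if the child does not use the verbs resources.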
From bhartner at austin.rr.com Mon Oct 2 11:34:44 2006 From: bhartner at austin.rr.com (Bill Hartner) Date: Mon, 02 Oct 2006 13:34:44 -0500 Subject: [openib-general] RHEL 4 U3 - lost completions References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> Message-ID: <45215BC4.C8BB5E22@austin.rr.com> Roland Dreier wrote: > > Bill> At 1st, I thought that was the case, a fork, however, I do > Bill> not think get_user_pages(), and the increment of the ref > Bill> count, will guarantee the page struct does not change for > Bill> RHEL 4 U3, I need to verify that though. > > Are you doing a fork()? If so then, yes, you will not be able to make > your app work on a RHEL4 kernel. After get_user_pages(), if you do a > fork() then a copy-on-write will still happen, which will cause the > physical page to move as you have discovered. There is no fork that I am aware of in the code. The pthread that created the EVD and any other thread in the process that executes the debug code sees the changed page struct. I will try to recreate this in a test app. -Bill From thlin at us.ibm.com Mon Oct 2 12:34:23 2006 From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin) Date: Mon, 02 Oct 2006 14:34:23 -0500 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD In-Reply-To: <20061001075048.GC888@mellanox.co.il> References: <1159472595.21249.79.camel@flin.austin.ibm.com> <20061001075048.GC888@mellanox.co.il> Message-ID: <1159817663.21249.103.camel@flin.austin.ibm.com> On Sun, 2006-10-01 at 09:50 +0200, Michael S. Tsirkin wrote: > Quoting r. Tseng-Hui (Frank) Lin : > > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD > > > > The ppc64 problem is actually in pci_64.c. 
Here is the patch: > > > > ============ cut here ============= > > diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c > > index 4c4449b..490403c 100644 > > --- a/arch/powerpc/kernel/pci_64.c > > +++ b/arch/powerpc/kernel/pci_64.c > > @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_ > > if (hose == 0) > > return NULL; /* should never happen */ > > > > - /* If memory, add on the PCI bridge address offset */ > > if (mmap_state == pci_mmap_mem) { > > - *offset += hose->pci_mem_offset; > > res_bit = IORESOURCE_MEM; > > } else { > > io_offset = (unsigned long)hose->io_base_virt - pci_io_base; > > ============= end cut ============= > > > > The mmap() system call on resource0 does not work on ppc64 without this > > patch. PowerMAC G5 got away with this because its hose->pci_mem_offset > > was set to 0. > > > > The fix is made on 8/21. It may be able to make it into 2.6.19. But it > > certainly won't get into SLES10, SLES9-SP3, or REHL4-U4 which have > > already been released. > > > > To cover both cases with and without the fix, my patch try to > > mmap /sys/bus/pci/..../resource0 first. It it failed it tries > > mmap /proc/bus/pci/.... If it failed again, we have no choice but fall > > back to use PCI config space. > > OK, so for OFED just mmap from /proc/bus/pci/ should be sufficient > work-around - it will make things work when driver is loaded. > Correct? > Michael: No. Without the above patch for pci_64.c, mmap() is broken on ppc64 whether you mmap() from /sys/bus/pci//resource0 or from /proc/bus/pci/. The only way that works is "mstflint -d /proc/bus/pci/", which uses pread() and pwrite() instead of mmap(). With the patch, mmap() from /sys/bus/pci//resource0 and from /proc/bus/pci/ both work when the mthca driver is loaded. No workaround is needed. Note that "-d " uses mmap from /proc/bus/pci/. "-d /proc/bus/pci/" uses pread() and pwrite().
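The access order described in this exchange (try mmap() first, fall back to pread() when mapping is unavailable) can be sketched roughly as follows. This is an illustration only and is not taken from mstflint: the function name read_u32_at() is hypothetical, it reads from an ordinary file path, and real device access would use an aligned volatile 32-bit load on the mapping rather than memcpy().

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not mstflint code): read one 32-bit word at
 * `offset` from `path`. Tries mmap() first; if the mapping cannot be
 * created, falls back to pread(). Returns 0 on success, -1 on failure.
 *
 * For regular files the caller must ensure offset + 4 lies within the
 * file, since touching a mapped page past EOF raises SIGBUS.
 */
int read_u32_at(const char *path, off_t offset, uint32_t *out)
{
    long page = sysconf(_SC_PAGESIZE);
    int rc = -1;
    int fd;

    if (path == NULL || out == NULL || offset < 0)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (page > 0) {
        /* mmap() offsets must be page aligned, so map from the page base. */
        off_t base = offset - (offset % page);
        size_t delta = (size_t)(offset - base);
        size_t maplen = delta + sizeof(*out);
        void *map = mmap(NULL, maplen, PROT_READ, MAP_SHARED, fd, base);

        if (map != MAP_FAILED) {
            memcpy(out, (const char *)map + delta, sizeof(*out));
            munmap(map, maplen);
            rc = 0;
        }
    }

    if (rc != 0) {
        /* Fallback path: plain positional read, no mapping required. */
        ssize_t n = pread(fd, out, sizeof(*out), offset);
        if (n == (ssize_t)sizeof(*out))
            rc = 0;
    }

    close(fd);
    return rc;
}
```

Keeping the fallback inside one function means callers do not need to know which kernel they are on; the cost is that a failed mmap() is silent, so a real tool would probably log which path was taken.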
From rkuchimanchi at silverstorm.com Mon Oct 2 12:58:17 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:28:17 +0530 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) Message-ID: <4521BCB1.524.4E3B826B@rkuchimanchi.silverstorm.com> Hi Roland, This patch series adds support for the SilverStorm Virtual Ethernet I/O Controllers (VEx) by adding a new kernel level driver. This kernel driver: 1. Communicates with the VEx on the SilverStorm fabric switches/directors using SilverStorm's native protocol 2. Presents a standard Ethernet NIC interface to the system 3. Uses IB reliable connection semantics 4. Is tuned for high performance and throughput The SilverStorm VEx and the associated communication protocol is in wide use amongst users of SilverStorm IB fabric solutions. This patch series is intended for your infiniband.git for-2.6.19 branch. It also has been tested against the for-2.6.20 branch. Signed-off-by: Ramachandra K Regards, Ram From rkuchimanchi at silverstorm.com Mon Oct 2 13:03:13 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:33:13 +0530 Subject: [openib-general] [PATCH 1/10] Driver Main files - netdev functions and corresponding state maintenance Message-ID: <4521BDD9.27185.4E400870@rkuchimanchi.silverstorm.com> Adds the driver main files. These files implement netdev registration, netdev functions and state maintenance of the virtual NIC corresponding to the various events associated with the Virtual Ethernet IOC (VEx) connection. 
Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_main.c | 1040 +++++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_main.h | 152 +++++ 2 files changed, 1192 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_main.c b/drivers/infiniband/ulp/vnic/vnic_main.c new file mode 100644 index 0000000..b87e00b --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_main.c @@ -0,0 +1,1040 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_netpath.h" +#include "vnic_viport.h" +#include "vnic_ib.h" + +#define MODULEVERSION "0.1" +#define MODULEDETAILS "Virtual NIC driver version " MODULEVERSION + +MODULE_AUTHOR("SilverStorm Technologies Inc."); +MODULE_DESCRIPTION(MODULEDETAILS); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_SUPPORTED_DEVICE("SilverStorm Ethernet Virtual I/O Controller"); + +u32 vnic_debug = 0x0; + +module_param(vnic_debug, uint, 0444); + +LIST_HEAD(vnic_list); + +const char driver[] = "vnic"; + +void vnic_connected(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_connected()\n"); + vnic_npevent_queue_evt(netpath, VNICNP_CONNECTED); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (vnic->statistics.conn_time == 0) { + vnic->statistics.conn_time = + get_cycles() - vnic->statistics.start_time; + } + if (vnic->statistics.disconn_ref != 0) { + vnic->statistics.disconn_time += + get_cycles() - vnic->statistics.disconn_ref; + vnic->statistics.disconn_num++; + vnic->statistics.disconn_ref = 0; + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ +} + +void vnic_disconnected(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_disconnected()\n"); + vnic_npevent_queue_evt(netpath, VNICNP_DISCONNECTED); +} + +void vnic_link_up(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_link_up()\n"); + vnic_npevent_queue_evt(netpath, VNICNP_LINKUP); +} + +void vnic_link_down(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_link_down()\n"); + vnic_npevent_queue_evt(netpath, VNICNP_LINKDOWN); +} + +void vnic_stop_xmit(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_stop_xmit()\n"); + if (netpath == vnic->current_path) { + if (vnic->xmit_started) { + netif_stop_queue(&vnic->netdevice); + vnic->xmit_started = 0; + } +#ifdef 
CONFIG_INFINIBAND_VNIC_STATS + if (vnic->statistics.xmit_ref == 0) { + vnic->statistics.xmit_ref = get_cycles(); + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + } + return; +} + +void vnic_restart_xmit(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_restart_xmit()\n"); + if (netpath == vnic->current_path) { + if (!vnic->xmit_started) { + netif_wake_queue(&vnic->netdevice); + vnic->xmit_started = 1; + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (vnic->statistics.xmit_ref != 0) { + vnic->statistics.xmit_off_time += + get_cycles() - vnic->statistics.xmit_ref; + vnic->statistics.xmit_off_num++; + vnic->statistics.xmit_ref = 0; + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + } + return; +} + +void vnic_recv_packet(struct vnic *vnic, struct netpath *netpath, + struct sk_buff *skb) +{ +#ifdef CONFIG_INFINIBAND_VNIC_STATS + extern cycles_t recv_ref; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + VNIC_FUNCTION("vnic_recv_packet()\n"); + if ((netpath != vnic->current_path) || !vnic->open) { + VNIC_INFO("tossing packet\n"); + dev_kfree_skb(skb); + return; + } + + vnic->netdevice.last_rx = jiffies; + + skb->dev = &vnic->netdevice; + skb->protocol = eth_type_trans(skb, skb->dev); + if (!vnic->config->use_rx_csum) { + skb->ip_summed = CHECKSUM_NONE; + } + + netif_rx(skb); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + vnic->statistics.recv_time += get_cycles() - recv_ref; + vnic->statistics.recv_num++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + return; +} + +static struct net_device_stats *vnic_get_stats(struct net_device *device) +{ + struct vnic *vnic; + int ret = 0; + struct netpath *np; + + VNIC_FUNCTION("vnic_get_stats()\n"); + vnic = (struct vnic *)device->priv; + + np = vnic->current_path; + if (!np || !netpath_get_stats(np, &vnic->stats)) { + ret = -ENODEV; + } + + return &vnic->stats; +} + +static int vnic_open(struct net_device *device) +{ + struct vnic *vnic; + int ret = 0; + struct netpath *np; + + VNIC_FUNCTION("vnic_open()\n"); + vnic = (struct 
vnic *)device->priv; + np = vnic->current_path; + + if (vnic->state != VNIC_REGISTERED) { + ret = -ENODEV; + } + + vnic->open++; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_SETLINK); + vnic->xmit_started = 1; + netif_start_queue(&vnic->netdevice); + + return ret; +} + +static int vnic_stop(struct net_device *device) +{ + struct vnic *vnic; + int ret = 0; + struct netpath *np; + + VNIC_FUNCTION("vnic_stop()\n"); + vnic = (struct vnic *)device->priv; + np = vnic->current_path; + netif_stop_queue(device); + vnic->xmit_started = 0; + vnic->open--; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_SETLINK); + + return ret; +} + +static int vnic_hard_start_xmit(struct sk_buff *skb, struct net_device *device) +{ + struct vnic *vnic; + struct netpath *np; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + cycles_t xmit_time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + VNIC_FUNCTION("vnic_hard_start_xmit()\n"); + vnic = (struct vnic *)device->priv; + np = vnic->current_path; + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + xmit_time = get_cycles(); +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + if (np && netpath_xmit_packet(np, skb)) { + device->trans_start = jiffies; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + vnic->statistics.xmit_time += get_cycles() - xmit_time; + vnic->statistics.xmit_num++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + return 0; + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + vnic->statistics.xmit_fail++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + dev_kfree_skb(skb); + return 0; /* TBD: what should I return? 
*/ +} + +static void vnic_tx_timeout(struct net_device *device) +{ + struct vnic *vnic; + + VNIC_FUNCTION("vnic_tx_timeout()\n"); + vnic = (struct vnic *)device->priv; + device->trans_start = jiffies; + + /* netpath_tx_timeout(vnic->current_path); */ + VNIC_ERROR("vnic_tx_timeout\n"); + + return; +} + +static void vnic_set_multicast_list(struct net_device *device) +{ + struct vnic *vnic; + unsigned long flags; + + VNIC_FUNCTION("vnic_set_multicast_list()\n"); + vnic = (struct vnic *)device->priv; + + spin_lock_irqsave(&vnic->lock, flags); + /* the vnic_link_evt thread also needs to be able to access + * mc_list. it is only safe to access the mc_list + * in the netdevice from this call, so make a local + * copy of it in the vnic. the mc_list is a linked + * list, but my copy is an array where each element's + * next pointer points to the next element. when I + * reallocate the list, I always size it with 10 + * extra elements so I don't have to resize it as + * often. I only downsize the list when it goes empty. 
+ */ + if (device->mc_count == 0) { + if (vnic->mc_list_len) { + vnic->mc_list_len = vnic->mc_count = 0; + kfree(vnic->mc_list); + } + } else { + struct dev_mc_list *mc_list = device->mc_list; + int i; + + if (device->mc_count > vnic->mc_list_len) { + if (vnic->mc_list_len) + kfree(vnic->mc_list); + vnic->mc_list_len = device->mc_count + 10; + vnic->mc_list = (struct dev_mc_list *) + kmalloc(sizeof(struct dev_mc_list) * + vnic->mc_list_len, GFP_ATOMIC); + if (!vnic->mc_list) { + vnic->mc_list_len = vnic->mc_count = 0; + spin_unlock_irqrestore(&vnic->lock, flags); + VNIC_ERROR("failed allocating mc_list\n"); + return; + } + } + vnic->mc_count = device->mc_count; + for (i = 0; i < device->mc_count; i++) { + vnic->mc_list[i] = *mc_list; + vnic->mc_list[i].next = &vnic->mc_list[i + 1]; + mc_list = mc_list->next; + } + } + spin_unlock_irqrestore(&vnic->lock, flags); + netpath_set_multicast(&vnic->primary_path, + vnic->mc_list, vnic->mc_count); + + netpath_set_multicast(&vnic->secondary_path, + vnic->mc_list, vnic->mc_count); + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_SETLINK); + return; +} + +static int vnic_set_mac_address(struct net_device *device, void *addr) +{ + struct vnic *vnic; + struct sockaddr *sockaddr = addr; + + VNIC_FUNCTION("vnic_set_mac_address()\n"); + vnic = (struct vnic *)device->priv; + + if (netif_running(device)) + return -EBUSY; + memcpy(device->dev_addr, sockaddr->sa_data, MAC_ADDR_LEN); + netpath_set_unicast(&vnic->primary_path, sockaddr->sa_data); + netpath_set_unicast(&vnic->secondary_path, sockaddr->sa_data); + vnic->mac_set = 1; + /* I'm assuming that this should work even if nothing is connected + * at the moment. note that this might return before the address has + * actually been changed. 
+ */ + return 0; +} + +static int vnic_change_mtu(struct net_device *device, int mtu) +{ + struct vnic *vnic; + int ret = 0; + + VNIC_FUNCTION("vnic_change_mtu()\n"); + vnic = (struct vnic *)device->priv; + + if ((mtu < netpath_max_mtu(&vnic->primary_path)) + && (mtu < netpath_max_mtu(&vnic->secondary_path))) { + device->mtu = mtu; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_SETLINK); + } + + return ret; +} + +static int vnic_do_ioctl(struct net_device *device, struct ifreq *ifr, int cmd) +{ + struct vnic *vnic; + int ret = 0; + + VNIC_FUNCTION("vnic_do_ioctl()\n"); + vnic = (struct vnic *)device->priv; + + /* TBD */ + + return ret; +} + +static int vnic_set_config(struct net_device *device, struct ifmap *map) +{ + struct vnic *vnic; + int ret = 0; + + VNIC_FUNCTION("vnic_set_config()\n"); + vnic = (struct vnic *)device->priv; + + /* TBD */ + + return ret; +} + +DECLARE_WAIT_QUEUE_HEAD(vnic_npevent_queue); +LIST_HEAD(vnic_npevent_list); +DECLARE_COMPLETION(vnic_npevent_thread_exit); +spinlock_t vnic_npevent_list_lock = SPIN_LOCK_UNLOCKED; +int vnic_npevent_thread = -1; +int vnic_npevent_thread_end = 0; + +void vnic_npevent_init(struct vnic *vnic) +{ + int i; + + for (i = 0; i < VNICNP_NUM_EVENTS; i++) { + INIT_LIST_HEAD(&(vnic->npevents[i].list_ptrs)); + vnic->npevents[i].vnic = vnic; + } +} + +static BOOLEAN vnic_npevent_register(struct vnic *vnic, struct netpath *netpath) +{ + if (!vnic->mac_set) { + /* if netpath == secondary_path, then the primary path isn't + * connected. MAC address will be set when the primary + * connects. 
+ */ + netpath_get_hw_addr(netpath, vnic->netdevice.dev_addr); + netpath_set_unicast(&vnic->secondary_path, + vnic->netdevice.dev_addr); + vnic->mac_set = 1; + } + if (register_netdev(&vnic->netdevice) != 0) { + VNIC_ERROR("failed registering netdev\n"); + return FALSE; + } + vnic->state = VNIC_REGISTERED; + vnic->carrier = 2; /* special value to force netif_carrier_(on|off) */ + return TRUE; +} + +static const char *const vnic_npevent_str[] = { + "PRIMARY CONNECTED", + "PRIMARY DISCONNECTED", + "PRIMARY CARRIER", + "PRIMARY NO CARRIER", + "PRIMARY TIMER EXPIRED", + "SETLINK", + "SECONDARY CONNECTED", + "SECONDARY DISCONNECTED", + "SECONDARY CARRIER", + "SECONDARY NO CARRIER", + "SECONDARY TIMER EXPIRED", + "FREE VNIC", +}; + +static void update_path_and_reconnect(struct netpath *netpath, + struct vnic *vnic) +{ + struct viport_config *config = netpath->viport->config; + BOOLEAN delay = TRUE; + + if (!vnic_ib_get_path(netpath, vnic)) { + return; + } + + /* + * tell viport_connect to wait 10 seconds before connecting if + * we are retrying the same path index within 10 seconds. + * This prevents flooding connect requests to a path (or set + * of paths) that aren't successfully connecting for some reason. 
+ */ + if (jiffies > netpath->connect_time + vnic->config->no_path_timeout) { + netpath->path_idx = config->path_idx; + netpath->connect_time = jiffies; + delay = FALSE; + } else if (config->path_idx != netpath->path_idx) { + delay = FALSE; + } + + viport_connect(netpath->viport, delay); + + return; +} + +static int vnic_npevent_statemachine(void *context) +{ + struct vnic_npevent *vnic_link_evt; + int operation; + int is_secondary; + struct vnic *vnic; + struct netpath *netpath; + int last_carrier; + struct netpath *last_path; + int i; + BOOLEAN other_path_ok; + + daemonize("vnic_link_evt"); + + while (!vnic_npevent_thread_end || !list_empty(&vnic_npevent_list)) { + unsigned long flags; + + wait_event_interruptible(vnic_npevent_queue, + !list_empty(&vnic_npevent_list) + || vnic_npevent_thread_end); + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) { + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + VNIC_INFO("netpath statemachine wake on empty list\n"); + continue; + } + vnic_link_evt = + list_entry(vnic_npevent_list.next, struct vnic_npevent, + list_ptrs); + list_del_init(&vnic_link_evt->list_ptrs); + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + + vnic = vnic_link_evt->vnic; + operation = vnic_link_evt - vnic_link_evt->vnic->npevents; + + VNIC_INFO("%s: processing %s, netpath=%s, carrier=%d\n", + vnic->config->name, + vnic_npevent_str[operation], + netpath_to_string(vnic, vnic->current_path), + vnic->carrier); + + is_secondary = (operation >= VNICNP_SECONDARYOFFSET); + if (is_secondary) { + netpath = &vnic->secondary_path; + } else { + netpath = &vnic->primary_path; + } + + if (vnic->current_path == &vnic->secondary_path) + other_path_ok = vnic->primary_path.carrier; + else if (vnic->current_path == &vnic->primary_path) + other_path_ok = vnic->secondary_path.carrier; + else + other_path_ok = FALSE; + + switch (operation) { + case VNIC_PRINP_CONNECTED: + if (vnic->state == VNIC_UNINITIALIZED) { + if 
(!vnic_npevent_register(vnic, netpath)) + break; + } + /* FALLTHROUGH : we may need to set MAC address, etc. */ + + case VNIC_SECNP_CONNECTED: + if (vnic->mac_set) { + netpath_set_unicast(netpath, + vnic->netdevice.dev_addr); + } + spin_lock_irqsave(&vnic->lock, flags); + if (vnic->mc_list) { + netpath_set_multicast(netpath, + vnic->mc_list, + vnic->mc_count); + } + spin_unlock_irqrestore(&vnic->lock, flags); + if (vnic->state == VNIC_REGISTERED) { + netpath_set_link(netpath, + vnic->netdevice. + flags & ~IFF_UP, + vnic->netdevice.mtu); + } + break; + + case VNIC_PRINP_TIMEREXPIRED: + netpath->timer_state = NETPATH_TS_EXPIRED; + if (!netpath->carrier) { + update_path_and_reconnect(netpath, vnic); + } + break; + + case VNIC_SECNP_TIMEREXPIRED: + netpath->timer_state = NETPATH_TS_EXPIRED; + if (netpath->carrier) { + if (vnic->state == VNIC_UNINITIALIZED) { + vnic_npevent_register(vnic, netpath); + } + } else { + update_path_and_reconnect(netpath, vnic); + } + break; + + case VNIC_PRINP_LINKUP: + netpath->carrier = 1; + break; + + case VNIC_SECNP_LINKUP: + netpath->carrier = 1; + if (!vnic->carrier) { + switch (netpath->timer_state) { + case NETPATH_TS_IDLE: + netpath->timer_state = + NETPATH_TS_ACTIVE; + if (vnic->state == VNIC_UNINITIALIZED) + netpath_timer(netpath, + vnic->config-> + primary_connect_timeout); + else + netpath_timer(netpath, + vnic->config-> + primary_reconnect_timeout); + break; + case NETPATH_TS_ACTIVE: + /* do nothing */ + break; + case NETPATH_TS_EXPIRED: + if (vnic->state == VNIC_UNINITIALIZED) { + vnic_npevent_register(vnic, + netpath); + } + break; + } + } + break; + + case VNIC_PRINP_LINKDOWN: + netpath->carrier = 0; + break; + case VNIC_SECNP_LINKDOWN: + if (vnic->state == VNIC_UNINITIALIZED) + netpath_timer_stop(netpath); + netpath->carrier = 0; + break; + case VNIC_PRINP_DISCONNECTED: + case VNIC_SECNP_DISCONNECTED: + netpath_timer_stop(netpath); + netpath->carrier = 0; + update_path_and_reconnect(netpath, vnic); + break; + case 
VNIC_NP_FREEVNIC: + netpath_timer_stop(&vnic->primary_path); + netpath_timer_stop(&vnic->secondary_path); + vnic->current_path = NULL; + netpath_free(&vnic->primary_path); + netpath_free(&vnic->secondary_path); + if (vnic->state == VNIC_REGISTERED) { + unregister_netdev(&vnic->netdevice); + } + for (i = 0; i < VNICNP_NUM_EVENTS; i++) { + list_del_init(&vnic->npevents[i].list_ptrs); + } + config_free_vnic(vnic->config); + if (vnic->mc_list_len) { + vnic->mc_list_len = vnic->mc_count = 0; + kfree(vnic->mc_list); + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + class_device_unregister(&vnic->stat_info.class_dev); + wait_for_completion(&vnic->stat_info.released); +#endif /*CONFIG_INFINIBAND_VNIC_STATS*/ + class_device_unregister(&vnic->class_dev_info. + class_dev); + wait_for_completion(&vnic->class_dev_info.released); + + kfree(vnic); + vnic = NULL; + break; + case VNIC_NP_SETLINK: + if (vnic->current_path) { + netpath_set_link(vnic->current_path, + vnic->netdevice.flags, + vnic->netdevice.mtu); + } + break; + } + + if (!vnic) + continue; + + last_carrier = vnic->carrier; + last_path = vnic->current_path; + + if (!(vnic->current_path) || !vnic->current_path->carrier) { + vnic->carrier = 0; + vnic->current_path = NULL; + vnic->netdevice.features &= ~NETIF_F_IP_CSUM; + } + + if (!vnic->carrier) { + if (vnic->primary_path.carrier) { + vnic->carrier = 1; + vnic->current_path = &vnic->primary_path; + if (last_path + && last_path != vnic->current_path) + printk(KERN_INFO PFX + "%s: failing over to" + " primary path\n", + vnic->config->name); + else if (!last_path) + printk(KERN_INFO PFX + "%s: using primary path\n", + vnic->config->name); + + if (vnic->config->use_tx_csum + && netpath_can_tx_csum(vnic-> + current_path)) { + vnic->netdevice.features |= + NETIF_F_IP_CSUM; + } + } else if ((vnic->secondary_path.carrier) && + (vnic->secondary_path.timer_state != + NETPATH_TS_ACTIVE)) { + vnic->carrier = 1; + vnic->current_path = &vnic->secondary_path; + if (last_path + && last_path 
!= vnic->current_path) + printk(KERN_INFO PFX + "%s: failing over to" + " secondary path\n", + vnic->config->name); + + else if (!last_path) + printk(KERN_INFO PFX + "%s: using secondary path\n", + vnic->config->name); + + if (vnic->config->use_tx_csum + && netpath_can_tx_csum(vnic-> + current_path)) { + vnic->netdevice.features |= + NETIF_F_IP_CSUM; + } + } + } else if ((vnic->current_path != &vnic->primary_path) && + (vnic->config->prefer_primary) && + (vnic->primary_path.carrier)) { + switch (vnic->primary_path.timer_state) { + case NETPATH_TS_ACTIVE: + /* nothing to do. just wait */ + break; + case NETPATH_TS_IDLE: + netpath_timer(&vnic->primary_path, + vnic->config-> + primary_switch_timeout); + break; + case NETPATH_TS_EXPIRED: + printk(KERN_INFO PFX + "%s: switching to primary path\n", + vnic->config->name); + + vnic->current_path = &vnic->primary_path; + if (vnic->config->use_tx_csum + && netpath_can_tx_csum(vnic-> + current_path)) { + vnic->netdevice.features |= + NETIF_F_IP_CSUM; + } + break; + } + } + if (last_path) { + if (!vnic->current_path) { + if (last_path == &vnic->primary_path) + printk(KERN_INFO PFX + "%s: primary path lost, " + "no failover path available\n", + vnic->config->name); + + else + printk(KERN_INFO PFX + "%s: secondary path lost, " + "no failover path available\n", + vnic->config->name); + } else if (last_path == vnic->current_path) { + if (vnic->current_path == &vnic->secondary_path) { + if (other_path_ok != + vnic->primary_path.carrier) { + if (other_path_ok) + printk(KERN_INFO PFX + "%s: primary " + "path no longer" + " available for" + " failover\n", + vnic->config-> + name); + else + printk(KERN_INFO PFX + "%s: primary " + "path now" + " available for" + " failover\n", + vnic->config-> + name); + } + } else { + if (other_path_ok != + vnic->secondary_path.carrier) { + if (other_path_ok) + printk(KERN_INFO PFX + "%s: secondary " + "path no longer" + " available for" + " failover\n", + vnic->config-> + name); + else + 
printk(KERN_INFO PFX + "%s: secondary " + "path now" + " available for" + " failover\n", + vnic->config-> + name); + } + } + } + } + + VNIC_INFO("new netpath=%s, carrier=%d\n", + netpath_to_string(vnic, vnic->current_path), + vnic->carrier); + + if (vnic->current_path != last_path) { + if (last_path == NULL) { + if (vnic->current_path == &vnic->primary_path) { + last_path = &vnic->secondary_path; + } else { + last_path = &vnic->primary_path; + } + } + if (vnic->current_path) { + netpath_set_link(vnic->current_path, + vnic->netdevice.flags, + vnic->netdevice.mtu); + } + netpath_set_link(last_path, + vnic->netdevice.flags & ~IFF_UP, + vnic->netdevice.mtu); + vnic_restart_xmit(vnic, vnic->current_path); + } + if (vnic->carrier != last_carrier) { + if (vnic->carrier) { + VNIC_INFO("netif_carrier_on\n"); + netif_carrier_on(&vnic->netdevice); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (vnic->statistics.carrier_ref != 0) { + vnic->statistics.carrier_off_time += + get_cycles() - + vnic->statistics.carrier_ref; + vnic->statistics.carrier_off_num++; + vnic->statistics.carrier_ref = 0; + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + } else { + VNIC_INFO("netif_carrier_off\n"); + netif_carrier_off(&vnic->netdevice); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (!vnic->statistics.disconn_ref) { + vnic->statistics.disconn_ref = + get_cycles(); + } + if (vnic->statistics.carrier_ref == 0) { + vnic->statistics.carrier_ref = + get_cycles(); + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + } + + } + } + complete_and_exit(&vnic_npevent_thread_exit, 0); + return 0; +} + +void vnic_npevent_queue_evt(struct netpath *netpath, int evt_num) +{ + struct vnic *vnic = netpath->parent; + struct list_head *l = + &vnic->npevents[evt_num + netpath->second_bias].list_ptrs; + unsigned long flags; + + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + list_del_init(l); + list_add_tail(l, &vnic_npevent_list); + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + wake_up(&vnic_npevent_queue); +} 
+ +void vnic_npevent_dequeue_evt(struct netpath *netpath, int evt_num) +{ + struct vnic *vnic = netpath->parent; + struct list_head *l = + &vnic->npevents[evt_num + netpath->second_bias].list_ptrs; + unsigned long flags; + + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (!list_empty(l)) { + list_del_init(l); + } + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); +} + +BOOLEAN vnic_npevent_start() +{ + VNIC_FUNCTION("vnic_npevent_start()\n"); + + if ((vnic_npevent_thread = + kernel_thread(vnic_npevent_statemachine, NULL, 0)) < 0) { + return FALSE; + } + return TRUE; +} + +void vnic_npevent_cleanup() +{ + if (vnic_npevent_thread >= 0) { + vnic_npevent_thread_end = 1; + wake_up(&vnic_npevent_queue); + wait_for_completion(&vnic_npevent_thread_exit); + } + return; +} + +struct vnic *vnic_allocate(struct vnic_config *config) +{ + struct vnic *vnic = NULL; + struct net_device *device; + + VNIC_FUNCTION("vnic_allocate()\n"); + vnic = (struct vnic *)kmalloc(sizeof(struct vnic), GFP_KERNEL); + if (!vnic) { + VNIC_ERROR("failed allocating vnic structure\n"); + goto failure; + } + memset(vnic, 0, sizeof(struct vnic)); + vnic->lock = SPIN_LOCK_UNLOCKED; + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + vnic->statistics.start_time = get_cycles(); +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + vnic->state = VNIC_UNINITIALIZED; + vnic->config = config; + device = &vnic->netdevice; + + strcpy(device->name, config->name); + + ether_setup(device); + + /* FUTURE: + * ether_setup sets the following values. these + * may need to be overridden in the future. 
+ */ + /* device->hard_header_len set to 14 for ethernet */ + /* device->mtu set to 1500 for ethernet */ + /* device->tx_queue_len defaults to 100 for ethernet */ + /* device->type defaults to ARPHRD_ETHER for ethernet */ + /* device->addr_len set to 6 octets for ethernet */ + /* device->broadcast set to 0xffffffffffff for ethernet */ + /* device->family defaults to AF_INET for ethernet */ + /* device->pa_alen length of family address len (4) */ + /* device->pa_addr set by ifconfig, do not modify */ + /* device->pa_brdaddr set by ifconfig, do not modify */ + /* device->pa_mask set by ifconfig, do not modify */ + /* device->pa_dstaddr dest for p-to-p, set by ifconfig, do not modify */ + /* device->flags use default flags for now */ + + device->priv = (void *)vnic; + device->get_stats = vnic_get_stats; + device->open = vnic_open; + device->stop = vnic_stop; + device->hard_start_xmit = vnic_hard_start_xmit; + device->tx_timeout = vnic_tx_timeout; + device->set_multicast_list = vnic_set_multicast_list; + device->set_mac_address = vnic_set_mac_address; + device->change_mtu = vnic_change_mtu; + device->do_ioctl = vnic_do_ioctl; + device->set_config = vnic_set_config; + device->watchdog_timeo = HZ; /* 1 second */ + /* TBD: do I want the NETIF_F_DYNALLOC feature? 
*/ + device->features = 0; + + netpath_init(&vnic->primary_path, vnic, 0); + netpath_init(&vnic->secondary_path, vnic, VNICNP_SECONDARYOFFSET); + + vnic->current_path = NULL; + + vnic_npevent_init(vnic); + + list_add_tail(&vnic->list_ptrs, &vnic_list); + + return vnic; +failure: + config_free_vnic(vnic->config); + return NULL; +} + +void vnic_free(struct vnic *vnic) +{ + VNIC_FUNCTION("vnic_free()\n"); + list_del(&vnic->list_ptrs); + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_FREEVNIC); + return; +} + +static void __exit vnic_cleanup(void) +{ + VNIC_FUNCTION("vnic_cleanup()\n"); + + VNIC_INIT("unloading %s\n", MODULEDETAILS); + + while (!list_empty(&vnic_list)) { + struct vnic *vnic = + list_entry(vnic_list.next, struct vnic, list_ptrs); + vnic_free(vnic); + } + + vnic_npevent_cleanup(); + viport_cleanup(); + vnic_ib_cleanup(); + config_cleanup(); + + return; +} + +static int __init vnic_init(void) +{ + VNIC_FUNCTION("vnic_init()\n"); + VNIC_INIT("Initializing %s\n", MODULEDETAILS); + + if (config_start() == FALSE) { + VNIC_ERROR("config_start failed\n"); + goto failure; + } + if (vnic_ib_init() == FALSE) { + VNIC_ERROR("ib_start failed\n"); + goto failure; + } + if (viport_start() == FALSE) { + VNIC_ERROR("viport_start failed\n"); + goto failure; + } + if (vnic_npevent_start() == FALSE) { + VNIC_ERROR("vnic_npevent_start failed\n"); + goto failure; + } + + return 0; +failure: + vnic_cleanup(); + return -ENODEV; +} + +module_init(vnic_init); +module_exit(vnic_cleanup); diff --git a/drivers/infiniband/ulp/vnic/vnic_main.h b/drivers/infiniband/ulp/vnic/vnic_main.h new file mode 100644 index 0000000..b48c2cf --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_main.h @@ -0,0 +1,152 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_MAIN_H_INCLUDED +#define VNIC_MAIN_H_INCLUDED + +#include + +#include "vnic_config.h" +#include "vnic_netpath.h" + +#define stringize(x) #x +#define add_quotes(x) stringize(x) + + +/* keep in sync with names in vnic_main.c vnic_npevent_str[] */ + +enum vnic_npevent_pos { + VNICNP_CONNECTED = 0, + VNICNP_DISCONNECTED, + VNICNP_LINKUP, + VNICNP_LINKDOWN, + VNICNP_TIMEREXPIRED, + VNICNP_UNIVERSAL1, + /* SECONDARYOFFSET MUST ALWAYS COME AT THE END */ + VNICNP_SECONDARYOFFSET +}; + +#define VNICNP_NUM_EVENTS (2 * VNICNP_SECONDARYOFFSET) + +#define VNIC_PRINP_CONNECTED VNICNP_CONNECTED +#define VNIC_PRINP_DISCONNECTED VNICNP_DISCONNECTED +#define VNIC_PRINP_LINKUP VNICNP_LINKUP +#define VNIC_PRINP_LINKDOWN VNICNP_LINKDOWN +#define VNIC_PRINP_TIMEREXPIRED VNICNP_TIMEREXPIRED +#define VNIC_NP_SETLINK VNICNP_UNIVERSAL1 + +#define VNIC_SECNP_CONNECTED (VNICNP_CONNECTED + VNICNP_SECONDARYOFFSET) +#define VNIC_SECNP_DISCONNECTED (VNICNP_DISCONNECTED + VNICNP_SECONDARYOFFSET) +#define VNIC_SECNP_LINKUP (VNICNP_LINKUP + VNICNP_SECONDARYOFFSET) +#define VNIC_SECNP_LINKDOWN (VNICNP_LINKDOWN + VNICNP_SECONDARYOFFSET) +#define VNIC_SECNP_TIMEREXPIRED (VNICNP_TIMEREXPIRED + VNICNP_SECONDARYOFFSET) +#define VNIC_NP_FREEVNIC (VNICNP_UNIVERSAL1 + VNICNP_SECONDARYOFFSET) + +struct vnic_npevent { + struct list_head list_ptrs; + struct vnic *vnic; +}; + +void vnic_npevent_init(struct vnic *); + +BOOLEAN vnic_npevent_start(void); + +void vnic_npevent_cleanup(void); +void vnic_npevent_queue_evt(struct netpath *netpath, int evt_num); +void vnic_npevent_dequeue_evt(struct netpath *netpath, int evt_num); + +enum vnic_state { + VNIC_UNINITIALIZED, + VNIC_REGISTERED, +}; + +struct vnic { + struct list_head list_ptrs; + enum vnic_state state; + struct vnic_config *config; + struct netpath *current_path; + struct netpath primary_path; + struct netpath secondary_path; + int open; + int carrier; + int xmit_started; + int mac_set; + struct net_device_stats stats; + struct 
net_device netdevice; + struct class_dev_info class_dev_info; + struct dev_mc_list *mc_list; + int mc_list_len; + int mc_count; + spinlock_t lock; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + struct { + cycles_t start_time; + cycles_t conn_time; + cycles_t disconn_ref; /* intermediate time */ + cycles_t disconn_time; + u32 disconn_num; + cycles_t xmit_time; + u32 xmit_num; + u32 xmit_fail; + cycles_t recv_time; + u32 recv_num; + cycles_t xmit_ref; /* intermediate time */ + cycles_t xmit_off_time; + u32 xmit_off_num; + cycles_t carrier_ref; /* intermediate time */ + cycles_t carrier_off_time; + u32 carrier_off_num; + } statistics; + struct class_dev_info stat_info; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + struct vnic_npevent npevents[VNICNP_NUM_EVENTS]; +}; + +; + +struct vnic *vnic_allocate(struct vnic_config *config); + +void vnic_free(struct vnic *vnic); + +void vnic_connected(struct vnic *vnic, struct netpath *netpath); +void vnic_disconnected(struct vnic *vnic, struct netpath *netpath); + +void vnic_link_up(struct vnic *vnic, struct netpath *netpath); +void vnic_link_down(struct vnic *vnic, struct netpath *netpath); + +void vnic_stop_xmit(struct vnic *vnic, struct netpath *netpath); +void vnic_restart_xmit(struct vnic *vnic, struct netpath *netpath); + +void vnic_recv_packet(struct vnic *vnic, struct netpath *netpath, + struct sk_buff *skb); + +#endif /* VNIC_MAIN_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:03:55 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:33:55 +0530 Subject: [openib-general] [PATCH 2/10] Driver netpath files - abstraction of connection to VEx Message-ID: <4521BE03.25916.4E40AAF9@rkuchimanchi.silverstorm.com> Adds the driver netpath files. These files implement the netpath layer. Netpath is an abstraction of a connection to the VEx.
Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_netpath.c | 250 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_netpath.h | 103 ++++++++++++ 2 files changed, 353 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_netpath.c b/drivers/infiniband/ulp/vnic/vnic_netpath.c new file mode 100644 index 0000000..e02d602 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_netpath.c @@ -0,0 +1,250 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_netpath.h" + +void netpath_init(struct netpath *netpath, struct vnic *vnic, int second_bias) +{ + netpath->parent = vnic; + netpath->carrier = 0; + netpath->viport = NULL; + netpath->second_bias = second_bias; + netpath->timer_state = NETPATH_TS_IDLE; + init_timer(&netpath->timer); + return; +} + +void vnic_npevent_timeout(unsigned long data) +{ + struct netpath *netpath = (struct netpath *)data; + vnic_npevent_queue_evt(netpath, VNICNP_TIMEREXPIRED); +} + +void netpath_timer(struct netpath *netpath, int timeout) +{ + if (netpath->timer_state == NETPATH_TS_ACTIVE) { + del_timer_sync(&netpath->timer); + } + if (timeout) { + init_timer(&netpath->timer); + netpath->timer_state = NETPATH_TS_ACTIVE; + netpath->timer.expires = jiffies + timeout; + netpath->timer.data = (unsigned long)netpath; + netpath->timer.function = vnic_npevent_timeout; + add_timer(&netpath->timer); + } else { + vnic_npevent_timeout((unsigned long)netpath); + } + return; +} + +void netpath_timer_stop(struct netpath *netpath) +{ + if (netpath->timer_state == NETPATH_TS_ACTIVE) { + del_timer_sync(&netpath->timer); + vnic_npevent_dequeue_evt(netpath, VNICNP_TIMEREXPIRED); + netpath->timer_state = NETPATH_TS_IDLE; + } +} + +void netpath_free(struct netpath *netpath) +{ + if (netpath->viport) { + netpath_remove_path(netpath, netpath->viport); + class_device_unregister(&netpath->class_dev_info.class_dev); + wait_for_completion(&netpath->class_dev_info.released); + } + + return; +} + +BOOLEAN netpath_add_path(struct netpath * netpath, struct viport * viport) +{ + if (netpath->viport) { + return FALSE; + } else { + netpath->viport = viport; + viport_set_parent(viport, netpath); + return TRUE; + } +} + +BOOLEAN netpath_remove_path(struct netpath * netpath, struct viport * viport) +{ + if (netpath->viport != viport) { + return FALSE; + } else { + netpath->viport = NULL; + 
viport_unset_parent(viport, netpath); + return TRUE; + } +} + +void netpath_connected(struct netpath *netpath, struct viport *viport) +{ + vnic_connected(netpath->parent, netpath); + return; +} + +void netpath_disconnected(struct netpath *netpath, struct viport *viport) +{ + vnic_disconnected(netpath->parent, netpath); + return; +} + +BOOLEAN netpath_set_link(struct netpath * netpath, u16 flags, u16 mtu) +{ + BOOLEAN ret = FALSE; + + NETPATH_INFO("set %s receiver=%s. mtu=%d\n", + netpath_to_string(netpath->parent, netpath), + (flags & IFF_UP) ? "ON" : "OFF", mtu); + if (netpath->viport) { + ret = viport_set_link(netpath->viport, flags, mtu); + } + return ret; +} + +BOOLEAN netpath_get_stats(struct netpath * netpath, + struct net_device_stats * stats) +{ + BOOLEAN ret = FALSE; + + if (netpath->viport) { + ret = viport_get_stats(netpath->viport, stats); + } + return ret; +} + +BOOLEAN netpath_set_unicast(struct netpath * netpath, u8 * address) +{ + BOOLEAN ret = FALSE; + + NETPATH_INFO("set %s MAC to %02X:%02X:%02X:%02X:%02X:%02X\n", + netpath_to_string(netpath->parent, netpath), + address[0], + address[1], + address[2], address[3], address[4], address[5]); + if (netpath->viport) { + ret = viport_set_unicast(netpath->viport, address); + } + return ret; +} + +BOOLEAN netpath_set_multicast(struct netpath * netpath, + struct dev_mc_list * mc_list, int mc_count) +{ + BOOLEAN ret = FALSE; + + if (netpath->viport) { + ret = viport_set_multicast(netpath->viport, mc_list, mc_count); + } + return ret; +} + +int netpath_max_mtu(struct netpath *netpath) +{ + int ret = MAX_PARAM_VALUE; + + if (netpath->viport) { + ret = viport_max_mtu(netpath->viport); + } + return ret; +} + +BOOLEAN netpath_xmit_packet(struct netpath * netpath, struct sk_buff * skb) +{ + BOOLEAN ret = FALSE; + if (netpath->viport) { + ret = viport_xmit_packet(netpath->viport, skb); + } + return ret; +} + +void netpath_link_up(struct netpath *netpath, struct viport *viport) +{ + vnic_link_up(netpath->parent, 
netpath); + return; +} + +void netpath_link_down(struct netpath *netpath, struct viport *viport) +{ + vnic_link_down(netpath->parent, netpath); + return; +} + +void netpath_stop_xmit(struct netpath *netpath, struct viport *viport) +{ + vnic_stop_xmit(netpath->parent, netpath); + return; +} + +void netpath_restart_xmit(struct netpath *netpath, struct viport *viport) +{ + vnic_restart_xmit(netpath->parent, netpath); + return; +} + + +/* viport on input calls this */ +void netpath_recv_packet(struct netpath *netpath, struct sk_buff *skb) +{ + + vnic_recv_packet(netpath->parent, netpath, skb); + return; +} + +void netpath_tx_timeout(struct netpath *netpath) +{ + if (netpath->viport) { + viport_failure(netpath->viport); + } +} + +const char *netpath_to_string(struct vnic *vnic, struct netpath *netpath) +{ + if (!netpath) { + return "NULL"; + } else if (netpath == &vnic->primary_path) { + return "PRIMARY"; + } else if (netpath == &vnic->secondary_path) { + return "SECONDARY"; + } else { + return "UNKNOWN"; + } +} diff --git a/drivers/infiniband/ulp/vnic/vnic_netpath.h b/drivers/infiniband/ulp/vnic/vnic_netpath.h new file mode 100644 index 0000000..707e3a1 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_netpath.h @@ -0,0 +1,103 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_NETPATH_H_INCLUDED +#define VNIC_NETPATH_H_INCLUDED + +#include + +#include "vnic_sys.h" + +struct viport; + +enum netpath_ts { + NETPATH_TS_IDLE, + NETPATH_TS_ACTIVE, + NETPATH_TS_EXPIRED +}; + +struct netpath { + int carrier; + struct vnic *parent; + struct viport *viport; + size_t path_idx; + u32 connect_time; + int second_bias; + struct timer_list timer; + enum netpath_ts timer_state; + struct class_dev_info class_dev_info; +}; + +void netpath_init(struct netpath *netpath, struct vnic *vnic, int second_bias); +void netpath_free(struct netpath *netpath); + +BOOLEAN netpath_add_path(struct netpath *netpath, struct viport *viport); +BOOLEAN netpath_remove_path(struct netpath *netpath, struct viport *viport); + +void netpath_connected(struct netpath *netpath, struct viport *viport); +void netpath_disconnected(struct netpath *netpath, struct viport *viport); + +BOOLEAN netpath_set_link(struct netpath *netpath, u16 flags, u16 mtu); +BOOLEAN netpath_get_stats(struct netpath *netpath, + struct net_device_stats *stats); + +BOOLEAN netpath_set_unicast(struct netpath *netpath, u8 * address); +BOOLEAN netpath_set_multicast(struct netpath *netpath, + struct dev_mc_list *mc_list, int mc_count); + +int netpath_max_mtu(struct netpath *netpath); + 
+BOOLEAN netpath_xmit_packet(struct netpath *netpath, struct sk_buff *skb); + +void netpath_link_up(struct netpath *netpath, struct viport *viport); +void netpath_link_down(struct netpath *netpath, struct viport *viport); + +void netpath_stop_xmit(struct netpath *netpath, struct viport *viport); +void netpath_restart_xmit(struct netpath *netpath, struct viport *viport); + +void netpath_recv_packet(struct netpath *netpath, struct sk_buff *skb); + +void netpath_kick(struct netpath *netpath); + +void netpath_timer(struct netpath *netpath, int timeout); +void netpath_timer_stop(struct netpath *netpath); + +void netpath_tx_timeout(struct netpath *netpath); + +const char *netpath_to_string(struct vnic *vnic, struct netpath *netpath); + +#define netpath_get_hw_addr(netpath, address) \ + viport_get_hw_addr((netpath)->viport, address) +#define netpath_is_connected(netpath) (netpath->state == NETPATH_CONNECTED) +#define netpath_can_tx_csum(netpath) viport_can_tx_csum(netpath->viport) + +#endif /* VNIC_NETPATH_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:05:00 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:35:00 +0530 Subject: [openib-general] [PATCH 3/10] Driver viport files - implementation of communication protocol with VEx Message-ID: <4521BE44.28605.4E41A836@rkuchimanchi.silverstorm.com> Adds the driver viport files. These files implement the state machine for the communication protocol with the VEx. Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_viport.c | 936 +++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_viport.h | 175 +++++ 2 files changed, 1111 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_viport.c b/drivers/infiniband/ulp/vnic/vnic_viport.c new file mode 100644 index 0000000..516e802 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_viport.c @@ -0,0 +1,936 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_netpath.h" +#include "vnic_control.h" +#include "vnic_data.h" +#include "vnic_config.h" +#include "vnic_control_pkt.h" + +DECLARE_WAIT_QUEUE_HEAD(viport_queue); +LIST_HEAD(viport_list); +DECLARE_COMPLETION(viport_thread_exit); +spinlock_t viport_list_lock = SPIN_LOCK_UNLOCKED; + +int viport_thread = -1; +int viport_thread_end = 0; + +struct viport *viport_allocate(struct viport_config *config) +{ + struct viport *viport; + + VIPORT_FUNCTION("viport_allocate()\n"); + viport = (struct viport *)kmalloc(sizeof(struct viport), GFP_KERNEL); + if (!viport) { + VIPORT_ERROR("failed allocating viport structure\n"); + config_free_viport(viport->config); + return NULL; + } + memset(viport, 0, sizeof(struct viport)); + + viport->state = VIPORT_DISCONNECTED; + viport->link_state = LINK_RETRYWAIT; + viport->connect = WAIT; + viport->new_mtu = 1500; + viport->new_flags = 0; + viport->config = config; + + spin_lock_init(&viport->lock); + init_waitqueue_head(&viport->stats_queue); + init_waitqueue_head(&viport->disconnect_queue); + INIT_LIST_HEAD(&viport->list_ptrs); + + viport_kick(viport); + + return viport; +} + +BOOLEAN viport_connect(struct viport * viport, BOOLEAN delay) +{ + VIPORT_FUNCTION("viport_connect()\n"); + if (viport->parent == NULL) { + return FALSE; + } + if (delay) + viport->connect = DELAY; + else + viport->connect = NOW; + viport_kick(viport); + return TRUE; +} + +BOOLEAN viport_set_parent(struct viport *viport, struct netpath *netpath) +{ + VIPORT_FUNCTION("viport_set_parent()\n"); + if (viport->parent != NULL) { + return FALSE; + } + + viport->parent = netpath; + viport_kick(viport); + return TRUE; +} + +BOOLEAN viport_unset_parent(struct viport * viport, struct netpath * netpath) +{ + VIPORT_FUNCTION("viport_unset_parent()\n"); + if (viport->parent != netpath) { + return FALSE; + } + viport_free(viport); + 
return TRUE; +} + +void viport_free(struct viport *viport) +{ + VIPORT_FUNCTION("viport_free()\n"); + viport_disconnect(viport); /* NOTE: this can sleep */ + config_free_viport(viport->config); + kfree(viport); + return; +} + +void viport_disconnect(struct viport *viport) +{ + VIPORT_FUNCTION("viport_disconnect()\n"); + viport->disconnect = 1; + viport_failure(viport); + wait_event(viport->disconnect_queue, viport->disconnect == 0); + return; +} + +BOOLEAN viport_set_link(struct viport * viport, u16 flags, u16 mtu) +{ + unsigned long localflags; + + VIPORT_FUNCTION("viport_set_link()\n"); + if (mtu > data_max_mtu(&viport->data)) { + VIPORT_ERROR("configuration error." + " mtu of %d unsupported by %s\n", mtu, + config_viport_name(viport->config)); + viport_failure(viport); + return FALSE; + } + + spin_lock_irqsave(&viport->lock, localflags); + flags &= IFF_UP | IFF_ALLMULTI | IFF_PROMISC; + if ((viport->new_flags != flags) + || (viport->new_mtu != mtu)) { + viport->new_flags = flags; + viport->new_mtu = mtu; + viport->updates |= NEED_LINK_CONFIG; + viport_kick(viport); + } + + spin_unlock_irqrestore(&viport->lock, localflags); + return TRUE; +} + +BOOLEAN viport_set_unicast(struct viport * viport, u8 * address) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_set_unicast()\n"); + spin_lock_irqsave(&viport->lock, flags); + if (viport->mac_addresses == NULL) { + spin_unlock_irqrestore(&viport->lock, flags); + return FALSE; + } + if (memcmp(viport->mac_addresses[UNICAST_ADDR].address, + address, MAC_ADDR_LEN)) { + memcpy(viport->mac_addresses[UNICAST_ADDR].address, + address, MAC_ADDR_LEN); + viport->mac_addresses[UNICAST_ADDR].operation + = VNIC_OP_SET_ENTRY; + viport->updates |= NEED_ADDRESS_CONFIG; + viport_kick(viport); + } + spin_unlock_irqrestore(&viport->lock, flags); + return TRUE; +} + +BOOLEAN viport_set_multicast(struct viport * viport, + struct dev_mc_list * mc_list, int mc_count) +{ + u32 old_update_list; + int i; + unsigned long flags; + + 
VIPORT_FUNCTION("viport_set_multicast()\n"); + spin_lock_irqsave(&viport->lock, flags); + if (viport->mac_addresses == NULL) { + spin_unlock_irqrestore(&viport->lock, flags); + return FALSE; + } + old_update_list = viport->updates; + if (mc_count > viport->num_mac_addresses - MCAST_ADDR_START) { + viport->updates |= NEED_LINK_CONFIG | MCAST_OVERFLOW; + } else { + if (viport->updates & MCAST_OVERFLOW) { + viport->updates &= ~MCAST_OVERFLOW; + viport->updates |= NEED_LINK_CONFIG; + } + /* brute force algorithm */ + for (i = MCAST_ADDR_START; + i < mc_count + MCAST_ADDR_START; + i++, mc_list = mc_list->next) { + if (viport->mac_addresses[i].valid && + !memcmp(viport->mac_addresses[i].address, + mc_list->dmi_addr, MAC_ADDR_LEN)) + continue; + memcpy(viport->mac_addresses[i].address, + mc_list->dmi_addr, MAC_ADDR_LEN); + viport->mac_addresses[i].valid = 1; + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + for (; i < viport->num_mac_addresses; i++) { + if (!viport->mac_addresses[i].valid) + continue; + viport->mac_addresses[i].valid = 0; + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + if (mc_count) + viport->updates |= NEED_ADDRESS_CONFIG; + } + + if (viport->updates != old_update_list) + viport_kick(viport); + spin_unlock_irqrestore(&viport->lock, flags); + return TRUE; +} + +BOOLEAN viport_get_stats(struct viport * viport, + struct net_device_stats * stats) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_get_stats()\n"); + if (jiffies > viport->last_stats_time + + viport->config->stats_interval) { + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_STATS; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + wait_event(viport->stats_queue, + !(viport->updates & NEED_STATS)); + + if (viport->stats.ethernet_status) { + viport_link_up(viport); + } else { + viport_link_down(viport); + } + } + stats->rx_packets = viport->stats.if_in_ok; + stats->tx_packets = viport->stats.if_out_ok; + stats->rx_bytes = 
viport->stats.if_in_octets; + stats->tx_bytes = viport->stats.if_out_octets; + stats->rx_errors = viport->stats.if_in_errors; + stats->tx_errors = viport->stats.if_out_errors; + stats->rx_dropped = 0; /* EIOC doesn't track */ + stats->tx_dropped = 0; /* EIOC doesn't track */ + stats->multicast = viport->stats.if_in_nucast_pkts; + stats->collisions = 0; /* EIOC doesn't track */ + + return TRUE; +} + +BOOLEAN viport_xmit_packet(struct viport * viport, struct sk_buff * skb) +{ + BOOLEAN status = FALSE; + unsigned long flags; + + VIPORT_FUNCTION("viport_xmit_packet()\n"); + spin_lock_irqsave(&viport->lock, flags); + if (viport->state == VIPORT_CONNECTED) + status = data_xmit_packet(&viport->data, skb); + spin_unlock_irqrestore(&viport->lock, flags); + return status; +} + +void viport_link_up(struct viport *viport) +{ + VIPORT_FUNCTION("viport_link_up()\n"); + netpath_link_up(viport->parent, viport); + return; +} + +void viport_link_down(struct viport *viport) +{ + VIPORT_FUNCTION("viport_link_down()\n"); + netpath_link_down(viport->parent, viport); + return; +} + +void viport_stop_xmit(struct viport *viport) +{ + VIPORT_FUNCTION("viport_stop_xmit()\n"); + netpath_stop_xmit(viport->parent, viport); + return; +} + +void viport_restart_xmit(struct viport *viport) +{ + VIPORT_FUNCTION("viport_restart_xmit()\n"); + netpath_restart_xmit(viport->parent, viport); + return; +} + +void viport_recv_packet(struct viport *viport, struct sk_buff *skb) +{ + VIPORT_FUNCTION("viport_recv_packet()\n"); + netpath_recv_packet(viport->parent, skb); + return; +} + +void viport_kick(struct viport *viport) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_kick()\n"); + spin_lock_irqsave(&viport_list_lock, flags); + if (list_empty(&viport->list_ptrs)) { + list_add_tail(&viport->list_ptrs, &viport_list); + wake_up(&viport_queue); + } + spin_unlock_irqrestore(&viport_list_lock, flags); + return; +} + +void viport_failure(struct viport *viport) +{ + unsigned long flags; + + 
VIPORT_FUNCTION("viport_failure()\n"); + spin_lock_irqsave(&viport_list_lock, flags); + viport->errored = 1; + if (list_empty(&viport->list_ptrs)) { + list_add_tail(&viport->list_ptrs, &viport_list); + wake_up(&viport_queue); + } + spin_unlock_irqrestore(&viport_list_lock, flags); + return; +} + +static void viport_timeout(unsigned long data) +{ + struct viport *viport; + + VIPORT_FUNCTION("viport_timeout()\n"); + viport = (struct viport *)data; + viport->timer_active = FALSE; + viport_kick(viport); + return; +} + +static void viport_timer(struct viport *viport, int timeout) +{ + VIPORT_FUNCTION("viport_timer()\n"); + if (viport->timer_active) { + del_timer(&viport->timer); + } + init_timer(&viport->timer); + viport->timer.expires = jiffies + timeout; + viport->timer.data = (unsigned long)viport; + viport->timer.function = viport_timeout; + viport->timer_active = TRUE; + add_timer(&viport->timer); + return; +} + +static void viport_timer_stop(struct viport *viport) +{ + VIPORT_FUNCTION("viport_timer_stop()\n"); + if (viport->timer_active) { + del_timer(&viport->timer); + } + viport->timer_active = FALSE; + return; +} + +static BOOLEAN viport_init_mac_addresses(struct viport *viport) +{ + int i; + struct vnic_address_op *temp; + unsigned long flags; + + VIPORT_FUNCTION("viport_init_mac_addresses()\n"); + i = viport->num_mac_addresses * sizeof(struct vnic_address_op); + temp = (struct vnic_address_op *)kmalloc(i, GFP_KERNEL); + spin_lock_irqsave(&viport->lock, flags); + viport->mac_addresses = temp; + if (!viport->mac_addresses) { + VIPORT_ERROR("failed allocating MAC address table\n"); + goto failure; + } + memset(viport->mac_addresses, '\0', i); + for (i = 0; i < viport->num_mac_addresses; i++) { + viport->mac_addresses[i].index = i; + viport->mac_addresses[i].vlan = viport->default_vlan; + } + memset(viport->mac_addresses[BROADCAST_ADDR].address, + 0xFF, MAC_ADDR_LEN); + viport->mac_addresses[BROADCAST_ADDR].valid = TRUE; + 
memcpy(viport->mac_addresses[UNICAST_ADDR].address, + viport->hw_mac_address, MAC_ADDR_LEN); + viport->mac_addresses[UNICAST_ADDR].valid = TRUE; + + spin_unlock_irqrestore(&viport->lock, flags); + return TRUE; +failure: + spin_unlock_irqrestore(&viport->lock, flags); + return FALSE; +} + +static int viport_statemachine(void *context) +{ + struct viport *viport; + enum link_state old_link_state; + int res; + + VIPORT_FUNCTION("viport_statemachine()\n"); + daemonize("vnic_viport"); + while (!viport_thread_end || !list_empty(&viport_list)) { + wait_event_interruptible(viport_queue, !list_empty(&viport_list) + || viport_thread_end); + spin_lock_irq(&viport_list_lock); + if (list_empty(&viport_list)) { + spin_unlock_irq(&viport_list_lock); + continue; + } + viport = list_entry(viport_list.next, struct viport, list_ptrs); + list_del_init(&viport->list_ptrs); + spin_unlock_irq(&viport_list_lock); +repeat: + switch (old_link_state = viport->link_state) { + case LINK_UNINITIALIZED: + LINK_STATE("state LINK_UNINITIALIZED\n"); + viport->updates = 0; + wake_up(&viport->stats_queue); + /* in case of going to + * uninitialized put this viport + * back on the serviceQ, delete + * it off again. 
+ */ + spin_lock_irq(&viport_list_lock); + list_del_init(&viport->list_ptrs); + spin_unlock_irq(&viport_list_lock); + viport->disconnect = 0; + wake_up(&viport->disconnect_queue); + break; + case LINK_INITIALIZE: + LINK_STATE("state LINK_INITIALIZE\n"); + viport->errored = 0; + viport->connect = WAIT; + viport->last_stats_time = 0; + if (viport->disconnect) { + viport->link_state = LINK_UNINITIALIZED; + } else { + viport->link_state = LINK_INITIALIZECONTROL; + } + break; + case LINK_INITIALIZECONTROL: + LINK_STATE("state LINK_INITIALIZECONTROL\n"); + viport->pd = ib_alloc_pd(viport->config->ibdev); + if (IS_ERR(viport->pd)) + viport->link_state = LINK_DISCONNECTED; + else if (control_init(&viport->control, viport, + &viport->config->control_config, + viport->pd, viport->port_guid)) { + viport->link_state = LINK_INITIALIZEDATA; + } else { + ib_dealloc_pd(viport->pd); + viport->link_state = LINK_DISCONNECTED; + } + break; + case LINK_INITIALIZEDATA: + LINK_STATE("state LINK_INITIALIZEDATA\n"); + if (data_init(&viport->data, viport, + &viport->config->data_config, + viport->pd, viport->port_guid)) { + viport->link_state = LINK_CONTROLCONNECT; + } else { + viport->link_state = LINK_CLEANUPCONTROL; + } + + break; + case LINK_CONTROLCONNECT: + init_completion(&(viport->control.ib_conn.done)); + if (vnic_ib_cm_connect(&viport->control.ib_conn)) { + viport->link_state = LINK_CONTROLCONNECTWAIT; + } else { + viport->link_state = LINK_CLEANUPDATA; + } + break; + case LINK_CONTROLCONNECTWAIT: + LINK_STATE("state LINK_CONTROLCONNECTWAIT\n"); + wait_for_completion(&(viport->control.ib_conn.done)); + + if (control_is_connected(&viport->control)) { + viport->link_state = LINK_INITVNICREQ; + } + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + + case LINK_INITVNICREQ: + LINK_STATE("state LINK_INITVNICREQ\n"); + if (control_init_vnic_req(&viport->control)) { + viport->link_state = LINK_INITVNICRSP; + } else { + 
viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_INITVNICRSP: + LINK_STATE("state LINK_INITVNICRSP\n"); + + control_process_async(&viport->control); + + if (control_init_vnic_rsp(&viport->control, + &viport->features_supported, + viport->hw_mac_address, + &viport->num_mac_addresses, + &viport->default_vlan)) { + if (viport_init_mac_addresses(viport)) { + viport->link_state = LINK_BEGINDATAPATH; + } else { + viport->link_state = LINK_RESETCONTROL; + } + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_BEGINDATAPATH: + LINK_STATE("state LINK_BEGINDATAPATH\n"); + viport->link_state = LINK_CONFIGDATAPATHREQ; + break; + case LINK_CONFIGDATAPATHREQ: + LINK_STATE("state LINK_CONFIGDATAPATHREQ\n"); + if (control_config_data_path_req(&viport->control, + data_path_id(&viport-> + data), + data_host_pool_max + (&viport->data), + data_eioc_pool_max + (&viport->data))) { + viport->link_state = LINK_CONFIGDATAPATHRSP; + } else { + viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_CONFIGDATAPATHRSP: + LINK_STATE("state LINK_CONFIGDATAPATHRSP\n"); + control_process_async(&viport->control); + + if (control_config_data_path_rsp(&viport->control, + data_host_pool + (&viport->data), + data_eioc_pool + (&viport->data), + data_host_pool_max + (&viport->data), + data_eioc_pool_max + (&viport->data), + data_host_pool_min + (&viport->data), + data_eioc_pool_min + (&viport->data))) { + viport->link_state = LINK_DATACONNECT; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_DATACONNECT: + LINK_STATE("state LINK_DATACONNECT\n"); + init_completion(&viport->data.ib_conn.done); + if (data_connect(&viport->data)) { + viport->link_state = LINK_DATACONNECTWAIT; + } else { + viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_DATACONNECTWAIT: + LINK_STATE("state LINK_DATACONNECTWAIT\n"); + 
wait_for_completion(&viport->data.ib_conn.done); + + control_process_async(&viport->control); + + if (data_is_connected(&viport->data)) { + viport->link_state = LINK_XCHGPOOLREQ; + } + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_XCHGPOOLREQ: + LINK_STATE("state LINK_XCHGPOOLREQ\n"); + if (control_exchange_pools_req(&viport->control, + data_local_pool_addr + (&viport->data), + data_local_pool_rkey + (&viport->data))) { + viport->link_state = LINK_XCHGPOOLRSP; + } else { + viport->link_state = LINK_RESET; + } + break; + case LINK_XCHGPOOLRSP: + LINK_STATE("state LINK_XCHGPOOLRSP\n"); + control_process_async(&viport->control); + + if (control_exchange_pools_rsp(&viport->control, + data_remote_pool_addr + (&viport->data), + data_remote_pool_rkey + (&viport->data))) { + viport->link_state = LINK_INITIALIZED; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_INITIALIZED: + LINK_STATE("state LINK_INITIALIZED\n"); + viport->link_state = LINK_IDLE; + viport->state = VIPORT_CONNECTED; + printk(KERN_INFO PFX + "%s: connection established\n", + config_viport_name(viport->config)); + data_connected(&viport->data); + netpath_connected(viport->parent, viport); + spin_lock_irq(&viport->lock); + viport->mtu = 1500; + viport->flags = 0; + if ((viport->mtu != viport->new_mtu) + || (viport->flags != viport->new_flags)) { + viport->updates |= NEED_LINK_CONFIG; + } + spin_unlock_irq(&viport->lock); + viport->link_state = LINK_IDLE; + break; + case LINK_IDLE: + LINK_STATE("state LINK_IDLE\n"); + if (viport->config->hb_interval) { + viport_timer(viport, + viport->config->hb_interval); + } + viport->link_state = LINK_IDLING; + break; + case LINK_IDLING: + LINK_STATE("state LINK_IDLING\n"); + control_process_async(&viport->control); + + if (viport->errored) { + viport_timer_stop(viport); + viport->errored = 0; + viport->link_state = LINK_RESET; + break; + } + 
spin_lock_irq(&viport->lock); + if (viport->updates & NEED_LINK_CONFIG) { + viport_timer_stop(viport); + viport->link_state = LINK_CONFIGLINKREQ; + } else if (viport->updates & NEED_ADDRESS_CONFIG) { + viport_timer_stop(viport); + viport->link_state = LINK_CONFIGADDRSREQ; + } else if (viport->updates & NEED_STATS) { + viport_timer_stop(viport); + viport->link_state = LINK_REPORTSTATREQ; + } else if (viport->config->hb_interval) { + if (!viport->timer_active) { + viport->link_state = LINK_HEARTBEATREQ; + } + } + spin_unlock_irq(&viport->lock); + break; + case LINK_CONFIGLINKREQ: + LINK_STATE("state LINK_CONFIGLINKREQ\n"); + spin_lock_irq(&viport->lock); + viport->updates &= ~NEED_LINK_CONFIG; + viport->flags = viport->new_flags; + if (viport->updates & MCAST_OVERFLOW) + viport->flags |= IFF_ALLMULTI; + viport->mtu = viport->new_mtu; + spin_unlock_irq(&viport->lock); + if (control_config_link_req(&viport->control, + viport->flags, + viport->mtu)) { + viport->link_state = LINK_CONFIGLINKRSP; + } else { + viport->link_state = LINK_RESET; + } + break; + case LINK_CONFIGLINKRSP: + LINK_STATE("state LINK_CONFIGLINKRSP\n"); + control_process_async(&viport->control); + + if (control_config_link_rsp(&viport->control, + &viport->flags, + &viport->mtu)) { + viport->link_state = LINK_IDLE; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_CONFIGADDRSREQ: + LINK_STATE("state LINK_CONFIGADDRSREQ\n"); + + spin_lock_irq(&viport->lock); + res = control_config_addrs_req(&viport->control, + viport->mac_addresses, + viport-> + num_mac_addresses); + + if (res > 0) { + viport->updates &= ~NEED_ADDRESS_CONFIG; + viport->link_state = LINK_CONFIGADDRSRSP; + } else if (res == 0) { + viport->link_state = LINK_CONFIGADDRSRSP; + } else { + viport->link_state = LINK_RESET; + } + spin_unlock_irq(&viport->lock); + break; + case LINK_CONFIGADDRSRSP: + LINK_STATE("state LINK_CONFIGADDRSRSP\n"); + control_process_async(&viport->control); + 
+ if (control_config_addrs_rsp(&viport->control)) { + viport->link_state = LINK_IDLE; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_REPORTSTATREQ: + LINK_STATE("state LINK_REPORTSTATREQ\n"); + if (control_report_statistics_req(&viport->control)) { + viport->link_state = LINK_REPORTSTATRSP; + } else { + viport->link_state = LINK_RESET; + } + break; + case LINK_REPORTSTATRSP: + LINK_STATE("state LINK_REPORTSTATRSP\n"); + control_process_async(&viport->control); + + spin_lock_irq(&viport->lock); + if (control_report_statistics_rsp(&viport->control, + &viport->stats)) { + viport->updates &= ~NEED_STATS; + viport->last_stats_time = jiffies; + spin_unlock_irq(&viport->lock); + wake_up(&viport->stats_queue); + viport->link_state = LINK_IDLE; + } else { + spin_unlock_irq(&viport->lock); + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_HEARTBEATREQ: + LINK_STATE("state LINK_HEARTBEATREQ\n"); + if (control_heartbeat_req(&viport->control, + viport->config->hb_timeout)) { + viport->link_state = LINK_HEARTBEATRSP; + } else { + viport->link_state = LINK_RESET; + } + break; + case LINK_HEARTBEATRSP: + LINK_STATE("state LINK_HEARTBEATRSP\n"); + control_process_async(&viport->control); + + if (control_heartbeat_rsp(&viport->control)) { + viport->link_state = LINK_IDLE; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_RESET: + LINK_STATE("state LINK_RESET\n"); + viport->errored = 0; + spin_lock_irq(&viport->lock); + viport->state = VIPORT_DISCONNECTED; + spin_unlock_irq(&viport->lock); + viport_link_down(viport); + printk(KERN_INFO PFX + "%s: connection lost\n", + config_viport_name(viport->config)); + if (control_reset_req(&viport->control)) { + viport->link_state = LINK_RESETRSP; + } else { + viport->link_state = LINK_DATADISCONNECT; + } + break; + case LINK_RESETRSP: + LINK_STATE("state 
LINK_RESETRSP\n"); + control_process_async(&viport->control); + + if (control_reset_rsp(&viport->control)) { + viport->link_state = LINK_DATADISCONNECT; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_DATADISCONNECT; + } + break; + case LINK_RESETCONTROL: + LINK_STATE("state LINK_RESETCONTROL\n"); + if (control_reset_req(&viport->control)) { + viport->link_state = LINK_RESETCONTROLRSP; + } else { + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + case LINK_RESETCONTROLRSP: + LINK_STATE("state LINK_RESETCONTROLRSP\n"); + control_process_async(&viport->control); + + if (control_reset_rsp(&viport->control)) { + viport->link_state = LINK_CONTROLDISCONNECT; + } + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + case LINK_DATADISCONNECT: + LINK_STATE("state LINK_DATADISCONNECT\n"); + data_disconnect(&viport->data); + viport->link_state = LINK_CONTROLDISCONNECT; + break; + case LINK_CONTROLDISCONNECT: + LINK_STATE("state LINK_CONTROLDISCONNECT\n"); + viport->link_state = LINK_CLEANUPDATA; + break; + case LINK_CLEANUPDATA: + LINK_STATE("state LINK_CLEANUPDATA\n"); + data_cleanup(&viport->data); + viport->link_state = LINK_CLEANUPCONTROL; + break; + case LINK_CLEANUPCONTROL: + LINK_STATE("state LINK_CLEANUPCONTROL\n"); + spin_lock_irq(&viport->lock); + if (viport->mac_addresses != NULL) { + kfree(viport->mac_addresses); + viport->mac_addresses = NULL; + } + spin_unlock_irq(&viport->lock); + control_cleanup(&viport->control); + ib_dealloc_pd(viport->pd); + viport->link_state = LINK_DISCONNECTED; + break; + case LINK_DISCONNECTED: + LINK_STATE("state LINK_DISCONNECTED\n"); + netpath_disconnected(viport->parent, viport); + if (viport->disconnect != 0) { + viport->link_state = LINK_UNINITIALIZED; + } else { + viport_timer(viport, CONV2JIFFIES(1000)); + viport->link_state = LINK_RETRYWAIT; + } + break; + case LINK_RETRYWAIT: + LINK_STATE("state LINK_RETRYWAIT\n"); + 
viport->stats.ethernet_status = 0; + viport->updates = 0; + wake_up(&viport->stats_queue); + if (viport->disconnect != 0) { + viport_timer_stop(viport); + viport->link_state = LINK_UNINITIALIZED; + } else if (viport->connect == DELAY) { + if (!viport->timer_active) { + viport->link_state = LINK_INITIALIZE; + } + } else if (viport->connect == NOW) { + viport_timer_stop(viport); + viport->link_state = LINK_INITIALIZE; + } + break; + } + + /* if state has changed, run through state machine again */ + if (viport->link_state != old_link_state) { + goto repeat; + } + + } + + complete_and_exit(&viport_thread_exit, 0); +} + +BOOLEAN viport_start() +{ + VIPORT_FUNCTION("viport_start()\n"); + if ((viport_thread = + kernel_thread(viport_statemachine, NULL, 0)) < 0) { + return FALSE; + } + return TRUE; +} + +void viport_cleanup() +{ + VIPORT_FUNCTION("viport_cleanup()\n"); + if (viport_thread > 0) { + viport_thread_end = 1; + wake_up(&viport_queue); + wait_for_completion(&viport_thread_exit); + } + return; +} diff --git a/drivers/infiniband/ulp/vnic/vnic_viport.h b/drivers/infiniband/ulp/vnic/vnic_viport.h new file mode 100644 index 0000000..2332dbe --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_viport.h @@ -0,0 +1,175 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_VIPORT_H_INCLUDED +#define VNIC_VIPORT_H_INCLUDED + +#include "vnic_control.h" +#include "vnic_data.h" + +enum viport_state { + VIPORT_DISCONNECTED, + VIPORT_CONNECTED +}; + +enum link_state { + LINK_UNINITIALIZED, + LINK_INITIALIZE, + LINK_INITIALIZECONTROL, + LINK_INITIALIZEDATA, + LINK_CONTROLCONNECT, + LINK_CONTROLCONNECTWAIT, + LINK_INITVNICREQ, + LINK_INITVNICRSP, + LINK_BEGINDATAPATH, + LINK_CONFIGDATAPATHREQ, + LINK_CONFIGDATAPATHRSP, + LINK_DATACONNECT, + LINK_DATACONNECTWAIT, + LINK_XCHGPOOLREQ, + LINK_XCHGPOOLRSP, + LINK_INITIALIZED, + LINK_IDLE, + LINK_IDLING, + LINK_CONFIGLINKREQ, + LINK_CONFIGLINKRSP, + LINK_CONFIGADDRSREQ, + LINK_CONFIGADDRSRSP, + LINK_REPORTSTATREQ, + LINK_REPORTSTATRSP, + LINK_HEARTBEATREQ, + LINK_HEARTBEATRSP, + LINK_RESET, + LINK_RESETRSP, + LINK_RESETCONTROL, + LINK_RESETCONTROLRSP, + LINK_DATADISCONNECT, + LINK_CONTROLDISCONNECT, + LINK_CLEANUPDATA, + LINK_CLEANUPCONTROL, + LINK_DISCONNECTED, + LINK_RETRYWAIT +}; + +#define BROADCAST_ADDR 0 +#define UNICAST_ADDR 1 +#define MCAST_ADDR_START 2 +#define current_mac_address mac_addresses[UNICAST_ADDR].address + +#define NEED_STATS 0x00000001 +#define NEED_ADDRESS_CONFIG 0x00000002 +#define NEED_LINK_CONFIG 0x00000004 +#define MCAST_OVERFLOW 0x00000008 + 
+struct viport { + struct list_head list_ptrs; + struct netpath *parent; + struct viport_config *config; + struct control control; + struct data data; + u64 ioc_guid; + u64 port_guid; + spinlock_t lock; + struct ib_pd *pd; + enum viport_state state; + enum link_state link_state; + struct vnic_cmd_report_stats_rsp stats; + wait_queue_head_t stats_queue; + u32 last_stats_time; + u32 features_supported; + u8 hw_mac_address[MAC_ADDR_LEN]; + u16 default_vlan; + u16 num_mac_addresses; + struct vnic_address_op *mac_addresses; + u32 updates; + u16 flags; + u16 new_flags; + u16 mtu; + u16 new_mtu; + u32 errored; + enum { WAIT, DELAY, NOW } connect; + u32 disconnect; + wait_queue_head_t disconnect_queue; + BOOLEAN timer_active; + struct timer_list timer; +}; + +BOOLEAN viport_start(void); +void viport_cleanup(void); + +struct viport *viport_allocate(struct viport_config *config); + +BOOLEAN viport_connect(struct viport *viport, BOOLEAN delay); + +BOOLEAN viport_set_parent(struct viport *viport, struct netpath *netpath); +BOOLEAN viport_unset_parent(struct viport *viport, struct netpath *netpath); + +void viport_free(struct viport *viport); +void viport_disconnect(struct viport *viport); + +BOOLEAN viport_set_link(struct viport *viport, u16 flags, u16 mtu); + +BOOLEAN viport_get_stats(struct viport *viport, struct net_device_stats *stats); + +BOOLEAN viport_xmit_packet(struct viport *viport, struct sk_buff *skb); + +void viport_link_up(struct viport *viport); +void viport_link_down(struct viport *viport); + +void viport_stop_xmit(struct viport *viport); +void viport_restart_xmit(struct viport *viport); + +void viport_recv_packet(struct viport *viport, struct sk_buff *skb); + +void viport_kick(struct viport *viport); + +void viport_failure(struct viport *viport); + +BOOLEAN viport_set_unicast(struct viport *viport, u8 * address); +BOOLEAN viport_set_multicast(struct viport *viport, struct dev_mc_list *mc_list, + int mc_count); + +#define viport_port_guid(viport) 
((viport)->port_guid) +#define viport_max_mtu(viport) data_max_mtu(&(viport)->data) + +#define viport_get_hw_addr(viport,address) \ + memcpy(address, (viport)->hw_mac_address, MAC_ADDR_LEN) + +#define viport_features(viport) ((viport)->features_supported) + +#define viport_can_tx_csum(viport) \ + (((viport)->features_supported & \ + (VNIC_FEAT_IPV4_CSUM_TX | VNIC_FEAT_TCP_CSUM_TX | \ + VNIC_FEAT_UDP_CSUM_TX)) == (VNIC_FEAT_IPV4_CSUM_TX | \ + VNIC_FEAT_TCP_CSUM_TX | VNIC_FEAT_UDP_CSUM_TX)) + +#endif /* VNIC_VIPORT_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:06:02 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:36:02 +0530 Subject: [openib-general] [PATCH 4/10] Implementation of Control path of the communication protocol Message-ID: <4521BE82.30931.4E429AE3@rkuchimanchi.silverstorm.com> Adds the files that implement the various control messages that are exchanged as part of the communication protocol with the VEx. Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_control.c | 1875 ++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_control.h | 145 ++ drivers/infiniband/ulp/vnic/vnic_control_pkt.h | 278 ++++ 3 files changed, 2298 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_control.c b/drivers/infiniband/ulp/vnic/vnic_control.c new file mode 100644 index 0000000..9e71bc7 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_control.c @@ -0,0 +1,1875 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_control.h" +#include "vnic_config.h" +#include "vnic_control_pkt.h" + +static void control_log_control_packet(struct vnic_control_packet *pkt); + +static inline char *control_ifcfg_name(struct control *control) +{ + if (!control) + return "nctl"; + if (!control->parent) + return "np"; + if (!control->parent->parent) + return "npp"; + if (!control->parent->parent->parent) + return "nppp"; + if (!control->parent->parent->parent->config) + return "npppc"; + return (control->parent->parent->parent->config->name); +} + +static void control_recv(struct control *control, struct recv_io *recv_io) +{ + if (!vnic_ib_post_recv(&control->ib_conn, &recv_io->io)) { + viport_failure(control->parent); + } + return; +} + +static void control_recv_complete(struct io *io) +{ + struct recv_io *recv_io = (struct recv_io *)io; + struct recv_io *last_recv_io; + struct control *control = &io->viport->control; + struct vnic_control_packet *pkt = control_packet(recv_io); + struct vnic_control_header *c_hdr = &pkt->hdr; + unsigned long flags; + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + cycles_t response_time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + CONTROL_FUNCTION("%s: control_recv_complete()\n", + control_ifcfg_name(control)); + + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + response_time = get_cycles(); +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + CONTROL_PACKET(pkt); + spin_lock_irqsave(&control->io_lock, flags); + if (c_hdr->pkt_type == TYPE_INFO) { + last_recv_io = control->info; + control->info = recv_io; + spin_unlock_irqrestore(&control->io_lock, flags); + viport_kick(control->parent); + if (last_recv_io) + control_recv(control, last_recv_io); + } else if (c_hdr->pkt_type == TYPE_RSP) { + if (control->rsp_expected + && (c_hdr->pkt_seq_num 
== control->seq_num)) { + control->response = recv_io; + control->rsp_expected = FALSE; + spin_unlock_irqrestore(&control->io_lock, flags); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + response_time -= control->statistics.request_time; + control->statistics.response_time += response_time; + control->statistics.response_num++; + if (control->statistics.response_max < response_time) + control->statistics.response_max = + response_time; + if ((control->statistics.response_min == 0) + || (control->statistics.response_min > + response_time)) + control->statistics.response_min = + response_time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + viport_kick(control->parent); + } else { + spin_unlock_irqrestore(&control->io_lock, flags); + control_recv(control, recv_io); + } + } else { + list_add_tail(&recv_io->io.list_ptrs, &control->failure_list); + spin_unlock_irqrestore(&control->io_lock, flags); + viport_kick(control->parent); + } + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return; +} + +static void control_timeout(unsigned long data) +{ + struct control *control; + + control = (struct control *)PTR(data); + CONTROL_FUNCTION("%s: control_timeout()\n", + control_ifcfg_name(control)); + control->timer_state = TIMER_EXPIRED; + control->rsp_expected = FALSE; + viport_kick(control->parent); + return; +} + +static void control_timer(struct control *control, int timeout) +{ + CONTROL_FUNCTION("%s: control_timer()\n", + control_ifcfg_name(control)); + if (control->timer_state == TIMER_ACTIVE) { + mod_timer(&control->timer, jiffies + timeout); + } else { + init_timer(&control->timer); + control->timer.expires = jiffies + timeout; + control->timer.data = (unsigned long)control; + control->timer.function = control_timeout; + control->timer_state = TIMER_ACTIVE; + add_timer(&control->timer); + } + return; +} + +static void control_timer_stop(struct control *control) +{ + CONTROL_FUNCTION("%s: 
control_timer_stop()\n", + control_ifcfg_name(control)); + if (control->timer_state == TIMER_ACTIVE) { + del_timer_sync(&control->timer); + } + control->timer_state = TIMER_IDLE; + return; +} + +static BOOLEAN control_send(struct control *control, struct send_io *send_io) +{ + CONTROL_FUNCTION("%s: control_send()\n", control_ifcfg_name(control)); + if (control->req_outstanding) { + CONTROL_ERROR("%s: IB send never completed\n", + control_ifcfg_name(control)); + viport_failure(control->parent); + return FALSE; + } + control->req_outstanding = TRUE; + + control_timer(control, control->config->rsp_timeout); + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + control->statistics.request_time = get_cycles(); +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + if (!vnic_ib_post_send(&control->ib_conn, &control->send_io.io)) { + CONTROL_ERROR("failed to post send\n"); + viport_failure(control->parent); + control->req_outstanding = FALSE; + return FALSE; + } + return TRUE; +} + +static void control_send_complete(struct io *io) +{ + struct control *control = &io->viport->control; + + CONTROL_FUNCTION("%s: control_send_complete()\n", + control_ifcfg_name(control)); + control->req_outstanding = FALSE; + + return; +} + +void control_process_async(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + unsigned long flags; + + CONTROL_FUNCTION("%s: control_process_async()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + spin_lock_irqsave(&control->io_lock, flags); + if ((recv_io = control->info) != NULL) { + CONTROL_INFO("%s: processing info packet\n", + control_ifcfg_name(control)); + control->info = NULL; + spin_unlock_irqrestore(&control->io_lock, flags); + pkt = control_packet(recv_io); + if (ntoh8(pkt->hdr.pkt_cmd) == CMD_REPORT_STATUS) { + switch (ntoh32(pkt->cmd.report_status.status_number)) { + case VNIC_STATUS_LINK_UP: + 
CONTROL_INFO("%s: link up\n", + control_ifcfg_name(control)); + viport_link_up(control->parent); + break; + case VNIC_STATUS_LINK_DOWN: + CONTROL_INFO("%s: link down\n", + control_ifcfg_name(control)); + viport_link_down(control->parent); + break; + default: + CONTROL_ERROR("%s: asynchronous status" + " received from EIOC\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + break; + } + } + if ((ntoh8(pkt->hdr.pkt_cmd) != CMD_REPORT_STATUS) + || ntoh8(pkt->cmd.report_status.is_fatal)) { + viport_failure(control->parent); + } + control_recv(control, recv_io); + spin_lock_irqsave(&control->io_lock, flags); + } + + while (!list_empty(&control->failure_list)) { + CONTROL_INFO("%s: processing error packet\n", + control_ifcfg_name(control)); + recv_io = (struct recv_io *) + list_entry(control->failure_list.next, struct io, + list_ptrs); + list_del(&recv_io->io.list_ptrs); + spin_unlock_irqrestore(&control->io_lock, flags); + pkt = control_packet(recv_io); + CONTROL_ERROR("%s: asynchronous error received from EIOC\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + if ((ntoh8(pkt->hdr.pkt_type) != TYPE_ERR) + || (ntoh8(pkt->hdr.pkt_cmd) != CMD_REPORT_STATUS) + || ntoh8(pkt->cmd.report_status.is_fatal)) { + viport_failure(control->parent); + } + control_recv(control, recv_io); + spin_lock_irqsave(&control->io_lock, flags); + } + spin_unlock_irqrestore(&control->io_lock, flags); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + CONTROL_INFO("%s: done control_process_async\n", + control_ifcfg_name(control)); +} + +static struct send_io *control_init_hdr(struct control *control, u8 cmd) +{ + struct control_config *config; + struct vnic_control_packet *pkt; + struct vnic_control_header *hdr; + + CONTROL_FUNCTION("control_init_hdr()\n"); + config = control->config; + + pkt = control_packet(&control->send_io); + hdr = &pkt->hdr; + + hdr->pkt_type = 
hton8(TYPE_REQ); + hdr->pkt_cmd = hton8(cmd); + control->seq_num++; + hdr->pkt_seq_num = hton8(control->seq_num); + control->req_retry_counter = 0; + hdr->pkt_retry_count = hton8(control->req_retry_counter); + + return &control->send_io; +} + +static struct recv_io *control_get_rsp(struct control *control) +{ + struct recv_io *recv_io; + unsigned long flags; + + CONTROL_FUNCTION("%s: control_get_rsp()\n", + control_ifcfg_name(control)); + spin_lock_irqsave(&control->io_lock, flags); + if ((recv_io = control->response) != NULL) { + control_timer_stop(control); + control->response = NULL; + spin_unlock_irqrestore(&control->io_lock, flags); + return recv_io; + } + spin_unlock_irqrestore(&control->io_lock, flags); + if (control->timer_state == TIMER_EXPIRED) { + struct vnic_control_packet *pkt = + control_packet(&control->send_io); + struct vnic_control_header *hdr = &pkt->hdr; + + control->timer_state = TIMER_IDLE; + CONTROL_ERROR("%s: no response received from EIOC\n", + control_ifcfg_name(control)); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + control->statistics.timeout_num++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + control->req_retry_counter++; + if (control->req_retry_counter >= + control->config->req_retry_count) { + CONTROL_ERROR("%s: control packet retry exceeded\n", + control_ifcfg_name(control)); + viport_failure(control->parent); + } else { + hdr->pkt_retry_count = + hton8(control->req_retry_counter); + control_send(control, &control->send_io); + } + } + return NULL; +} + +BOOLEAN control_init_vnic_req(struct control *control) +{ + struct send_io *send_io; + struct control_config *config = control->config; + struct vnic_control_packet *pkt; + struct vnic_cmd_init_vnic_req *init_vnic_req; + + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_INIT_VNIC); + if (!send_io) { + dma_sync_single_for_device(control->parent->config->ibdev-> + 
dma_device, control->send_dma, + control->send_len, DMA_TO_DEVICE); + return FALSE; + } + + pkt = control_packet(send_io); + init_vnic_req = &pkt->cmd.init_vnic_req; + init_vnic_req->vnic_major_version = hton16(VNIC_MAJORVERSION); + init_vnic_req->vnic_minor_version = hton16(VNIC_MINORVERSION); + init_vnic_req->vnic_instance = hton8(config->vnic_instance); + init_vnic_req->num_data_paths = 1; + init_vnic_req->num_address_entries = + hton16(config->max_address_entries); + + CONTROL_PACKET(pkt); + + control->rsp_expected = pkt->hdr.pkt_cmd; + + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +} + +BOOLEAN control_init_vnic_rsp(struct control * control, u32 * features, + u8 * mac_address, u16 * num_addrs, u16 * vlan) +{ + struct recv_io *recv_io; + struct control_config *config = control->config; + struct vnic_control_packet *pkt; + struct vnic_cmd_init_vnic_rsp *init_vnic_rsp; + u8 num_data_paths, num_lan_switches; + + CONTROL_FUNCTION("%s: control_init_vnic_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_INIT_VNIC) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + init_vnic_rsp = &pkt->cmd.init_vnic_rsp; + control->maj_ver = ntoh16(init_vnic_rsp->vnic_major_version); + control->min_ver = ntoh16(init_vnic_rsp->vnic_minor_version); + num_data_paths = ntoh8(init_vnic_rsp->num_data_paths); + num_lan_switches = ntoh8(init_vnic_rsp->num_lan_switches); + *features = 
ntoh32(init_vnic_rsp->features_supported); + *num_addrs = ntoh16(init_vnic_rsp->num_address_entries); + + if ((control->maj_ver > VNIC_MAJORVERSION) + || ((control->maj_ver == VNIC_MAJORVERSION) + && (control->min_ver > VNIC_MINORVERSION))) { + CONTROL_ERROR("%s: unsupported version\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_data_paths != 1) { + CONTROL_ERROR("%s: EIOC returned too many datapaths\n", + control_ifcfg_name(control)); + goto failure; + } + if (*num_addrs > config->max_address_entries) { + CONTROL_ERROR + ("%s: EIOC returned more address entries than requested\n", + control_ifcfg_name(control)); + goto failure; + } + if (*num_addrs < config->min_address_entries) { + CONTROL_ERROR("%s: not enough address entries\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_lan_switches < 1) { + CONTROL_ERROR("%s: EIOC returned no lan switches\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_lan_switches > 1) { + CONTROL_ERROR("%s: EIOC returned multiple lan switches\n", + control_ifcfg_name(control)); + goto failure; + } + + control->lan_switch.lan_switch_num = + ntoh8(init_vnic_rsp->lan_switch[0].lan_switch_num); + control->lan_switch.num_enet_ports = + ntoh8(init_vnic_rsp->lan_switch[0].num_enet_ports); + control->lan_switch.default_vlan = + ntoh16(init_vnic_rsp->lan_switch[0].default_vlan); + *vlan = control->lan_switch.default_vlan; + memcpy(control->lan_switch.hw_mac_address, + init_vnic_rsp->lan_switch[0].hw_mac_address, MAC_ADDR_LEN); + memcpy(mac_address, init_vnic_rsp->lan_switch[0].hw_mac_address, + MAC_ADDR_LEN); + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; +failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return FALSE; +} + +static 
void copy_recv_pool_config(struct vnic_recv_pool_config *src, + struct vnic_recv_pool_config *dst) +{ + dst->size_recv_pool_entry = hton32(src->size_recv_pool_entry); + dst->num_recv_pool_entries = hton32(src->num_recv_pool_entries); + dst->timeout_before_kick = hton32(src->timeout_before_kick); + dst->num_recv_pool_entries_before_kick = + hton32(src->num_recv_pool_entries_before_kick); + dst->num_recv_pool_bytes_before_kick = + hton32(src->num_recv_pool_bytes_before_kick); + dst->free_recv_pool_entries_per_update = + hton32(src->free_recv_pool_entries_per_update); + return; +} + +static BOOLEAN check_recv_pool_config_value(u32 *src, u32 *dst, + u32 *max, u32 *min, char *name) +{ + u32 value; + + value = ntoh32(*src); + if (value > *max) { + CONTROL_ERROR("value %s too large\n", name); + return FALSE; + } else if (value < *min) { + CONTROL_ERROR("value %s too small\n", name); + return FALSE; + } + *dst = value; + return TRUE; +} + +static BOOLEAN check_recv_pool_config(struct vnic_recv_pool_config *src, + struct vnic_recv_pool_config *dst, + struct vnic_recv_pool_config *max, + struct vnic_recv_pool_config *min) +{ + if (!check_recv_pool_config_value + (&src->size_recv_pool_entry, &dst->size_recv_pool_entry, + &max->size_recv_pool_entry, &min->size_recv_pool_entry, + "size_recv_pool_entry") + || !check_recv_pool_config_value(&src->num_recv_pool_entries, + &dst->num_recv_pool_entries, + &max->num_recv_pool_entries, + &min->num_recv_pool_entries, + "num_recv_pool_entries") + || !check_recv_pool_config_value(&src->timeout_before_kick, + &dst->timeout_before_kick, + &max->timeout_before_kick, + &min->timeout_before_kick, + "timeout_before_kick") + || !check_recv_pool_config_value(&src-> + num_recv_pool_entries_before_kick, + &dst-> + num_recv_pool_entries_before_kick, + &max-> + num_recv_pool_entries_before_kick, + &min-> + num_recv_pool_entries_before_kick, + "num_recv_pool_entries_before_kick") + || !check_recv_pool_config_value(&src-> + 
num_recv_pool_bytes_before_kick, + &dst-> + num_recv_pool_bytes_before_kick, + &max-> + num_recv_pool_bytes_before_kick, + &min-> + num_recv_pool_bytes_before_kick, + "num_recv_pool_bytes_before_kick") + || !check_recv_pool_config_value(&src-> + free_recv_pool_entries_per_update, + &dst-> + free_recv_pool_entries_per_update, + &max-> + free_recv_pool_entries_per_update, + &min-> + free_recv_pool_entries_per_update, + "free_recv_pool_entries_per_update")) + return FALSE; + + if (!is_power_of2(dst->num_recv_pool_entries)) { + CONTROL_ERROR("num_recv_pool_entries (%d)" + " must be power of 2\n", + dst->num_recv_pool_entries); + return FALSE; + } + if (!is_power_of2(dst->free_recv_pool_entries_per_update)) { + CONTROL_ERROR("free_recv_pool_entries_per_update (%d)" + " must be power of 2\n", + dst->free_recv_pool_entries_per_update); + return FALSE; + } + if (dst->free_recv_pool_entries_per_update >= + dst->num_recv_pool_entries) { + CONTROL_ERROR("free_recv_pool_entries_per_update (%d) must be" + " less than num_recv_pool_entries (%d)\n", + dst->free_recv_pool_entries_per_update, + dst->num_recv_pool_entries); + return FALSE; + } + if (dst->num_recv_pool_entries_before_kick >= + dst->num_recv_pool_entries) { + CONTROL_ERROR("num_recv_pool_entries_before_kick (%d) must be" + " less than num_recv_pool_entries (%d)\n", + dst->num_recv_pool_entries_before_kick, + dst->num_recv_pool_entries); + return FALSE; + } + + return TRUE; +} + +BOOLEAN control_config_data_path_req(struct control * control, u64 path_id, + struct vnic_recv_pool_config * host, + struct vnic_recv_pool_config * eioc) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_data_path *config_data_path; + + CONTROL_FUNCTION("%s: control_config_data_path_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, 
CMD_CONFIG_DATA_PATH); + if (!send_io) + return FALSE; + + pkt = control_packet(send_io); + config_data_path = &pkt->cmd.config_data_path_req; + config_data_path->data_path = hton8(0); + config_data_path->path_identifier = path_id; + copy_recv_pool_config(host, &config_data_path->host_recv_pool_config); + copy_recv_pool_config(eioc, &config_data_path->eioc_recv_pool_config); + CONTROL_PACKET(pkt); + + control->rsp_expected = pkt->hdr.pkt_cmd; + + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +} + +BOOLEAN control_config_data_path_rsp(struct control * control, + struct vnic_recv_pool_config * host, + struct vnic_recv_pool_config * eioc, + struct vnic_recv_pool_config * max_host, + struct vnic_recv_pool_config * max_eioc, + struct vnic_recv_pool_config * min_host, + struct vnic_recv_pool_config * min_eioc) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_data_path *config_data_path; + + CONTROL_FUNCTION("%s: control_config_data_path_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_DATA_PATH) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + config_data_path = &pkt->cmd.config_data_path_rsp; + if (ntoh8(config_data_path->data_path) != 0) { + CONTROL_ERROR("%s: received CMD_CONFIG_DATA_PATH response" + " for wrong data path: %u\n", + control_ifcfg_name(control), + config_data_path->data_path); + goto failure; + } + if 
(!check_recv_pool_config(&config_data_path->host_recv_pool_config, + host, max_host, min_host) + || !check_recv_pool_config(&config_data_path->eioc_recv_pool_config, + eioc, max_eioc, min_eioc)) { + goto failure; + } + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; +failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return FALSE; +} + +BOOLEAN control_exchange_pools_req(struct control * control, u64 addr, u32 rkey) +{ + struct send_io *send_io; + struct vnic_cmd_exchange_pools *exchange_pools; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_exchange_pools_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_EXCHANGE_POOLS); + if (!send_io) + return FALSE; + + pkt = control_packet(send_io); + exchange_pools = &pkt->cmd.exchange_pools_req; + exchange_pools->data_path = hton32(0); + exchange_pools->pool_rkey = hton32(rkey); + exchange_pools->pool_addr = hton64(addr); + + control->rsp_expected = pkt->hdr.pkt_cmd; + + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +} + +BOOLEAN control_exchange_pools_rsp(struct control * control, u64 * addr, + u32 * rkey) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_exchange_pools *exchange_pools; + + CONTROL_FUNCTION("%s: control_exchange_pools_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = 
control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_EXCHANGE_POOLS) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + exchange_pools = &pkt->cmd.exchange_pools_rsp; + *rkey = ntoh32(exchange_pools->pool_rkey); + *addr = ntoh64(exchange_pools->pool_addr); + if (ntoh32(exchange_pools->data_path) != 0) { + CONTROL_ERROR("%s: received CMD_EXCHANGE_POOLS response" + " for wrong data path: %u\n", + control_ifcfg_name(control), + ntoh32(exchange_pools->data_path)); + goto failure; + } + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; +failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return FALSE; +} + +BOOLEAN control_config_link_req(struct control * control, u16 flags, u16 mtu) +{ + struct send_io *send_io; + struct vnic_cmd_config_link *config_link_req; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_config_link_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_CONFIG_LINK); + if (!send_io) + return FALSE; + + pkt = control_packet(send_io); + config_link_req = &pkt->cmd.config_link_req; + config_link_req->lan_switch_num = + hton8(control->lan_switch.lan_switch_num); + config_link_req->cmd_flags = VNIC_FLAG_SET_MTU; + if (flags & IFF_UP) { + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_NIC; + } else { + config_link_req->cmd_flags |= 
VNIC_FLAG_DISABLE_NIC; + } + if (flags & IFF_ALLMULTI) { + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_MCAST_ALL; + } else { + config_link_req->cmd_flags |= VNIC_FLAG_DISABLE_MCAST_ALL; + } + if (flags & IFF_PROMISC) { + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_PROMISC; + /* the EIOU doesn't really do PROMISC mode. + * if PROMISC is set, it only receives unicast packets + * I also have to set MCAST_ALL if I want real + * PROMISC mode. + */ + config_link_req->cmd_flags &= ~VNIC_FLAG_DISABLE_MCAST_ALL; + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_MCAST_ALL; + } else { + config_link_req->cmd_flags |= VNIC_FLAG_DISABLE_PROMISC; + } + config_link_req->mtu_size = hton16(mtu); + + control->rsp_expected = pkt->hdr.pkt_cmd; + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +} + +BOOLEAN control_config_link_rsp(struct control * control, u16 * flags, + u16 * mtu) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_link *config_link_rsp; + + CONTROL_FUNCTION("%s: control_config_link_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_LINK) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + config_link_rsp = &pkt->cmd.config_link_rsp; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_NIC) { + *flags |= IFF_UP; + } + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) { + *flags |= IFF_ALLMULTI; + } + if 
(config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_PROMISC) { + *flags |= IFF_PROMISC; + } + *mtu = ntoh16(config_link_rsp->mtu_size); + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; +failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return FALSE; +} + +/* control_config_addrs_req: + * return values: + * -1: failure + * 0: incomplete (successful operation, but more address + * table entries to be updated) + * 1: complete + */ +int control_config_addrs_req(struct control *control, + struct vnic_address_op *addrs, u16 num) +{ + struct send_io *send_io; + struct vnic_cmd_config_addresses *config_addrs_req; + struct vnic_control_packet *pkt; + u16 i; + u8 j; + int ret = 1; + + CONTROL_FUNCTION("%s: control_config_addrs_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_CONFIG_ADDRESSES); + if (!send_io) + return -1; + + pkt = control_packet(send_io); + config_addrs_req = &pkt->cmd.config_addresses_req; + config_addrs_req->lan_switch_num = + hton8(control->lan_switch.lan_switch_num); + for (i = 0, j = 0; (i < num) && (j < 16); i++) { + if (!addrs[i].operation) + continue; + config_addrs_req->list_address_ops[j].index = hton16(i); + config_addrs_req->list_address_ops[j].operation = + VNIC_OP_SET_ENTRY; + config_addrs_req->list_address_ops[j].valid = addrs[i].valid; + memcpy(config_addrs_req->list_address_ops[j].address, + addrs[i].address, MAC_ADDR_LEN); + config_addrs_req->list_address_ops[j].vlan = + hton16(addrs[i].vlan); + addrs[i].operation = 0; + j++; + } + for (; i < num; i++) { + if (addrs[i].operation) { + ret = 0; + break; + } + } + 
config_addrs_req->num_address_ops = hton8(j); + + control->rsp_expected = pkt->hdr.pkt_cmd; + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + if (!control_send(control, send_io)) + return -1; + return ret; +} + +BOOLEAN control_config_addrs_rsp(struct control * control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_addresses *config_addrs_rsp; + + CONTROL_FUNCTION("%s: control_config_addrs_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_ADDRESSES) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + config_addrs_rsp = &pkt->cmd.config_addresses_rsp; + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return TRUE; +failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return FALSE; +} + +BOOLEAN control_report_statistics_req(struct control * control) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_report_stats_req *report_statistics_req; + + CONTROL_FUNCTION("%s: control_report_statistics_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = 
control_init_hdr(control, CMD_REPORT_STATISTICS); + if (!send_io) + return FALSE; + + pkt = control_packet(send_io); + report_statistics_req = &pkt->cmd.report_statistics_req; + report_statistics_req->lan_switch_num = + hton8(control->lan_switch.lan_switch_num); + + control->rsp_expected = pkt->hdr.pkt_cmd; + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +} + +BOOLEAN control_report_statistics_rsp(struct control * control, + struct vnic_cmd_report_stats_rsp * + stats) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_report_stats_rsp *rep_stat_rsp; + + CONTROL_FUNCTION("%s: control_report_statistics_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_REPORT_STATISTICS) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + rep_stat_rsp = &pkt->cmd.report_statistics_rsp; + stats->if_in_broadcast_pkts = + ntoh64(rep_stat_rsp->if_in_broadcast_pkts); + stats->if_in_multicast_pkts = + ntoh64(rep_stat_rsp->if_in_multicast_pkts); + stats->if_in_octets = ntoh64(rep_stat_rsp->if_in_octets); + stats->if_in_ucast_pkts = ntoh64(rep_stat_rsp->if_in_ucast_pkts); + stats->if_in_nucast_pkts = ntoh64(rep_stat_rsp->if_in_nucast_pkts); + stats->if_in_underrun = ntoh64(rep_stat_rsp->if_in_underrun); + stats->if_in_errors = ntoh64(rep_stat_rsp->if_in_errors); + stats->if_out_errors = ntoh64(rep_stat_rsp->if_out_errors); + stats->if_out_octets = 
ntoh64(rep_stat_rsp->if_out_octets); + stats->if_out_ucast_pkts = ntoh64(rep_stat_rsp->if_out_ucast_pkts); + stats->if_out_multicast_pkts = + ntoh64(rep_stat_rsp->if_out_multicast_pkts); + stats->if_out_broadcast_pkts = + ntoh64(rep_stat_rsp->if_out_broadcast_pkts); + stats->if_out_nucast_pkts = ntoh64(rep_stat_rsp->if_out_nucast_pkts); + stats->if_out_ok = ntoh64(rep_stat_rsp->if_out_ok); + stats->if_in_ok = ntoh64(rep_stat_rsp->if_in_ok); + stats->if_out_ucast_bytes = ntoh64(rep_stat_rsp->if_out_ucast_bytes); + stats->if_out_multicast_bytes = + ntoh64(rep_stat_rsp->if_out_multicast_bytes); + stats->if_out_broadcast_bytes = + ntoh64(rep_stat_rsp->if_out_broadcast_bytes); + stats->if_in_ucast_bytes = ntoh64(rep_stat_rsp->if_in_ucast_bytes); + stats->if_in_multicast_bytes = + ntoh64(rep_stat_rsp->if_in_multicast_bytes); + stats->if_in_broadcast_bytes = + ntoh64(rep_stat_rsp->if_in_broadcast_bytes); + stats->ethernet_status = ntoh64(rep_stat_rsp->ethernet_status); + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; +failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return FALSE; +} + +BOOLEAN control_reset_req(struct control * control) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_reset_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_RESET); + if (!send_io) + return FALSE; + + pkt = control_packet(send_io); + + control->rsp_expected = pkt->hdr.pkt_cmd; + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return 
control_send(control, send_io); +} + +BOOLEAN control_reset_rsp(struct control * control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_reset_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_RESET) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; + failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return FALSE; +} + +BOOLEAN control_heartbeat_req(struct control * control, u32 hb_interval) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_heartbeat *heartbeat_req; + + CONTROL_FUNCTION("%s: control_heartbeat_req()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_HEARTBEAT); + if (!send_io) + return FALSE; + + pkt = control_packet(send_io); + heartbeat_req = &pkt->cmd.heartbeat_req; + heartbeat_req->hb_interval = hton32(hb_interval); + + control->rsp_expected = pkt->hdr.pkt_cmd; + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return 
control_send(control, send_io); +} + +BOOLEAN control_heartbeat_rsp(struct control * control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_heartbeat *heartbeat_rsp; + + CONTROL_FUNCTION("%s: control_heartbeat_rsp()\n", + control_ifcfg_name(control)); + dma_sync_single_for_cpu(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + return FALSE; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_HEARTBEAT) { + CONTROL_ERROR("%s: sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + goto failure; + } + + heartbeat_rsp = &pkt->cmd.heartbeat_rsp; + + control_recv(control, recv_io); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return TRUE; + failure: + viport_failure(control->parent); + dma_sync_single_for_device(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return FALSE; +} + +static void control_log_control_packet(struct vnic_control_packet *pkt) +{ + char *type; + int i; + + switch (ntoh8(pkt->hdr.pkt_type)) { + case TYPE_INFO: + type = "TYPE_INFO"; + break; + case TYPE_REQ: + type = "TYPE_REQ"; + break; + case TYPE_RSP: + type = "TYPE_RSP"; + break; + case TYPE_ERR: + type = "TYPE_ERR"; + break; + default: + type = "UNKNOWN"; + } + switch (ntoh8(pkt->hdr.pkt_cmd)) { + case CMD_INIT_VNIC: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_INIT_VNIC\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO + " vnic_major_version = %u," + " 
vnic_minor_version = %u\n", + ntoh16(pkt->cmd.init_vnic_req.vnic_major_version), + ntoh16(pkt->cmd.init_vnic_req.vnic_minor_version)); + if (pkt->hdr.pkt_type == TYPE_REQ) { + printk(KERN_INFO + " vnic_instance = %u," + " num_data_paths = %u\n", + ntoh8(pkt->cmd.init_vnic_req.vnic_instance), + ntoh8(pkt->cmd.init_vnic_req.num_data_paths)); + printk(KERN_INFO + " num_address_entries = %u\n", + ntoh16(pkt->cmd.init_vnic_req. + num_address_entries)); + } else { + printk(KERN_INFO + " num_lan_switches = %u," + " num_data_paths = %u\n", + ntoh8(pkt->cmd.init_vnic_rsp.num_lan_switches), + ntoh8(pkt->cmd.init_vnic_rsp.num_data_paths)); + printk(KERN_INFO + " num_address_entries = %u," + " features_supported = %08x\n", + ntoh16(pkt->cmd.init_vnic_rsp. + num_address_entries), + ntoh32(pkt->cmd.init_vnic_rsp. + features_supported)); + if (pkt->cmd.init_vnic_rsp.num_lan_switches != 0) { + printk(KERN_INFO + "lan_switch[0] lan_switch_num = %u," + " num_enet_ports = %08x\n", + ntoh8(pkt->cmd.init_vnic_rsp. + lan_switch[0].lan_switch_num), + ntoh8(pkt->cmd.init_vnic_rsp. + lan_switch[0].num_enet_ports)); + printk(KERN_INFO + " default_vlan = %u," + " hw_mac_address =" + " %02x:%02x:%02x:%02x:%02x:%02x\n", + ntoh16(pkt->cmd.init_vnic_rsp. + lan_switch[0].default_vlan), + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[0], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[1], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[2], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[3], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[4], + pkt->cmd.init_vnic_rsp.lan_switch[0]. 
+ hw_mac_address[5]); + } + } + break; + case CMD_CONFIG_DATA_PATH: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_CONFIG_DATA_PATH\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO " path_identifier = %" PRIx64 + ", data_path = %u\n", + pkt->cmd.config_data_path_req.path_identifier, + ntoh8(pkt->cmd.config_data_path_req.data_path)); + printk(KERN_INFO + "host config size_recv_pool_entry = %u," + " num_recv_pool_entries = %u\n", + ntoh32(pkt->cmd.config_data_path_req. + host_recv_pool_config.size_recv_pool_entry), + ntoh32(pkt->cmd.config_data_path_req. + host_recv_pool_config.num_recv_pool_entries)); + printk(KERN_INFO + " timeout_before_kick = %u," + " num_recv_pool_entries_before_kick = %u\n", + ntoh32(pkt->cmd.config_data_path_req. + host_recv_pool_config.timeout_before_kick), + ntoh32(pkt->cmd.config_data_path_req. + host_recv_pool_config. + num_recv_pool_entries_before_kick)); + printk(KERN_INFO + " num_recv_pool_bytes_before_kick = %u," + " free_recv_pool_entries_per_update = %u\n", + ntoh32(pkt->cmd.config_data_path_req. + host_recv_pool_config. + num_recv_pool_bytes_before_kick), + ntoh32(pkt->cmd.config_data_path_req. + host_recv_pool_config. + free_recv_pool_entries_per_update)); + printk(KERN_INFO + "eioc config size_recv_pool_entry = %u," + " num_recv_pool_entries = %u\n", + ntoh32(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.size_recv_pool_entry), + ntoh32(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.num_recv_pool_entries)); + printk(KERN_INFO + " timeout_before_kick = %u," + " num_recv_pool_entries_before_kick = %u\n", + ntoh32(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.timeout_before_kick), + ntoh32(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. 
+ num_recv_pool_entries_before_kick)); + printk(KERN_INFO + " num_recv_pool_bytes_before_kick = %u," + " free_recv_pool_entries_per_update = %u\n", + ntoh32(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + num_recv_pool_bytes_before_kick), + ntoh32(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + free_recv_pool_entries_per_update)); + break; + case CMD_EXCHANGE_POOLS: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_EXCHANGE_POOLS\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO " datapath = %u\n", + ntoh32(pkt->cmd.exchange_pools_req.data_path)); + printk(KERN_INFO " pool_rkey = %08x" + " pool_addr = %" + PRIx64 "\n", + ntoh32(pkt->cmd.exchange_pools_req.pool_rkey), + ntoh64(pkt->cmd.exchange_pools_req.pool_addr)); + break; + case CMD_CONFIG_ADDRESSES: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_CONFIG_ADDRESSES\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO + " num_address_ops = %x," + " lan_switch_num = %d\n", + ntoh8(pkt->cmd.config_addresses_req.num_address_ops), + ntoh8(pkt->cmd.config_addresses_req.lan_switch_num)); + for (i = 0; (i < pkt->cmd.config_addresses_req.num_address_ops) + && (i < 16); i++) { + printk(KERN_INFO + " list_address_ops[%u].index" + " = %u\n", + i, + ntoh16(pkt->cmd.config_addresses_req. + list_address_ops[i].index)); + switch (ntoh8 + (pkt->cmd.config_addresses_req. + list_address_ops[i].operation)) { + case VNIC_OP_GET_ENTRY: + printk(KERN_INFO + " list_address_ops[%u]." + "operation = VNIC_OP_GET_ENTRY\n", + i); + break; + case VNIC_OP_SET_ENTRY: + printk(KERN_INFO + " list_address_ops[%u]." + "operation = VNIC_OP_SET_ENTRY\n", + i); + break; + default: + printk(KERN_INFO + " list_address_ops[%u]." 
+ "operation = UNKNOWN(%d)\n", + i, + ntoh8(pkt->cmd.config_addresses_req. + list_address_ops[i].operation)); + break; + } + printk(KERN_INFO + " list_address_ops[%u].valid" + " = %u\n", + i, + ntoh8(pkt->cmd.config_addresses_req. + list_address_ops[i].valid)); + printk(KERN_INFO + " list_address_ops[%u].address" + " = %02x:%02x:%02x:%02x:%02x:%02x\n", + i, + pkt->cmd.config_addresses_req. + list_address_ops[i].address[0], + pkt->cmd.config_addresses_req. + list_address_ops[i].address[1], + pkt->cmd.config_addresses_req. + list_address_ops[i].address[2], + pkt->cmd.config_addresses_req. + list_address_ops[i].address[3], + pkt->cmd.config_addresses_req. + list_address_ops[i].address[4], + pkt->cmd.config_addresses_req. + list_address_ops[i].address[5]); + printk(KERN_INFO + " list_address_ops[%u].vlan" + " = %u\n", + i, + ntoh16(pkt->cmd.config_addresses_req. + list_address_ops[i].vlan)); + } + break; + case CMD_CONFIG_LINK: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_CONFIG_LINK\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO " cmd_flags = %x\n", + ntoh8(pkt->cmd.config_link_req.cmd_flags)); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_ENABLE_NIC) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_NIC\n"); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_DISABLE_NIC) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_NIC\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_" + "MCAST_ALL\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_DISABLE_MCAST_ALL) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_" + "MCAST_ALL\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_ENABLE_PROMISC) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_" + "PROMISC\n"); + if (pkt->cmd.config_link_req. 
+ cmd_flags & VNIC_FLAG_DISABLE_PROMISC) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_" + "PROMISC\n"); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_SET_MTU) + printk(KERN_INFO + " VNIC_FLAG_SET_MTU\n"); + printk(KERN_INFO + " lan_switch_num = %x, mtu_size = %d\n", + ntoh8(pkt->cmd.config_link_req.lan_switch_num), + ntoh16(pkt->cmd.config_link_req.mtu_size)); + if (pkt->hdr.pkt_type == TYPE_RSP) { + printk(KERN_INFO + " default_vlan = %u," + " hw_mac_address =" + " %02x:%02x:%02x:%02x:%02x:%02x\n", + ntoh16(pkt->cmd.config_link_req.default_vlan), + pkt->cmd.config_link_req.hw_mac_address[0], + pkt->cmd.config_link_req.hw_mac_address[1], + pkt->cmd.config_link_req.hw_mac_address[2], + pkt->cmd.config_link_req.hw_mac_address[3], + pkt->cmd.config_link_req.hw_mac_address[4], + pkt->cmd.config_link_req.hw_mac_address[5]); + } + break; + case CMD_REPORT_STATISTICS: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_REPORT_STATISTICS\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO " lan_switch_num = %u\n", + ntoh8(pkt->cmd.report_statistics_req.lan_switch_num)); + if (pkt->hdr.pkt_type == TYPE_REQ) + break; + printk(KERN_INFO " if_in_broadcast_pkts = %" + PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. + if_in_broadcast_pkts)); + printk(" if_in_multicast_pkts = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp. + if_in_multicast_pkts)); + printk(KERN_INFO " if_in_octets = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp.if_in_octets)); + printk(" if_in_ucast_pkts = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp.if_in_ucast_pkts)); + printk(KERN_INFO " if_in_nucast_pkts = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. 
+ if_in_nucast_pkts)); + printk(" if_in_underrun = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp.if_in_underrun)); + printk(KERN_INFO " if_in_errors = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp.if_in_errors)); + printk(" if_out_errors = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp.if_out_errors)); + printk(KERN_INFO " if_out_octets = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp.if_out_octets)); + printk(" if_out_ucast_pkts = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_ucast_pkts)); + printk(KERN_INFO " if_out_multicast_pkts = %" + PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_multicast_pkts)); + printk(" if_out_broadcast_pkts = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_broadcast_pkts)); + printk(KERN_INFO " if_out_nucast_pkts" + " = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_nucast_pkts)); + printk(" if_out_ok = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp.if_out_ok)); + printk(KERN_INFO " if_in_ok = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp.if_in_ok)); + printk(" if_out_ucast_bytes = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_ucast_bytes)); + printk(KERN_INFO " if_out_multicast_bytes = %" + PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_multicast_bytes)); + printk(" if_out_broadcast_bytes = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp. + if_out_broadcast_bytes)); + printk(KERN_INFO " if_in_ucast_bytes = %" PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. + if_in_ucast_bytes)); + printk(" if_in_multicast_bytes = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp. + if_in_multicast_bytes)); + printk(KERN_INFO " if_in_broadcast_bytes = %" + PRIu64, + ntoh64(pkt->cmd.report_statistics_rsp. 
+ if_in_broadcast_bytes)); + printk(" ethernet_status = %" PRIu64 "\n", + ntoh64(pkt->cmd.report_statistics_rsp.ethernet_status)); + break; + case CMD_CLEAR_STATISTICS: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_CLEAR_STATISTICS\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + break; + case CMD_REPORT_STATUS: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_REPORT_STATUS\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO + " lan_switch_num = %u, is_fatal = %u\n", + ntoh8(pkt->cmd.report_status.lan_switch_num), + ntoh8(pkt->cmd.report_status.is_fatal)); + printk(KERN_INFO + " status_number = %u, status_info = %u\n", + ntoh32(pkt->cmd.report_status.status_number), + ntoh32(pkt->cmd.report_status.status_info)); + pkt->cmd.report_status.file_name[31] = '\0'; + pkt->cmd.report_status.routine[31] = '\0'; + printk(KERN_INFO " filename = %s, routine = %s\n", + pkt->cmd.report_status.file_name, + pkt->cmd.report_status.routine); + printk(KERN_INFO + " line_num = %u, error_parameter = %u\n", + ntoh32(pkt->cmd.report_status.line_num), + ntoh32(pkt->cmd.report_status.error_parameter)); + pkt->cmd.report_status.desc_text[127] = '\0'; + printk(KERN_INFO " desc_text = %s\n", + pkt->cmd.report_status.desc_text); + break; + case CMD_RESET: + printk(KERN_INFO + "control_packet: pkt_type = %s, pkt_cmd = CMD_RESET\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + break; + case CMD_HEARTBEAT: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = CMD_HEARTBEAT\n", + type); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + 
ntoh8(pkt->hdr.pkt_retry_count)); + printk(KERN_INFO " hb_interval = %d\n", + ntoh32(pkt->cmd.heartbeat_req.hb_interval)); + break; + default: + printk(KERN_INFO + "control_packet: pkt_type = %s," + " pkt_cmd = UNKNOWN (%u)\n", + type, pkt->hdr.pkt_cmd); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + ntoh8(pkt->hdr.pkt_seq_num), + ntoh8(pkt->hdr.pkt_retry_count)); + break; + } + return; +} + +BOOLEAN control_init(struct control * control, struct viport * viport, + struct control_config * config, struct ib_pd * pd, + u64 guid) +{ + struct vnic_control_packet *pkt; + struct io *io; + int sz; + unsigned int i; + struct ib_device *ibdev; + dma_addr_t recv_dma; + + CONTROL_FUNCTION("%s: control_init()\n", control_ifcfg_name(control)); + memset(control, 0, sizeof(struct control)); + + control->parent = viport; + control->config = config; + control->ib_conn.viport = viport; + control->ib_conn.ib_config = &config->ib_config; + control->ib_conn.state = IB_CONN_UNINITTED; + + control->req_outstanding = FALSE; + control->seq_num = 0; + + control->response = NULL; + control->info = NULL; + INIT_LIST_HEAD(&control->failure_list); + spin_lock_init(&control->io_lock); + + if (!vnic_ib_conn_init(&control->ib_conn, viport, pd, guid, + &config->ib_config)) { + CONTROL_ERROR("IB connection initialization failed\n"); + goto failure; + } + + control->mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(control->mr)) { + CONTROL_ERROR("%s: failed to register memory" + " for control connection\n", + control_ifcfg_name(control)); + goto destroy_conn; + } + + control->ib_conn.cm_id = ib_create_cm_id(viport->config->ibdev, + vnic_ib_cm_handler, + &control->ib_conn); + + if (IS_ERR(control->ib_conn.cm_id)) { + CONTROL_ERROR("creating control CM ID failed\n"); + return FALSE; + } + + sz = (sizeof(struct recv_io) * config->num_recvs) + + (sizeof(struct vnic_control_packet) * (config->num_recvs + 1)); + + control->local_storage = kmalloc(sz, GFP_KERNEL); + if 
(control->local_storage == NULL) { + CONTROL_ERROR("%s: failed allocating space for local storage\n", + control_ifcfg_name(control)); + goto destroy_conn; + } + memset(control->local_storage, '\0', sz); + + control->recv_ios = (struct recv_io *)control->local_storage; + + ibdev = viport->config->ibdev; + sz = sizeof(struct recv_io) * config->num_recvs; + + pkt = (struct vnic_control_packet *)(control->local_storage + + sizeof(struct recv_io) * + config->num_recvs); + + control->send_io.virtual_addr = control->local_storage + + sizeof(struct recv_io) * config->num_recvs; + control->send_len = sizeof(struct vnic_control_packet); + + /* NOTE: using one send buffer and num_recvs recv buffers */ + + control->send_dma = + dma_map_single(ibdev->dma_device, pkt, control->send_len, + DMA_TO_DEVICE); + + if (dma_mapping_error(control->send_dma)) { + CONTROL_ERROR("control send dma map error\n"); + goto destroy_conn; + } + + io = &control->send_io.io; + io->viport = viport; + io->routine = control_send_complete; + + control->send_io.list.addr = control->send_dma; + control->send_io.list.length = sizeof(struct vnic_control_packet); + control->send_io.list.lkey = control->mr->lkey; + + io->swr.wr_id = PTR64(io); + io->swr.sg_list = &control->send_io.list; + io->swr.num_sge = 1; + io->swr.opcode = IB_WR_SEND; + io->swr.send_flags = IB_SEND_SIGNALED; + io->type = SEND; + + pkt++; + + sz = sizeof(struct vnic_control_packet) * (config->num_recvs); + control->recv_dma = dma_map_single(ibdev->dma_device, + pkt, sz, DMA_FROM_DEVICE); + control->recv_len = sz; + + if (dma_mapping_error(control->recv_dma)) { + CONTROL_ERROR("control recv dma map error\n"); + goto destroy_conn; + } + + recv_dma = control->recv_dma; + + for (i = 0; i < config->num_recvs; i++) { + io = &control->recv_ios[i].io; + io->viport = viport; + io->routine = control_recv_complete; + + io->type = RECV; + + control->recv_ios[i].virtual_addr = (u8 *) pkt; + control->recv_ios[i].list.addr = recv_dma; +
control->recv_ios[i].list.length = + sizeof(struct vnic_control_packet); + control->recv_ios[i].list.lkey = control->mr->lkey; + + recv_dma = recv_dma + sizeof(struct vnic_control_packet); + pkt++; + + io->rwr.wr_id = PTR64(io); + io->rwr.sg_list = &control->recv_ios[i].list; + io->rwr.num_sge = 1; + if (!vnic_ib_post_recv(&control->ib_conn, io)) { + kfree(control->local_storage); + goto destroy_conn; + } + } + return TRUE; +destroy_conn: + ib_destroy_qp(control->ib_conn.qp); + ib_destroy_cq(control->ib_conn.cq); +failure: + return FALSE; + +} + +void control_cleanup(struct control *control) +{ + CONTROL_FUNCTION("%s: control_cleanup()\n", + control_ifcfg_name(control)); + + init_completion(&control->ib_conn.done); + + if (ib_send_cm_dreq(control->ib_conn.cm_id, NULL, 0)) { + printk(KERN_DEBUG "control CM DREQ sending failed\n"); + } else + wait_for_completion(&control->ib_conn.done); + + control_timer_stop(control); + + ib_destroy_cm_id(control->ib_conn.cm_id); + + ib_destroy_qp(control->ib_conn.qp); + + ib_destroy_cq(control->ib_conn.cq); + + ib_dereg_mr(control->mr); + + dma_unmap_single(control->parent->config->ibdev->dma_device, + control->send_dma, control->send_len, DMA_TO_DEVICE); + dma_unmap_single(control->parent->config->ibdev->dma_device, + control->recv_dma, control->recv_len, DMA_FROM_DEVICE); + + kfree(control->local_storage); + return; +} diff --git a/drivers/infiniband/ulp/vnic/vnic_control.h b/drivers/infiniband/ulp/vnic/vnic_control.h new file mode 100644 index 0000000..2124bea --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_control.h @@ -0,0 +1,145 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_CONTROL_H_INCLUDED +#define VNIC_CONTROL_H_INCLUDED + +#ifdef CONFIG_INFINIBAND_VNIC_STATS +#include +#include +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + +#include "vnic_ib.h" +#include "vnic_control_pkt.h" + +enum control_timer_state { + TIMER_IDLE, + TIMER_ACTIVE, + TIMER_EXPIRED +}; + +struct control { + struct viport *parent; + struct control_config *config; + struct ib_mr *mr; + struct vnic_ib_conn ib_conn; + u8 *local_storage; + int send_len; + int recv_len; + u16 maj_ver; + u16 min_ver; + struct vnic_lan_switch_attribs lan_switch; + struct send_io send_io; + struct recv_io *recv_ios; + dma_addr_t send_dma; + dma_addr_t recv_dma; + enum control_timer_state timer_state; + struct timer_list timer; + u8 req_retry_counter; + u8 req_outstanding; + u8 seq_num; + u8 rsp_expected; + struct recv_io *response; + struct recv_io *info; + struct list_head failure_list; + spinlock_t io_lock; + struct completion done; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + struct { + cycles_t request_time; /* intermediate value */ + cycles_t response_time; + u32 response_num; + cycles_t response_max; + cycles_t response_min; + u32 timeout_num; + } statistics; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ +}; + +BOOLEAN control_init(struct control *control, struct viport *viport, + struct control_config *config, struct ib_pd *pd, + u64 guid); + +BOOLEAN control_connect(struct control *control); +void control_cleanup(struct control *control); + +void control_process_async(struct control *control); + +BOOLEAN control_init_vnic_req(struct control *control); +BOOLEAN control_init_vnic_rsp(struct control *control, u32 * features, + u8 * mac_address, u16 * num_addrs, u16 * vlan); + +BOOLEAN control_config_data_path_req(struct control *control, u64 path_id, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc); +BOOLEAN control_config_data_path_rsp(struct control *control, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc, + struct 
vnic_recv_pool_config *max_host, + struct vnic_recv_pool_config *max_eioc, + struct vnic_recv_pool_config *min_host, + struct vnic_recv_pool_config *min_eioc); + +BOOLEAN control_exchange_pools_req(struct control *control, u64 addr, u32 rkey); +BOOLEAN control_exchange_pools_rsp(struct control *control, u64 * addr, + u32 * rkey); + +BOOLEAN control_config_link_req(struct control *control, u16 flags, u16 mtu); +BOOLEAN control_config_link_rsp(struct control *control, u16 * flags, + u16 * mtu); + +int control_config_addrs_req(struct control *control, + struct vnic_address_op *addrs, u16 num); +BOOLEAN control_config_addrs_rsp(struct control *control); + +BOOLEAN control_report_statistics_req(struct control *control); +BOOLEAN control_report_statistics_rsp(struct control *control, + struct vnic_cmd_report_stats_rsp *stats); + +BOOLEAN control_reset_req(struct control *control); +BOOLEAN control_reset_rsp(struct control *control); + +BOOLEAN control_heartbeat_req(struct control *control, u32 hb_interval); +BOOLEAN control_heartbeat_rsp(struct control *control); + +#define control_packet(io) \ + (struct vnic_control_packet *)PTR((io)->virtual_addr) +#define control_is_connected(control) \ + (ib_conn_connected(&((control)->ib_conn))) + +#define control_last_req(control) control_packet(&(control)->send_io) +#define control_features(control) (control)->features_supported + +#define control_get_mac_address(control,addr) \ + memcpy(addr,(control)->lan_switch.hw_mac_address,MAC_ADDR_LEN) + +#endif /* VNIC_CONTROL_H_INCLUDED */ diff --git a/drivers/infiniband/ulp/vnic/vnic_control_pkt.h b/drivers/infiniband/ulp/vnic/vnic_control_pkt.h new file mode 100644 index 0000000..33dcf8d --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_control_pkt.h @@ -0,0 +1,278 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_CONTROL_PKT_H_INCLUDED +#define VNIC_CONTROL_PKT_H_INCLUDED + +struct vnic_connection_data { + u64 path_id; + u8 vnic_instance; + u8 path_num; + u8 nodename[65]; +}; + +struct vnic_control_header { + u8 pkt_type; + u8 pkt_cmd; + u8 pkt_seq_num; + u8 pkt_retry_count; + u32 reserved; /* for 64-bit alignment */ +}; + +/* pkt_type values */ +#define TYPE_INFO 0 +#define TYPE_REQ 1 +#define TYPE_RSP 2 +#define TYPE_ERR 3 + +/* pkt_cmd values */ +#define CMD_INIT_VNIC 1 +#define CMD_CONFIG_DATA_PATH 2 +#define CMD_EXCHANGE_POOLS 3 +#define CMD_CONFIG_ADDRESSES 4 +#define CMD_CONFIG_LINK 5 +#define CMD_REPORT_STATISTICS 6 +#define CMD_CLEAR_STATISTICS 7 +#define CMD_REPORT_STATUS 8 +#define CMD_RESET 9 +#define CMD_HEARTBEAT 10 + +#define MAC_ADDR_LEN 6 + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_REQ data format */ +struct vnic_cmd_init_vnic_req { + u16 vnic_major_version; + u16 vnic_minor_version; + u8 vnic_instance; + u8 num_data_paths; + u16 num_address_entries; +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_RSP subdata format */ +struct vnic_lan_switch_attribs { + u8 lan_switch_num; + u8 num_enet_ports; + u16 default_vlan; + u8 hw_mac_address[MAC_ADDR_LEN]; +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_RSP data format */ +struct vnic_cmd_init_vnic_rsp { + u16 vnic_major_version; + u16 vnic_minor_version; + u8 num_lan_switches; + u8 num_data_paths; + u16 num_address_entries; + u32 features_supported; + struct vnic_lan_switch_attribs lan_switch[1]; +}; + +/* features_supported values */ +#define VNIC_FEAT_IPV4_HEADERS 0x0001 +#define VNIC_FEAT_IPV6_HEADERS 0x0002 +#define VNIC_FEAT_IPV4_CSUM_RX 0x0004 +#define VNIC_FEAT_IPV4_CSUM_TX 0x0008 +#define VNIC_FEAT_TCP_CSUM_RX 0x0010 +#define VNIC_FEAT_TCP_CSUM_TX 0x0020 +#define VNIC_FEAT_UDP_CSUM_RX 0x0040 +#define VNIC_FEAT_UDP_CSUM_TX 0x0080 +#define VNIC_FEAT_TCP_SEGMENT 0x0100 +#define VNIC_FEAT_IPV4_IPSEC_OFFLOAD 0x0200 +#define VNIC_FEAT_IPV6_IPSEC_OFFLOAD 0x0400 +#define VNIC_FEAT_FCS_PROPAGATE
0x0800 +#define VNIC_FEAT_PF_KICK 0x1000 +#define VNIC_FEAT_PF_FORCE_ROUTE 0x2000 +#define VNIC_FEAT_CHASH_OFFLOAD 0x4000 + +/* pkt_cmd CMD_CONFIG_DATA_PATH subdata format */ +struct vnic_recv_pool_config { + u32 size_recv_pool_entry; + u32 num_recv_pool_entries; + u32 timeout_before_kick; + u32 num_recv_pool_entries_before_kick; + u32 num_recv_pool_bytes_before_kick; + u32 free_recv_pool_entries_per_update; +}; + +/* pkt_cmd CMD_CONFIG_DATA_PATH data format */ +struct vnic_cmd_config_data_path { + u64 path_identifier; + u8 data_path; + u8 reserved[3]; + struct vnic_recv_pool_config host_recv_pool_config; + struct vnic_recv_pool_config eioc_recv_pool_config; +}; + +/* pkt_cmd CMD_EXCHANGE_POOLS data format */ +struct vnic_cmd_exchange_pools { + u8 data_path; + u8 reserved[3]; + u32 pool_rkey; + u64 pool_addr; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES subdata format */ +struct vnic_address_op { + u16 index; + u8 operation; + u8 valid; + u8 address[6]; + u16 vlan; +}; + +/* operation values */ +#define VNIC_OP_SET_ENTRY 0x01 +#define VNIC_OP_GET_ENTRY 0x02 + +/* pkt_cmd CMD_CONFIG_ADDRESSES data format */ +struct vnic_cmd_config_addresses { + u8 num_address_ops; + u8 lan_switch_num; + struct vnic_address_op list_address_ops[1]; +}; + +/* CMD_CONFIG_LINK data format */ +struct vnic_cmd_config_link { + u8 cmd_flags; + u8 lan_switch_num; + u16 mtu_size; + u16 default_vlan; + u8 hw_mac_address[6]; +}; + +/* cmd_flags values */ +#define VNIC_FLAG_ENABLE_NIC 0x01 +#define VNIC_FLAG_DISABLE_NIC 0x02 +#define VNIC_FLAG_ENABLE_MCAST_ALL 0x04 +#define VNIC_FLAG_DISABLE_MCAST_ALL 0x08 +#define VNIC_FLAG_ENABLE_PROMISC 0x10 +#define VNIC_FLAG_DISABLE_PROMISC 0x20 +#define VNIC_FLAG_SET_MTU 0x40 + +/* pkt_cmd CMD_REPORT_STATISTICS, pkt_type TYPE_REQ data format */ +struct vnic_cmd_report_stats_req { + u8 lan_switch_num; +}; + +/* pkt_cmd CMD_REPORT_STATISTICS, pkt_type TYPE_RSP data format */ +struct vnic_cmd_report_stats_rsp { + u8 lan_switch_num; + u8 reserved[7]; /* for 64-bit 
alignment */ + u64 if_in_broadcast_pkts; + u64 if_in_multicast_pkts; + u64 if_in_octets; + u64 if_in_ucast_pkts; + u64 if_in_nucast_pkts; /* if_in_broadcast_pkts + + if_in_multicast_pkts */ + u64 if_in_underrun; /* (OID_GEN_RCV_NO_BUFFER) */ + u64 if_in_errors; /* (OID_GEN_RCV_ERROR) */ + u64 if_out_errors; /* (OID_GEN_XMIT_ERROR) */ + u64 if_out_octets; + u64 if_out_ucast_pkts; + u64 if_out_multicast_pkts; + u64 if_out_broadcast_pkts; + u64 if_out_nucast_pkts; /* if_out_broadcast_pkts + + if_out_multicast_pkts */ + u64 if_out_ok; /* if_out_nucast_pkts + + if_out_ucast_pkts(OID_GEN_XMIT_OK) */ + u64 if_in_ok; /* if_in_nucast_pkts + + if_in_ucast_pkts(OID_GEN_RCV_OK) */ + u64 if_out_ucast_bytes; /* (OID_GEN_DIRECTED_BYTES_XMT) */ + u64 if_out_multicast_bytes; /* (OID_GEN_MULTICAST_BYTES_XMT) */ + u64 if_out_broadcast_bytes; /* (OID_GEN_BROADCAST_BYTES_XMT) */ + u64 if_in_ucast_bytes; /* (OID_GEN_DIRECTED_BYTES_RCV) */ + u64 if_in_multicast_bytes; /* (OID_GEN_MULTICAST_BYTES_RCV) */ + u64 if_in_broadcast_bytes; /* (OID_GEN_BROADCAST_BYTES_RCV) */ + u64 ethernet_status; /* (OID_GEN_MEDIA_CONNECT_STATUS) */ +}; + +/* pkt_cmd CMD_CLEAR_STATISTICS data format */ +struct vnic_cmd_clear_statistics { + u8 lan_switch_num; +}; + +/* pkt_cmd CMD_REPORT_STATUS data format */ +struct vnic_cmd_report_status { + u8 lan_switch_num; + u8 is_fatal; + u8 reserved[2]; /* for 32-bit alignment */ + u32 status_number; + u32 status_info; + u8 file_name[32]; + u8 routine[32]; + u32 line_num; + u32 error_parameter; + u8 desc_text[128]; +}; + +/* pkt_cmd CMD_HEARTBEAT data format */ +struct vnic_cmd_heartbeat { + u32 hb_interval; +}; + +#define VNIC_STATUS_LINK_UP 1 +#define VNIC_STATUS_LINK_DOWN 2 +#define VNIC_STATUS_ENET_AGGREGATION_CHANGE 3 +#define VNIC_STATUS_EIOC_SHUTDOWN 4 +#define VNIC_STATUS_CONTROL_ERROR 5 +#define VNIC_STATUS_EIOC_ERROR 6 + +#define VNIC_MAX_CONTROLPKTSZ 256 +#define VNIC_MAX_CONTROLDATASZ \ + (VNIC_MAX_CONTROLPKTSZ - sizeof(struct vnic_control_header)) + +struct
vnic_control_packet { + struct vnic_control_header hdr; + union { + struct vnic_cmd_init_vnic_req init_vnic_req; + struct vnic_cmd_init_vnic_rsp init_vnic_rsp; + struct vnic_cmd_config_data_path config_data_path_req; + struct vnic_cmd_config_data_path config_data_path_rsp; + struct vnic_cmd_exchange_pools exchange_pools_req; + struct vnic_cmd_exchange_pools exchange_pools_rsp; + struct vnic_cmd_config_addresses config_addresses_req; + struct vnic_cmd_config_addresses config_addresses_rsp; + struct vnic_cmd_config_link config_link_req; + struct vnic_cmd_config_link config_link_rsp; + struct vnic_cmd_report_stats_req report_statistics_req; + struct vnic_cmd_report_stats_rsp report_statistics_rsp; + struct vnic_cmd_clear_statistics clear_statistics_req; + struct vnic_cmd_clear_statistics clear_statistics_rsp; + struct vnic_cmd_report_status report_status; + struct vnic_cmd_heartbeat heartbeat_req; + struct vnic_cmd_heartbeat heartbeat_rsp; + + char cmd_data[VNIC_MAX_CONTROLDATASZ]; + } cmd; +}; + +#endif /* VNIC_CONTROL_PKT_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:06:56 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:36:56 +0530 Subject: [openib-general] [PATCH 5/10] Implementation of Data path of the communication protocol Message-ID: <4521BEB8.15266.4E436D27@rkuchimanchi.silverstorm.com> Adds the files that implement the data transfer part of the communication protocol with the VEx. The RDMA of Ethernet packets is implemented here.
Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_data.c | 1065 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_data.h | 179 +++++ drivers/infiniband/ulp/vnic/vnic_trailer.h | 63 ++ 3 files changed, 1307 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_data.c b/drivers/infiniband/ulp/vnic/vnic_data.c new file mode 100644 index 0000000..e3b9739 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_data.c @@ -0,0 +1,1065 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "vnic_util.h" +#include "vnic_viport.h" +#include "vnic_config.h" +#include "vnic_data.h" +#include "vnic_trailer.h" + +static void data_received_kick(struct io *io); +static void data_xmit_complete(struct io *io); + +#define LOCAL_IO(x) PTR64((x)) + +#define INBOUND_COPY + +#ifdef INBOUND_COPY +u32 min_rcv_skb = 60; +module_param(min_rcv_skb, int, 0444); +#endif + +u32 min_xmt_skb = 60; +module_param(min_xmt_skb, int, 0444); + +#ifdef CONFIG_INFINIBAND_VNIC_STATS +cycles_t recv_ref; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + +BOOLEAN data_init(struct data * data, struct viport * viport, + struct data_config * config, struct ib_pd *pd, u64 guid) +{ + DATA_FUNCTION("data_init()\n"); + + data->parent = viport; + data->config = config; + data->ib_conn.viport = viport; + data->ib_conn.ib_config = &config->ib_config; + data->ib_conn.state = IB_CONN_UNINITTED; + + if ((min_xmt_skb < 60) || (min_xmt_skb > 9000)) { + DATA_ERROR("min_xmt_skb (%d) must be between 60 and 9000\n", + min_xmt_skb); + goto failure; + } + if (!vnic_ib_conn_init(&data->ib_conn, viport, pd, guid, + &config->ib_config)) { + goto failure; + } + data->mr = ib_get_dma_mr(pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + if (IS_ERR(data->mr)) { + DATA_ERROR("failed to register memory for data connection\n"); + goto destroy_conn; + } + + data->ib_conn.cm_id = ib_create_cm_id(viport->config->ibdev, + vnic_ib_cm_handler, + &data->ib_conn); + + if (IS_ERR(data->ib_conn.cm_id)) { + DATA_ERROR("creating data CM ID failed\n"); + return FALSE; + } + + return TRUE; + +destroy_conn: + ib_destroy_qp(data->ib_conn.qp); + ib_destroy_cq(data->ib_conn.cq); +failure: + return FALSE; +} + +static void data_post_recvs(struct data *data) +{ + unsigned long flags; + + DATA_FUNCTION("data_post_recvs()\n"); + spin_lock_irqsave(&data->recv_ios_lock, flags); + while (!list_empty(&data->recv_ios)) { + struct io *io = list_entry(data->recv_ios.next, + 
struct io, list_ptrs); + struct recv_io *recv_io = (struct recv_io *)io; + + list_del(&recv_io->io.list_ptrs); + spin_unlock_irqrestore(&data->recv_ios_lock, flags); + if (!vnic_ib_post_recv(&data->ib_conn, &recv_io->io)) { + viport_failure(data->parent); + return; + } + spin_lock_irqsave(&data->recv_ios_lock, flags); + } + spin_unlock_irqrestore(&data->recv_ios_lock, flags); +} + +BOOLEAN data_connect(struct data * data) +{ + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct recv_pool *recv_pool = &data->recv_pool; + struct recv_io *recv_io; + struct send_io *send_io; + struct rdma_io *rdma_io; + struct rdma_dest *rdma_dest; + u8 *region_data = NULL; + int sz; + unsigned int i; + dma_addr_t region_data_dma; + dma_addr_t xmit_dma; + u8 *xmit_data; + struct viport *viport = data->parent; + + DATA_FUNCTION("data_connect()\n"); + + recv_pool->pool_sz = data->config->host_recv_pool_entries; + recv_pool->eioc_pool_sz = data->host_pool_parms.num_recv_pool_entries; + if (recv_pool->pool_sz > recv_pool->eioc_pool_sz) + recv_pool->pool_sz = + data->host_pool_parms.num_recv_pool_entries; + + xmit_pool->pool_sz = data->eioc_pool_parms.num_recv_pool_entries; + + recv_pool->buffer_sz = data->host_pool_parms.size_recv_pool_entry; + xmit_pool->buffer_sz = data->eioc_pool_parms.size_recv_pool_entry; + + xmit_pool->notify_count = 0; + xmit_pool->notify_bundle = data->config->notify_bundle; + xmit_pool->next_xmit_pool = 0; +#ifdef LIMIT_OUTSTANDING_SENDS + xmit_pool->num_xmit_bufs = xmit_pool->notify_bundle * 2; +#else /* !LIMIT_OUTSTANDING_SENDS */ + xmit_pool->num_xmit_bufs = xmit_pool->pool_sz; +#endif /* LIMIT_OUTSTANDING_SENDS */ + xmit_pool->next_xmit_buf = 0; + xmit_pool->last_comp_buf = xmit_pool->num_xmit_bufs - 1; + + recv_pool->sz_free_bundle = + data->host_pool_parms.free_recv_pool_entries_per_update; + recv_pool->num_free_bufs = 0; + recv_pool->num_posted_bufs = 0; + xmit_pool->kick_count = 0; + xmit_pool->kick_byte_count = 0; + + xmit_pool->send_kicks = + 
data->eioc_pool_parms.num_recv_pool_entries_before_kick + || data->eioc_pool_parms.num_recv_pool_bytes_before_kick; + xmit_pool->kick_bundle = + data->eioc_pool_parms.num_recv_pool_entries_before_kick; + xmit_pool->kick_byte_bundle = + data->eioc_pool_parms.num_recv_pool_bytes_before_kick; + recv_pool->next_full_buf = 0; + recv_pool->next_free_buf = 0; + recv_pool->kick_on_free = FALSE; + + xmit_pool->need_buffers = TRUE; + + sz = sizeof(struct rdma_dest) * recv_pool->pool_sz; + sz += sizeof(struct recv_io) * data->config->num_recvs; + sz += sizeof(struct rdma_io) * xmit_pool->num_xmit_bufs; + + xmit_pool->xmitdata_len = + BUFFER_SIZE(min_xmt_skb) * xmit_pool->num_xmit_bufs; + if ((data->local_storage = vmalloc(sz)) == NULL) { + DATA_ERROR("failed allocating %d bytes local storage\n", sz); + goto failure; + } + + memset(data->local_storage, '\0', sz); + + recv_pool->recv_bufs = (struct rdma_dest *)data->local_storage; + sz = sizeof(struct rdma_dest) * recv_pool->pool_sz; + recv_io = (struct recv_io *)(data->local_storage + sz); + sz += sizeof(struct recv_io) * data->config->num_recvs; + xmit_pool->xmit_bufs = (struct rdma_io *)(data->local_storage + sz); + sz += sizeof(struct rdma_io) * xmit_pool->num_xmit_bufs; + + if ((region_data = kzalloc(4, GFP_KERNEL)) == NULL) { + DATA_ERROR("failed to alloc memory for region data\n"); + goto failure; + } + + data->region_data = region_data; + + recv_pool->buf_pool_len = + sizeof(struct buff_pool_entry) * recv_pool->eioc_pool_sz; + if ((recv_pool->buf_pool = + kzalloc(recv_pool->buf_pool_len, GFP_KERNEL)) == NULL) { + DATA_ERROR("failed allocating %d bytes" + " for recv pool bufpool\n", + recv_pool->buf_pool_len); + goto failure; + } + + recv_pool->buf_pool_dma = + dma_map_single(viport->config->ibdev->dma_device, + recv_pool->buf_pool, recv_pool->buf_pool_len, + DMA_TO_DEVICE); + + if (dma_mapping_error(recv_pool->buf_pool_dma)) { + DATA_ERROR("recv buf_pool dma map error\n"); + goto failure; + } + + xmit_pool->buf_pool_len
= + sizeof(struct buff_pool_entry) * xmit_pool->pool_sz; + if ((xmit_pool->buf_pool = + kzalloc(xmit_pool->buf_pool_len, GFP_KERNEL)) == NULL) { + DATA_ERROR("failed allocating %d bytes" + " for xmit pool bufpool\n", + xmit_pool->buf_pool_len); + goto failure; + } + xmit_pool->buf_pool_dma = + dma_map_single(viport->config->ibdev->dma_device, + xmit_pool->buf_pool, xmit_pool->buf_pool_len, + DMA_FROM_DEVICE); + + if (dma_mapping_error(xmit_pool->buf_pool_dma)) { + DATA_ERROR("xmit buf_pool dma map error\n"); + goto failure; + } + + if ((xmit_pool->xmit_data = + kzalloc(xmit_pool->xmitdata_len, GFP_KERNEL)) == NULL) { + DATA_ERROR("failed allocating %d bytes for xmit data\n", + xmit_pool->xmitdata_len); + goto failure; + } + + xmit_pool->xmitdata_dma = + dma_map_single(viport->config->ibdev->dma_device, + xmit_pool->xmit_data, xmit_pool->xmitdata_len, + DMA_TO_DEVICE); + + if (dma_mapping_error(xmit_pool->xmitdata_dma)) { + DATA_ERROR("xmit data dma map error\n"); + goto failure; + } + + rdma_io = &data->free_bufs_io; + rdma_io->io.viport = data->parent; + rdma_io->io.routine = NULL; + + rdma_io->list[0].lkey = data->mr->lkey; + + rdma_io->io.swr.wr_id = (unsigned long)rdma_io; + rdma_io->io.swr.sg_list = rdma_io->list; + rdma_io->io.swr.num_sge = 1; + rdma_io->io.swr.opcode = IB_WR_RDMA_WRITE; + rdma_io->io.swr.send_flags = IB_SEND_SIGNALED; + rdma_io->io.type = RDMA; + + send_io = &data->kick_io; + send_io->io.viport = data->parent; + send_io->io.routine = NULL; + + region_data_dma = dma_map_single(viport->config->ibdev->dma_device, + region_data, 4, DMA_BIDIRECTIONAL); + + if (dma_mapping_error(region_data_dma)) { + DATA_ERROR("region data dma map error\n"); + goto failure; + } + + data->regiondata_dma = region_data_dma; + + send_io->list.addr = region_data_dma; + send_io->list.length = 0; + send_io->list.lkey = data->mr->lkey; + + send_io->io.swr.wr_id = (unsigned long)send_io; + send_io->io.swr.sg_list = &send_io->list; + send_io->io.swr.num_sge = 1; + 
send_io->io.swr.opcode = IB_WR_SEND; + send_io->io.swr.send_flags = IB_SEND_SIGNALED; + send_io->io.type = SEND; + + INIT_LIST_HEAD(&data->recv_ios); + spin_lock_init(&data->recv_ios_lock); + spin_lock_init(&data->xmit_buf_lock); + for (i = 0; i < data->config->num_recvs; i++) { + recv_io[i].io.viport = data->parent; + recv_io[i].io.routine = data_received_kick; + recv_io[i].list.addr = region_data_dma; + recv_io[i].list.length = 4; + recv_io[i].list.lkey = data->mr->lkey; + + recv_io[i].io.rwr.wr_id = PTR64(&recv_io[i].io); + recv_io[i].io.rwr.sg_list = &recv_io[i].list; + recv_io[i].io.rwr.num_sge = 1; + + list_add(&recv_io[i].io.list_ptrs, &data->recv_ios); + } + INIT_LIST_HEAD(&recv_pool->avail_recv_bufs); + for (i = 0; i < recv_pool->pool_sz; i++) { + rdma_dest = &recv_pool->recv_bufs[i]; + list_add(&rdma_dest->list_ptrs, &recv_pool->avail_recv_bufs); + } + + xmit_dma = xmit_pool->xmitdata_dma; + xmit_data = xmit_pool->xmit_data; + + for (i = 0; i < xmit_pool->num_xmit_bufs; i++) { + rdma_io = &xmit_pool->xmit_bufs[i]; + rdma_io->index = i; + rdma_io->io.viport = data->parent; + rdma_io->io.routine = data_xmit_complete; + + rdma_io->list[0].lkey = data->mr->lkey; + rdma_io->list[1].lkey = data->mr->lkey; + rdma_io->io.swr.wr_id = PTR64(rdma_io); + rdma_io->io.swr.sg_list = rdma_io->list; + rdma_io->io.swr.num_sge = 2; + rdma_io->io.swr.opcode = IB_WR_RDMA_WRITE; + rdma_io->io.swr.send_flags = IB_SEND_SIGNALED; + rdma_io->io.type = RDMA; + + rdma_io->data = xmit_data; + rdma_io->data_dma = xmit_dma; + + xmit_data += ROUNDUPP2(min_xmt_skb, VIPORT_TRAILER_ALIGNMENT); + xmit_dma += ROUNDUPP2(min_xmt_skb, VIPORT_TRAILER_ALIGNMENT); + rdma_io->trailer = (struct viport_trailer *)xmit_data; + rdma_io->trailer_dma = xmit_dma; + xmit_data += sizeof(struct viport_trailer); + xmit_dma += sizeof(struct viport_trailer); + } + + xmit_pool->rdma_rkey = data->mr->rkey; + xmit_pool->rdma_addr = xmit_pool->buf_pool_dma; + + data_post_recvs(data); + + if 
(vnic_ib_cm_connect(&data->ib_conn)) + return TRUE; +failure: + if (data->local_storage) { + vfree(data->local_storage); + } + + if (region_data) + kfree(region_data); + + if (recv_pool->buf_pool) + kfree(recv_pool->buf_pool); + + if (xmit_pool->buf_pool) + kfree(xmit_pool->buf_pool); + + if (xmit_pool->xmit_data) + kfree(xmit_pool->xmit_data); + + return FALSE; +} + +static void data_add_free_buffer(struct data *data, int index, + struct rdma_dest *rdma_dest) +{ + struct recv_pool *pool = &data->recv_pool; + struct buff_pool_entry *bpe; + + DATA_FUNCTION("data_add_free_buffer()\n"); + rdma_dest->trailer->connection_hash_and_valid = 0; + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + pool->buf_pool_dma, pool->buf_pool_len, + DMA_TO_DEVICE); + + bpe = &pool->buf_pool[index]; + bpe->r_key = hton32(data->mr->rkey); + + bpe->remote_addr = hton64(PTR64(virt_to_phys(rdma_dest->data))); + bpe->valid = (u32) (rdma_dest - &pool->recv_bufs[0]) + 1; + ++pool->num_free_bufs; + + dma_sync_single_for_device(data->parent->config->ibdev->dma_device, + pool->buf_pool_dma, pool->buf_pool_len, + DMA_TO_DEVICE); + return; +} + +/* NOTE: this routine is not reentrant */ +static void data_alloc_buffers(struct data *data, BOOLEAN initial_allocation) +{ + struct recv_pool *pool = &data->recv_pool; + struct rdma_dest *rdma_dest; + struct sk_buff *skb; + int index; + + DATA_FUNCTION("data_alloc_buffers()\n"); + index = + ADD(pool->next_free_buf, pool->num_free_bufs, pool->eioc_pool_sz); + DATA_INFO("next_free_buf %x\n", pool->next_free_buf); + while (!list_empty(&pool->avail_recv_bufs)) { + rdma_dest = + list_entry(pool->avail_recv_bufs.next, struct rdma_dest, + list_ptrs); + if (!rdma_dest->skb) { + if (initial_allocation) + skb = + alloc_skb(pool->buffer_sz + 2, GFP_KERNEL); + else + skb = dev_alloc_skb(pool->buffer_sz + 2); + if (skb == NULL) { + DATA_ERROR("failed to alloc skb\n"); + break; + } + skb_reserve(skb, 2); + skb_put(skb, pool->buffer_sz); + rdma_dest->skb 
= skb; + rdma_dest->data = skb->data; + rdma_dest->trailer = + (struct viport_trailer *)(rdma_dest->data + + pool->buffer_sz - + sizeof(struct + viport_trailer)); + } + rdma_dest->trailer->connection_hash_and_valid = 0; + + list_del_init(&rdma_dest->list_ptrs); + + data_add_free_buffer(data, index, rdma_dest); + index = NEXT(index, pool->eioc_pool_sz); + } + return; +} + +static void data_send_kick_message(struct data *data) +{ + struct xmit_pool *pool = &data->xmit_pool; + DATA_FUNCTION("data_send_kick_message()\n"); + /* stop timer for bundle_timeout */ + if (data->kick_timer_on == TRUE) { + del_timer(&data->kick_timer); + data->kick_timer_on = FALSE; + } + pool->kick_count = 0; + pool->kick_byte_count = 0; + + /* TBD: keep track of when kick is outstanding, and + * don't reuse until complete + */ + if (!vnic_ib_post_send(&data->ib_conn, &data->free_bufs_io.io)) { + DATA_ERROR("failed to post send\n"); + viport_failure(data->parent); + return; + } + return; +} + +static void data_send_free_recv_buffers(struct data *data) +{ + struct recv_pool *pool = &data->recv_pool; + struct ib_send_wr *swr = &data->free_bufs_io.io.swr; + + BOOLEAN bufs_sent = FALSE; + u64 rdma_addr; + u32 offset; + u32 sz; + unsigned int num_to_send, next_increment; + + DATA_FUNCTION("data_send_free_recv_buffers()\n"); + + DATA_INFO("num_free_bufs %x sz_free_bundle %x\n", + pool->num_free_bufs, pool->sz_free_bundle); + + for (num_to_send = pool->sz_free_bundle; + num_to_send <= pool->num_free_bufs; + num_to_send += pool->sz_free_bundle) { + /* handle multiple bundles as one when possible. 
*/ + next_increment = num_to_send + pool->sz_free_bundle; + if ((next_increment <= pool->num_free_bufs) + && (pool->next_free_buf + next_increment <= + pool->eioc_pool_sz)) { + continue; + } + offset = pool->next_free_buf * sizeof(struct buff_pool_entry); + sz = num_to_send * sizeof(struct buff_pool_entry); + rdma_addr = pool->eioc_rdma_addr + offset; + swr->sg_list->length = sz; + swr->sg_list->addr = pool->buf_pool_dma + offset; + swr->wr.rdma.remote_addr = rdma_addr; + + if (!vnic_ib_post_send(&data->ib_conn, + &data->free_bufs_io.io)) { + DATA_ERROR("failed to post send\n"); + viport_failure(data->parent); + break; + } + INC(pool->next_free_buf, num_to_send, pool->eioc_pool_sz); + pool->num_free_bufs -= num_to_send; + pool->num_posted_bufs += num_to_send; + bufs_sent = TRUE; + } + + if (bufs_sent) { + if (pool->kick_on_free) { + data_send_kick_message(data); + } + } + if (pool->num_posted_bufs == 0) { + DATA_ERROR("%s: unable to allocate receive buffers\n", + config_viport_name(data->parent->config)); + viport_failure(data->parent); + } + return; +} + +void data_connected(struct data *data) +{ + DATA_FUNCTION("data_connected()\n"); + data->free_bufs_io.io.swr.wr.rdma.rkey = data->recv_pool.eioc_rdma_rkey; + data_alloc_buffers(data, TRUE); + data_send_free_recv_buffers(data); + data->connected = TRUE; + return; +} + +void data_disconnect(struct data *data) +{ + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct recv_pool *recv_pool = &data->recv_pool; + u8 *region_data = data->region_data; + unsigned int i; + + DATA_FUNCTION("data_disconnect()\n"); + + data->connected = FALSE; + if (data->kick_timer_on) { + del_timer_sync(&data->kick_timer); + data->kick_timer_on = FALSE; + } + + for (i = 0; i < xmit_pool->num_xmit_bufs; i++) { + if (xmit_pool->xmit_bufs[i].skb) + dev_kfree_skb(xmit_pool->xmit_bufs[i].skb); + xmit_pool->xmit_bufs[i].skb = NULL; + + } + for (i = 0; i < recv_pool->pool_sz; i++) { + if (data->recv_pool.recv_bufs[i].skb) + 
dev_kfree_skb(recv_pool->recv_bufs[i].skb); + recv_pool->recv_bufs[i].skb = NULL; + } + vfree(data->local_storage); + if (region_data) { + dma_unmap_single(data->parent->config->ibdev->dma_device, + data->regiondata_dma, 4, DMA_BIDIRECTIONAL); + kfree(region_data); + } + + if (recv_pool->buf_pool) { + dma_unmap_single(data->parent->config->ibdev->dma_device, + recv_pool->buf_pool_dma, + recv_pool->buf_pool_len, DMA_TO_DEVICE); + kfree(recv_pool->buf_pool); + } + + if (xmit_pool->buf_pool) { + dma_unmap_single(data->parent->config->ibdev->dma_device, + xmit_pool->buf_pool_dma, + xmit_pool->buf_pool_len, DMA_FROM_DEVICE); + kfree(xmit_pool->buf_pool); + } + + if (xmit_pool->xmit_data) { + dma_unmap_single(data->parent->config->ibdev->dma_device, + xmit_pool->xmitdata_dma, + xmit_pool->xmitdata_len, DMA_TO_DEVICE); + kfree(xmit_pool->xmit_data); + } + + return; +} + +void data_cleanup(struct data *data) +{ + init_completion(&data->ib_conn.done); + if (ib_send_cm_dreq(data->ib_conn.cm_id, NULL, 0)) { + printk(KERN_DEBUG "data CM DREQ sending failed\n"); + } else + wait_for_completion(&data->ib_conn.done); + + ib_destroy_cm_id(data->ib_conn.cm_id); + + ib_destroy_qp(data->ib_conn.qp); + + ib_destroy_cq(data->ib_conn.cq); + ib_dereg_mr(data->mr); + +} + +static BOOLEAN data_alloc_xmit_buffer(struct data *data, struct sk_buff *skb, + struct buff_pool_entry **pp_bpe, + struct rdma_io **pp_rdma_io, + BOOLEAN * last) +{ + struct xmit_pool *pool = &data->xmit_pool; + unsigned long flags; + + DATA_FUNCTION("data_alloc_xmit_buffer()\n"); + + spin_lock_irqsave(&data->xmit_buf_lock, flags); + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + pool->buf_pool_dma, pool->buf_pool_len, + DMA_TO_DEVICE); + + *last = FALSE; + *pp_rdma_io = &pool->xmit_bufs[pool->next_xmit_buf]; + *pp_bpe = &pool->buf_pool[pool->next_xmit_pool]; + + if ((*pp_bpe)->valid && pool->next_xmit_buf != pool->last_comp_buf) { + INC(pool->next_xmit_buf, 1, pool->num_xmit_bufs); + 
INC(pool->next_xmit_pool, 1, pool->pool_sz); + if (!pool->buf_pool[pool->next_xmit_pool].valid) { + DATA_INFO("just used the last EIOU receive buffer\n"); + *last = TRUE; + pool->need_buffers = TRUE; + viport_stop_xmit(data->parent); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + data->statistics.kick_reqs++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + } else if (pool->next_xmit_buf == pool->last_comp_buf) { + DATA_INFO("just used our last xmit buffer\n"); + pool->need_buffers = TRUE; + viport_stop_xmit(data->parent); + } + (*pp_rdma_io)->skb = skb; + (*pp_bpe)->valid = 0; + spin_unlock_irqrestore(&data->xmit_buf_lock, flags); + return TRUE; + } else { +#ifdef CONFIG_INFINIBAND_VNIC_STATS + data->statistics.no_xmit_bufs++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + DATA_ERROR("Out of xmit buffers\n"); + viport_stop_xmit(data->parent); + dma_sync_single_for_device(data->parent->config->ibdev-> + dma_device, pool->buf_pool_dma, + pool->buf_pool_len, DMA_TO_DEVICE); + + spin_unlock_irqrestore(&data->xmit_buf_lock, flags); + return FALSE; + } +} + +static void data_rdma_packet(struct data *data, struct buff_pool_entry *bpe, + struct rdma_io *rdma_io) +{ + struct ib_send_wr *swr; + struct sk_buff *skb; + u8 *d; + dma_addr_t trailer_data_dma; + dma_addr_t skb_data_dma; + int len; + int fill_len; + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct viport *viport = data->parent; + + DATA_FUNCTION("data_rdma_packet()\n"); + swr = &rdma_io->io.swr; + skb = rdma_io->skb; + len = ROUNDUPP2(rdma_io->len, VIPORT_TRAILER_ALIGNMENT); + fill_len = len - skb->len; + + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + xmit_pool->xmitdata_dma, + xmit_pool->xmitdata_len, DMA_TO_DEVICE); + + d = (u8 *) rdma_io->trailer - fill_len; + trailer_data_dma = rdma_io->trailer_dma - fill_len; + memset(d, '\0', fill_len); + + swr->sg_list[0].length = skb->len; + if (skb->len <= min_xmt_skb) { + memcpy(rdma_io->data, skb->data, skb->len); + swr->sg_list[0].lkey = data->mr->lkey; 
+ swr->sg_list[0].addr = rdma_io->data_dma; + dev_kfree_skb_any(skb); + rdma_io->skb = NULL; + } else { + swr->sg_list[0].lkey = data->mr->lkey; + + skb_data_dma = dma_map_single(viport->config->ibdev->dma_device, + skb->data, skb->len, + DMA_TO_DEVICE); + + if (dma_mapping_error(skb_data_dma)) { + DATA_ERROR("skb data dma map error\n"); + return; + } + + rdma_io->skb_data_dma = skb_data_dma; + + swr->sg_list[0].addr = skb_data_dma; + skb_orphan(skb); + } + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + xmit_pool->buf_pool_dma, + xmit_pool->buf_pool_len, DMA_TO_DEVICE); + + swr->sg_list[1].addr = trailer_data_dma; + swr->sg_list[1].length = fill_len + sizeof(struct viport_trailer); + swr->sg_list[0].lkey = data->mr->lkey; + swr->wr.rdma.remote_addr = ntoh64(bpe->remote_addr); + swr->wr.rdma.remote_addr += data->xmit_pool.buffer_sz; + swr->wr.rdma.remote_addr -= (sizeof(struct viport_trailer) + len); + swr->wr.rdma.rkey = ntoh32(bpe->r_key); + + dma_sync_single_for_device(data->parent->config->ibdev->dma_device, + xmit_pool->buf_pool_dma, + xmit_pool->buf_pool_len, DMA_TO_DEVICE); + + data->xmit_pool.notify_count++; + if (data->xmit_pool.notify_count >= data->xmit_pool.notify_bundle) { + data->xmit_pool.notify_count = 0; + swr->send_flags = IB_SEND_SIGNALED; + } else { + swr->send_flags = 0; + } + dma_sync_single_for_device(data->parent->config->ibdev->dma_device, + xmit_pool->xmitdata_dma, + xmit_pool->xmitdata_len, DMA_TO_DEVICE); + if (!vnic_ib_post_send(&data->ib_conn, &rdma_io->io)) { + DATA_ERROR("failed to post send for data RDMA write\n"); + viport_failure(data->parent); + return; + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + data->statistics.xmit_num++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + return; +} + +static void data_kick_timeout_handler(unsigned long arg) +{ + struct data *data = (struct data *)arg; + + DATA_FUNCTION("data_kick_timeout_handler()\n"); + data->kick_timer_on = FALSE; + data_send_kick_message(data); + return; +} + 
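[Reviewer note, not part of the patch.] The address arithmetic in data_rdma_packet() above can be checked in isolation. The sketch below is a standalone userspace illustration, assuming ROUNDUPP2() rounds up to a power-of-two multiple (its definition is not shown in this mail). The helper names round_up_p2() and compute_rdma_layout() are hypothetical and do not exist in the driver. It shows that the payload, zero fill and trailer together end exactly at the end of the remote buffer.

```c
#include <assert.h>
#include <stdint.h>

#define TRAILER_ALIGNMENT 32u	/* mirrors VIPORT_TRAILER_ALIGNMENT in the patch */

/* Assumed semantics of ROUNDUPP2(): round x up to a multiple of p2,
 * where p2 is a power of two. */
static uint32_t round_up_p2(uint32_t x, uint32_t p2)
{
	return (x + p2 - 1) & ~(p2 - 1);
}

/* Hypothetical summary of what one RDMA write covers. */
struct rdma_layout {
	uint32_t padded_len;	/* packet length rounded up to the alignment */
	uint32_t fill_len;	/* zero bytes between skb data and the trailer */
	uint64_t remote_addr;	/* start address of the RDMA write */
	uint32_t total_len;	/* skb data + fill + trailer */
};

/* buf_addr/buffer_sz describe the remote receive buffer, skb_len is the
 * real payload length, pkt_len is the (possibly padded to 60) length
 * stored in rdma_io->len, trailer_sz is sizeof(struct viport_trailer). */
static struct rdma_layout compute_rdma_layout(uint64_t buf_addr,
					      uint32_t buffer_sz,
					      uint32_t skb_len,
					      uint32_t pkt_len,
					      uint32_t trailer_sz)
{
	struct rdma_layout l;

	l.padded_len = round_up_p2(pkt_len, TRAILER_ALIGNMENT);
	assert(pkt_len >= skb_len);
	assert(trailer_sz + l.padded_len <= buffer_sz);

	l.fill_len = l.padded_len - skb_len;
	l.remote_addr = buf_addr + buffer_sz - (trailer_sz + l.padded_len);
	l.total_len = skb_len + l.fill_len + trailer_sz;
	return l;
}
```

With a 2048-byte remote buffer, a 42-byte skb padded to the 60-byte Ethernet minimum and a 32-byte trailer, the write starts 96 bytes before the end of the buffer, so the trailer sits at a fixed offset from the buffer end regardless of packet size.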
+BOOLEAN data_xmit_packet(struct data *data, struct sk_buff *skb) +{ + struct xmit_pool *pool = &data->xmit_pool; + struct rdma_io *rdma_io; + struct buff_pool_entry *bpe; + struct viport_trailer *trailer; + BOOLEAN last; + unsigned int sz = skb->len; + + DATA_FUNCTION("data_xmit_packet()\n"); + if (sz > pool->buffer_sz) { + DATA_ERROR("outbound packet too large, size = %d\n", sz); + return FALSE; + } + + if (!data_alloc_xmit_buffer(data, skb, &bpe, &rdma_io, &last)) { + DATA_ERROR("error in allocating data xmit buffer\n"); + return FALSE; + } + + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + pool->xmitdata_dma, pool->xmitdata_len, + DMA_TO_DEVICE); + + trailer = rdma_io->trailer; + + memset(trailer, '\0', sizeof(struct viport_trailer)); + memcpy(trailer->dest_mac_addr, skb->data, ETH_ALEN); + if (skb->sk) + trailer->connection_hash_and_valid = + 0x40 | ((get_sksport(skb->sk) + get_skdport(skb->sk)) & + 0x3f); + trailer->connection_hash_and_valid |= hton8(CHV_VALID); + if ((sz > 16) && (*(u16 *) (skb->data + 12) == hton16(0x8100))) { + trailer->vlan = *(u16 *) (skb->data + 14); + memmove(skb->data + 4, skb->data, 12); + skb_pull(skb, 4); + trailer->pkt_flags |= PF_VLAN_INSERT; + } + if (last) + trailer->pkt_flags |= PF_KICK; + if (sz < 60) { + /* EIOU requires all packets to be + * of ethernet minimum packet size. 
+ */ + trailer->data_length = hton16(60); + rdma_io->len = 60; + } else { + trailer->data_length = hton16(sz); + rdma_io->len = sz; + } + + if (skb->ip_summed == CHECKSUM_PARTIAL) { + trailer->tx_chksum_flags = TX_CHKSUM_FLAGS_CHECKSUM_V4 + | TX_CHKSUM_FLAGS_IP_CHECKSUM + | TX_CHKSUM_FLAGS_TCP_CHECKSUM + | TX_CHKSUM_FLAGS_UDP_CHECKSUM; + } + + dma_sync_single_for_device(data->parent->config->ibdev->dma_device, + pool->xmitdata_dma, pool->xmitdata_len, + DMA_TO_DEVICE); + + data_rdma_packet(data, bpe, rdma_io); + + if (pool->send_kicks) { + /* EIOC needs kicks to inform it of sent packets */ + pool->kick_count++; + pool->kick_byte_count += sz; + if ((pool->kick_count >= pool->kick_bundle) + || (pool->kick_byte_count >= pool->kick_byte_bundle)) { + data_send_kick_message(data); + } else if (pool->kick_count == 1) { + init_timer(&data->kick_timer); + /* timeout_before_kick is in u_sec */ + data->kick_timer.expires = + (data->eioc_pool_parms.timeout_before_kick * HZ / + 1000000) + jiffies; + data->kick_timer.data = (unsigned long)data; + data->kick_timer.function = data_kick_timeout_handler; + add_timer(&data->kick_timer); + data->kick_timer_on = TRUE; + } + } + return TRUE; +} + +static void data_check_xmit_buffers(struct data *data) +{ + struct xmit_pool *pool = &data->xmit_pool; + unsigned long flags; + + DATA_FUNCTION("data_check_xmit_buffers()\n"); + spin_lock_irqsave(&data->xmit_buf_lock, flags); + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + pool->buf_pool_dma, pool->buf_pool_len, + DMA_TO_DEVICE); + + if (data->xmit_pool.need_buffers + && pool->buf_pool[pool->next_xmit_pool].valid + && pool->next_xmit_buf != pool->last_comp_buf) { + data->xmit_pool.need_buffers = FALSE; + viport_restart_xmit(data->parent); + DATA_INFO("there are free xmit buffers\n"); + } + dma_sync_single_for_device(data->parent->config->ibdev->dma_device, + pool->buf_pool_dma, pool->buf_pool_len, + DMA_TO_DEVICE); + + spin_unlock_irqrestore(&data->xmit_buf_lock, flags); 
+ return; +} + +static struct sk_buff *data_recv_to_skbuff(struct data *data, + struct rdma_dest *rdma_dest) +{ + struct viport_trailer *trailer; + struct sk_buff *skb; + int start; + unsigned int len; + u8 rx_chksum_flags; + + DATA_FUNCTION("data_recv_to_skbuff()\n"); + trailer = rdma_dest->trailer; + start = data_offset(data, trailer); + len = data_len(data, trailer); +#ifdef INBOUND_COPY + if (len <= min_rcv_skb) { + /* leave room for VLAN header */ + skb = dev_alloc_skb(len + 6); + if (!skb) + goto no_copy; + skb_reserve(skb, 6); + memcpy(skb->data, rdma_dest->data + start, len); + skb_put(skb, len); + } else +#endif + { +no_copy: + skb = rdma_dest->skb; + rdma_dest->skb = NULL; + rdma_dest->trailer = NULL; + rdma_dest->data = NULL; + skb_pull(skb, start); + skb_trim(skb, len); + } + + rx_chksum_flags = trailer->rx_chksum_flags; + DATA_INFO + ("rx_chksum_flags = %d, LOOP = %c, IP = %c, TCP = %c, UDP = %c\n", + rx_chksum_flags, + (rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK) ? 'Y' : 'N', + (rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED) ? 'Y' + : (rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_FAILED) ? 'N' : + '-', + (rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED) ? 'Y' + : (rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_FAILED) ? 'N' : + '-', + (rx_chksum_flags & RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED) ? 'Y' + : (rx_chksum_flags & RX_CHKSUM_FLAGS_UDP_CHECKSUM_FAILED) ? 
'N' : + '-'); + + if ((rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK) + || ((rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED) + && ((rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED) + || (rx_chksum_flags & + RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED)))) + skb->ip_summed = CHECKSUM_UNNECESSARY; + else + skb->ip_summed = CHECKSUM_NONE; + if (trailer->pkt_flags & PF_VLAN_INSERT) { + u8 *rv; + + rv = skb_push(skb, 4); + memmove(rv, rv + 4, 12); + *(u16 *) (rv + 12) = hton16(0x8100); + if (trailer->pkt_flags & PF_PVID_OVERRIDDEN) { + *(u16 *) (rv + 14) = trailer->vlan & hton16(0xF000); + } else { + *(u16 *) (rv + 14) = trailer->vlan; + } + } + + return skb; +} + +static BOOLEAN data_incoming_recv(struct data *data) +{ + struct recv_pool *pool = &data->recv_pool; + struct rdma_dest *rdma_dest; + struct viport_trailer *trailer; + struct buff_pool_entry *bpe; + struct sk_buff *skb; + + DATA_FUNCTION("data_incoming_recv()\n"); + if (pool->next_full_buf == pool->next_free_buf) + return FALSE; + bpe = &pool->buf_pool[pool->next_full_buf]; + rdma_dest = &pool->recv_bufs[bpe->valid - 1]; + trailer = rdma_dest->trailer; + if ((trailer != NULL) + && (trailer->connection_hash_and_valid & CHV_VALID)) { + /* received a packet */ + if (trailer->pkt_flags & PF_KICK) { + pool->kick_on_free = TRUE; + } + if ((skb = data_recv_to_skbuff(data, rdma_dest)) != NULL) { + viport_recv_packet(data->parent, skb); + list_add(&rdma_dest->list_ptrs, &pool->avail_recv_bufs); + } + dma_sync_single_for_cpu(data->parent->config->ibdev->dma_device, + pool->buf_pool_dma, pool->buf_pool_len, + DMA_TO_DEVICE); + + bpe->valid = 0; + dma_sync_single_for_device(data->parent->config->ibdev-> + dma_device, pool->buf_pool_dma, + pool->buf_pool_len, DMA_TO_DEVICE); + + INC(pool->next_full_buf, 1, pool->eioc_pool_sz); + pool->num_posted_bufs--; + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + data->statistics.recv_num++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + return TRUE; + } else { + return FALSE; + } +} 
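[Reviewer note, not part of the patch.] data_incoming_recv() above advances next_full_buf with INC() and treats next_full_buf == next_free_buf as "nothing to consume". That only works when the ring size is a power of two, because the wrap is done with a mask. A minimal standalone sketch of the same index arithmetic follows. The function names ring_size_ok(), ring_add() and ring_used() are made up for illustration, and the power-of-two check is an addition that the driver macros do not perform.

```c
#include <assert.h>
#include <stdint.h>

/* True when size is a non-zero power of two. */
static int ring_size_ok(uint32_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/* Advance index by increment and wrap at size using a mask,
 * as ADD(index, increment, size) does in the patch. */
static uint32_t ring_add(uint32_t index, uint32_t increment, uint32_t size)
{
	assert(ring_size_ok(size));
	return (index + increment) & (size - 1);
}

/* Entries between tail and head; unsigned wraparound plus the mask
 * gives the right answer even when head has wrapped past tail. */
static uint32_t ring_used(uint32_t head, uint32_t tail, uint32_t size)
{
	assert(ring_size_ok(size));
	return (head - tail) & (size - 1);
}
```

With a size that is not a power of two (say 12), the mask would skip some slots and alias others, which is why the header comment on the macros insists on power-of-two sizes.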
+ +static void data_received_kick(struct io *io) +{ + struct data *data = &io->viport->data; + unsigned long flags; + + DATA_FUNCTION("data_received_kick()\n"); +#ifdef CONFIG_INFINIBAND_VNIC_STATS + recv_ref = get_cycles(); +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + spin_lock_irqsave(&data->recv_ios_lock, flags); + list_add(&io->list_ptrs, &data->recv_ios); + spin_unlock_irqrestore(&data->recv_ios_lock, flags); + data_post_recvs(data); + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + data->statistics.kick_recvs++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + data_check_xmit_buffers(data); + + while (data_incoming_recv(data)) ; + if (data->connected == TRUE) { + data_alloc_buffers(data, FALSE); + data_send_free_recv_buffers(data); + } + return; +} + +static void data_xmit_complete(struct io *io) +{ + struct rdma_io *rdma_io = (struct rdma_io *)io; + struct data *data = &io->viport->data; + struct xmit_pool *pool = &data->xmit_pool; + struct sk_buff *skb; + + DATA_FUNCTION("data_xmit_complete()\n"); + + if (rdma_io->skb) { + dma_unmap_single(data->parent->config->ibdev->dma_device, + rdma_io->skb_data_dma, rdma_io->skb->len, + DMA_TO_DEVICE); + } + + while (pool->last_comp_buf != rdma_io->index) { + INC(pool->last_comp_buf, 1, pool->num_xmit_bufs); + skb = pool->xmit_bufs[pool->last_comp_buf].skb; + if (skb != NULL) { + dev_kfree_skb_any(skb); + } + pool->xmit_bufs[pool->last_comp_buf].skb = NULL; + } +#ifdef LIMIT_OUTSTANDING_SENDS + data_check_xmit_buffers(data); +#endif /* LIMIT_OUTSTANDING_SENDS */ + + return; +} diff --git a/drivers/infiniband/ulp/vnic/vnic_data.h b/drivers/infiniband/ulp/vnic/vnic_data.h new file mode 100644 index 0000000..0588b09 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_data.h @@ -0,0 +1,179 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_DATA_H_INCLUDED +#define VNIC_DATA_H_INCLUDED + +#ifdef CONFIG_INFINIBAND_VNIC_STATS +#include +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + +#include "vnic_ib.h" +#include "vnic_control_pkt.h" +#include "vnic_trailer.h" + +struct rdma_dest { + struct list_head list_ptrs; + struct sk_buff *skb; + u8 *data; + struct viport_trailer *trailer; +}; + +struct buff_pool_entry { + u64 remote_addr; + u32 r_key; + u32 valid; +}; + +struct recv_pool { + u32 buffer_sz; + u32 pool_sz; + u32 eioc_pool_sz; + uint32_t eioc_rdma_rkey; + u64 eioc_rdma_addr; + u32 next_full_buf; + u32 next_free_buf; + u32 num_free_bufs; + u32 num_posted_bufs; + u32 sz_free_bundle; + BOOLEAN kick_on_free; + struct buff_pool_entry *buf_pool; + dma_addr_t buf_pool_dma; + int buf_pool_len; + struct rdma_dest *recv_bufs; + struct list_head avail_recv_bufs; +}; + +struct xmit_pool { + u32 buffer_sz; + u32 pool_sz; + u32 notify_count; + u32 notify_bundle; + u32 next_xmit_buf; + u32 last_comp_buf; + u32 num_xmit_bufs; + u32 next_xmit_pool; + u32 kick_count; + u32 kick_byte_count; + u32 kick_bundle; + u32 kick_byte_bundle; + BOOLEAN need_buffers; + BOOLEAN send_kicks; + uint32_t rdma_rkey; + u64 rdma_addr; + struct buff_pool_entry *buf_pool; + dma_addr_t buf_pool_dma; + int buf_pool_len; + struct rdma_io *xmit_bufs; + u8 *xmit_data; + dma_addr_t xmitdata_dma; + int xmitdata_len; +}; + +struct data { + struct viport *parent; + struct data_config *config; + struct ib_mr *mr; + struct vnic_ib_conn ib_conn; + u8 *local_storage; + struct vnic_recv_pool_config host_pool_parms; + struct vnic_recv_pool_config eioc_pool_parms; + struct recv_pool recv_pool; + struct xmit_pool xmit_pool; + u8 *region_data; + dma_addr_t regiondata_dma; + struct rdma_io free_bufs_io; + struct send_io kick_io; + struct list_head recv_ios; + spinlock_t recv_ios_lock; + spinlock_t xmit_buf_lock; + BOOLEAN kick_timer_on; + BOOLEAN connected; + struct timer_list kick_timer; + struct completion done; +#ifdef 
CONFIG_INFINIBAND_VNIC_STATS + struct { + u32 xmit_num; + u32 recv_num; + u32 free_buf_sends; + u32 free_buf_num; + u32 free_buf_min; + u32 kick_recvs; + u32 kick_reqs; + u32 no_xmit_bufs; + cycles_t no_xmit_buf_time; + } statistics; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ +}; + +BOOLEAN data_init(struct data *data, struct viport *viport, + struct data_config *config, struct ib_pd *pd, u64 guid); + +BOOLEAN data_connect(struct data *data); +void data_connected(struct data *data); +void data_disconnect(struct data *data); + +BOOLEAN data_xmit_packet(struct data *data, struct sk_buff *skb); + +void data_cleanup(struct data *data); + +#define data_is_connected(data) (ib_conn_connected(&((data)->ib_conn))) +#define data_path_id(data) (data)->config->path_id +#define data_eioc_pool(data) &(data)->eioc_pool_parms +#define data_host_pool(data) &(data)->host_pool_parms +#define data_eioc_pool_min(data) &(data)->config->eioc_min +#define data_host_pool_min(data) &(data)->config->host_min +#define data_eioc_pool_max(data) &(data)->config->eioc_max +#define data_host_pool_max(data) &(data)->config->host_max +#define data_local_pool_addr(data) (data)->xmit_pool.rdma_addr +#define data_local_pool_rkey(data) (data)->xmit_pool.rdma_rkey +#define data_remote_pool_addr(data) &(data)->recv_pool.eioc_rdma_addr +#define data_remote_pool_rkey(data) &(data)->recv_pool.eioc_rdma_rkey + +#define data_max_mtu(data) \ + MAX_PAYLOAD(min((data)->recv_pool.buffer_sz, \ + (data)->xmit_pool.buffer_sz)) - ETH_VLAN_HLEN + +#define data_len(data, trailer) ntoh16(trailer->data_length) +#define data_offset(data, trailer) \ + data->recv_pool.buffer_sz - sizeof(struct viport_trailer) \ + - ROUNDUPP2(data_len(data, trailer), VIPORT_TRAILER_ALIGNMENT) \ + + ntoh8(trailer->data_alignment_offset) + +/* the following macros manipulate ring buffer indexes. + * the ring buffer size must be a power of 2. 
+ */ +#define ADD(index, increment, size) (((index) + (increment))&((size) - 1)) +#define NEXT(index, size) ADD(index, 1, size) +#define INC(index, increment, size) (index) = ADD(index, increment, size) + +#endif /* VNIC_DATA_H_INCLUDED */ diff --git a/drivers/infiniband/ulp/vnic/vnic_trailer.h b/drivers/infiniband/ulp/vnic/vnic_trailer.h new file mode 100644 index 0000000..d6bd6c7 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_trailer.h @@ -0,0 +1,63 @@ +#ifndef VNIC_TRAILER_H_INCLUDED +#define VNIC_TRAILER_H_INCLUDED + +/* pkt_flags values */ +#define PF_CHASH_VALID 0x01 +#define PF_IPSEC_VALID 0x02 +#define PF_TCP_SEGMENT 0x04 +#define PF_KICK 0x08 +#define PF_VLAN_INSERT 0x10 +#define PF_PVID_OVERRIDDEN 0x20 +#define PF_FCS_INCLUDED 0x40 +#define PF_FORCE_ROUTE 0x80 + +/* tx_chksum_flags values */ +#define TX_CHKSUM_FLAGS_CHECKSUM_V4 0x01 +#define TX_CHKSUM_FLAGS_CHECKSUM_V6 0x02 +#define TX_CHKSUM_FLAGS_TCP_CHECKSUM 0x04 +#define TX_CHKSUM_FLAGS_UDP_CHECKSUM 0x08 +#define TX_CHKSUM_FLAGS_IP_CHECKSUM 0x10 + +/* rx_chksum_flags values */ +#define RX_CHKSUM_FLAGS_TCP_CHECKSUM_FAILED 0x01 +#define RX_CHKSUM_FLAGS_UDP_CHECKSUM_FAILED 0x02 +#define RX_CHKSUM_FLAGS_IP_CHECKSUM_FAILED 0x04 +#define RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED 0x08 +#define RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED 0x10 +#define RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED 0x20 +#define RX_CHKSUM_FLAGS_LOOPBACK 0x40 +#define RX_CHKSUM_FLAGS_RESERVED 0x80 + +/* connection_hash_and_valid values */ +#define CHV_VALID 0x80 +#define CHV_HASH_MASH 0x7f + +struct viport_trailer { + s8 data_alignment_offset; + u8 rndis_header_length; /* reserved for use by edp */ + u16 data_length; + u8 pkt_flags; + u8 tx_chksum_flags; + u8 rx_chksum_flags; + u8 ip_sec_flags; + u32 tcp_seq_no; + u32 ip_sec_offload_handle; + u32 ip_sec_next_offload_handle; + u8 dest_mac_addr[6]; + u16 vlan; + u16 time_stamp; + u8 origin; + u8 connection_hash_and_valid; +}; + +#define VIPORT_TRAILER_ALIGNMENT 32 + +#define 
BUFFER_SIZE(len) \ + (sizeof(struct viport_trailer) + ROUNDUPP2((len), \ + VIPORT_TRAILER_ALIGNMENT)) + +#define MAX_PAYLOAD(len) \ + ROUNDDOWNP2((len) - sizeof(struct viport_trailer), \ + VIPORT_TRAILER_ALIGNMENT) + +#endif /* VNIC_TRAILER_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:08:02 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:38:02 +0530 Subject: [openib-general] [PATCH 6/10] Driver IB files - IB core stack interaction Message-ID: <4521BEFA.6084.4E446F45@rkuchimanchi.silverstorm.com> Adds the files that implement interaction with the core IB stack. Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_ib.c | 709 +++++++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_ib.h | 167 ++++++++ 2 files changed, 876 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_ib.c b/drivers/infiniband/ulp/vnic/vnic_ib.c new file mode 100644 index 0000000..0c50b83 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_ib.c @@ -0,0 +1,709 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_ib.h" +#include "vnic_viport.h" +#include "vnic_sys.h" +#include "vnic_main.h" + +static int vnic_ib_inited = 0; + +static void vnic_add_one(struct ib_device *device); +static void vnic_remove_one(struct ib_device *device); + +static struct ib_client vnic_client = { + .name = "vnic", + .add = vnic_add_one, + .remove = vnic_remove_one +}; + +static struct ib_sa_client vnic_sa_client; + +static CLASS_DEVICE_ATTR(create_primary, S_IWUSR, NULL, vnic_create_primary); +static CLASS_DEVICE_ATTR(create_secondary, S_IWUSR, NULL, + vnic_create_secondary); + +static CLASS_DEVICE_ATTR(delete_vnic, S_IWUSR, NULL, vnic_delete); + +static struct vnic_ib_port *vnic_add_port(struct vnic_ib_device *device, u8 port_num) +{ + struct vnic_ib_port *port; + + port = kzalloc(sizeof *port, GFP_KERNEL); + if (!port) + return NULL; + + init_completion(&port->cdev_info.released); + port->dev = device; + port->port_num = port_num; + + port->cdev_info.class_dev.class = &vnic_class; + port->cdev_info.class_dev.dev = device->dev->dma_device; + snprintf(port->cdev_info.class_dev.class_id, BUS_ID_SIZE, "vnic-%s-%d", + device->dev->name, port_num); + + if (class_device_register(&port->cdev_info.class_dev)) + goto free_port; + + if (class_device_create_file(&port->cdev_info.class_dev, + &class_device_attr_create_primary)) + goto err_class; + if 
(class_device_create_file(&port->cdev_info.class_dev, + &class_device_attr_create_secondary)) + goto err_class; + + return port; +err_class: + class_device_unregister(&port->cdev_info.class_dev); + +free_port: + kfree(port); + + return NULL; +} + +static void vnic_add_one(struct ib_device *device) +{ + struct vnic_ib_device *vnic_dev; + struct vnic_ib_port *port; + int s, e, p; + + vnic_dev = kmalloc(sizeof *vnic_dev, GFP_KERNEL); + if (!vnic_dev) + return; + vnic_dev->dev = device; + + INIT_LIST_HEAD(&vnic_dev->dev_list); + if (device->node_type == RDMA_NODE_IB_SWITCH) { + s = 0; + e = 0; + + } else { + s = 1; + e = device->phys_port_cnt; + + } + + for (p = s; p <= e; p++) { + port = vnic_add_port(vnic_dev, p); + if (port) + list_add_tail(&port->list, &vnic_dev->dev_list); + } + + ib_set_client_data(device, &vnic_client, vnic_dev); + +} + +static void vnic_remove_one(struct ib_device *device) +{ + struct vnic_ib_device *vnic_dev; + struct vnic_ib_port *port, *tmp_port; + + vnic_dev = ib_get_client_data(device, &vnic_client); + list_for_each_entry_safe(port, tmp_port, &vnic_dev->dev_list, list) { + class_device_unregister(&port->cdev_info.class_dev); + /* + * wait for sysfs entries to go away, so that no new vnics + * are created + */ + wait_for_completion(&port->cdev_info.released); + kfree(port); + + } + kfree(vnic_dev); +} + +BOOLEAN vnic_ib_init() +{ + int ret; + + IB_FUNCTION("vnic_ib_init()\n"); + + /* class has to be registered before + * calling ib_register_client() because, that call + * will trigger vnic_add_port() which will register + * class_device of vnic_class for the port + */ + ret = class_register(&vnic_class); + if (ret) { + printk(KERN_ERR "couldn't register class infiniband_vnic\n"); + goto out; + } + + ib_sa_register_client(&vnic_sa_client); + + ret = ib_register_client(&vnic_client); + if (ret) { + printk(KERN_ERR "couldn't register IB client\n"); + goto err_ib_reg; + } + + interface_cdev.class_dev.class = &vnic_class; +
snprintf(interface_cdev.class_dev.class_id, BUS_ID_SIZE, "interfaces"); + + init_completion(&interface_cdev.released); + if (class_device_register(&interface_cdev.class_dev)) + goto err_class_dev; + + if (class_device_create_file(&interface_cdev.class_dev, + &class_device_attr_delete_vnic)) + goto err_class_file; + + vnic_ib_inited = 1; + + return TRUE; + +err_class_file: + class_device_unregister(&interface_cdev.class_dev); +err_class_dev: + ib_unregister_client(&vnic_client); +err_ib_reg: + ib_sa_unregister_client(&vnic_sa_client); + class_unregister(&vnic_class); +out: + return FALSE; +} + +void vnic_ib_cleanup() +{ + IB_FUNCTION("vnic_ib_cleanup()\n"); + + if (!vnic_ib_inited) + return; + + class_device_unregister(&interface_cdev.class_dev); + wait_for_completion(&interface_cdev.released); + + ib_unregister_client(&vnic_client); + ib_sa_unregister_client(&vnic_sa_client); + class_unregister(&vnic_class); + + return; +} + +static void vnic_path_rec_completion(int status, + struct ib_sa_path_rec *pathrec, + void *context) +{ + struct ib_path_info *p = context; + p->status = status; + if (!status) + p->path = *pathrec; + + complete(&p->done); +} + +BOOLEAN vnic_ib_get_path(struct netpath *netpath, struct vnic * vnic) +{ + struct viport_config *config = netpath->viport->config; + + init_completion(&config->path_info.done); + IB_INFO("Using SA path rec get time out value of %d\n", + config->sa_path_rec_get_timeout); + config->path_info.path_query_id = + ib_sa_path_rec_get(&vnic_sa_client, + config->ibdev, + config->port, + &config->path_info.path, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + config->sa_path_rec_get_timeout, + GFP_KERNEL, + vnic_path_rec_completion, + &config->path_info, + &config->path_info.path_query); + + if (config->path_info.path_query_id < 0) { + IB_ERROR("SA path record query failed\n"); + return FALSE; + } + + wait_for_completion(&config->path_info.done); + + if 
(config->path_info.status < 0) { + printk(KERN_WARNING PFX "path record query failed for dgid " + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[0]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[2]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[4]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[6]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[8]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[10]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[12]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[14])); + + if (config->path_info.status == -ETIMEDOUT) + printk(KERN_WARNING PFX + "reason: path record query timed out\n"); + else if (config->path_info.status == -EIO) + printk(KERN_WARNING PFX + "reason: error in sending path record query\n"); + + netpath_timer(netpath, vnic->config->no_path_timeout); + return FALSE; + } + + return TRUE; +} + +static void ib_qp_event(struct ib_event *event, void *context) +{ + IB_ERROR("QP event %d\n", event->event); +} + +static void vnic_ib_completion(struct ib_cq *cq, void *ptr) +{ + struct ib_wc wc; + struct io *io; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + struct vnic_ib_conn *ib_conn = ptr; + cycles_t comp_time; + u32 comp_num = 0; + comp_time = get_cycles(); + ib_conn->statistics.num_callbacks++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + io = (struct io *)(wc.wr_id); + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + ib_conn->statistics.num_ios++; + comp_num++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + if (wc.status) { +#if 0 + IB_ERROR("completion error " + "wc.status %d wc.opcode %d vendor err 0x%x\n", + wc.status, wc.opcode, wc.vendor_err); +#endif + } else if (io) { +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (io->type == RECV) { + io->time = 
comp_time; + } else if (io->type == RDMA) { + ib_conn->statistics.rdma_comp_time += + comp_time - io->time; + ib_conn->statistics.rdma_comp_ios++; + } else if (io->type == SEND) { + ib_conn->statistics.send_comp_time += + comp_time - io->time; + ib_conn->statistics.send_comp_ios++; + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + if (io->routine) + (*io->routine) (io); + } + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (comp_num > ib_conn->statistics.max_ios) + ib_conn->statistics.max_ios = comp_num; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + +} + +int vnic_ib_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct vnic_ib_conn *ib_conn = cm_id->context; + struct viport *viport = ib_conn->viport; + struct ib_qp_attr *qp_attr = NULL; + int err = 0; + int disconn = 0; + int attr_mask = 0; + + switch (event->event) { + case IB_CM_REQ_ERROR: + IB_ERROR("sending CM REQ failed\n"); + disconn = 1; + break; + case IB_CM_REP_RECEIVED: + IB_INFO("CM REP recvd\n"); + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) { + err = 1; + break; + } + + qp_attr->qp_state = IB_QPS_RTR; + err = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (err) + break; + + err = ib_modify_qp(ib_conn->qp, qp_attr, attr_mask); + if (err) + break; + + IB_INFO("QP RTR\n"); + + qp_attr->qp_state = IB_QPS_RTS; + err = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (err) + break; + + err = ib_modify_qp(ib_conn->qp, qp_attr, attr_mask); + if (err) + break; + + IB_INFO("QP RTS\n"); + + err = ib_send_cm_rtu(cm_id, NULL, 0); + if (err) + break; + ib_conn->state = IB_CONN_CONNECTED; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + ib_conn->statistics.connection_time = + get_cycles() - ib_conn->statistics.connection_time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + IB_INFO("RTU SENT\n"); + break; + case IB_CM_REJ_RECEIVED: + printk(KERN_ERR PFX "CM rejected control connection \n"); + if (event->param.rej_rcvd.reason == + IB_CM_REJ_INVALID_SERVICE_ID) + printk(KERN_ERR
"reason: invalid service ID. " + "IOCGUID value specified may be incorrect\n"); + else + printk(KERN_ERR "reason code : 0x%x\n", + event->param.rej_rcvd.reason); + + disconn = 1; + break; + case IB_CM_MRA_RECEIVED: + IB_INFO("CM MRA received\n"); + break; + + case IB_CM_DREP_RECEIVED: + IB_INFO("CM DREP recvd\n"); + ib_conn->state = IB_CONN_DISCONNECTED; + break; + + case IB_CM_TIMEWAIT_EXIT: + IB_ERROR("CM timewait exit\n"); + err = 1; + break; + + default: + IB_INFO("unhandled CM event %d\n", event->event); + break; + + } + + if (err) { + ib_conn->state = IB_CONN_DISCONNECTED; + viport_failure(viport); + } + + if (disconn) { + ib_conn->state = IB_CONN_DISCONNECTED; + viport_disconnect(viport); + + } + complete(&ib_conn->done); + return 0; +} + + +BOOLEAN vnic_ib_cm_connect(struct vnic_ib_conn *ib_conn) +{ + struct ib_cm_req_param *req = NULL; + struct viport *viport; + int ret; + + if (!ib_conn_initted(ib_conn)) { + IB_ERROR("IB Connection out of state for CM connect (%d)\n", + ib_conn->state); + return FALSE; + } + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + ib_conn->statistics.connection_time = get_cycles(); +#endif + + req = kzalloc(sizeof *req, GFP_KERNEL); + + if (!req) + return FALSE; + + viport = ib_conn->viport; + + req->primary_path = &viport->config->path_info.path; + req->alternate_path = NULL; + req->qp_num = ib_conn->qp->qp_num; + req->qp_type = ib_conn->qp->qp_type; + req->service_id = ib_conn->ib_config->service_id; + req->private_data = &ib_conn->ib_config->conn_data; + req->private_data_len = sizeof(struct vnic_connection_data); + req->flow_control = 1; + + get_random_bytes(&req->starting_psn, 4); + req->starting_psn &= 0xffffff; + + /* + * Both responder_resources and initiator_depth are set to zero + * as we do not need RDMA read. + * + * They also must be set to zero, otherwise data connections + * are rejected by VEx.
+ */ + req->responder_resources = 0; + req->initiator_depth = 0; + req->remote_cm_response_timeout = 20; + req->local_cm_response_timeout = 20; + req->retry_count = ib_conn->ib_config->retry_count; + req->rnr_retry_count = ib_conn->ib_config->rnr_retry_count; + req->max_cm_retries = 15; + + ib_conn->state = IB_CONN_CONNECTING; + + ret = ib_send_cm_req(ib_conn->cm_id, req); + + kfree(req); + + if (ret) { + IB_ERROR("CM REQ sending failed %d \n", ret); + ib_conn->state = IB_CONN_DISCONNECTED; + return FALSE; + } + return TRUE; + +} + +BOOLEAN vnic_ib_conn_init(struct vnic_ib_conn *ib_conn, struct viport *viport, + struct ib_pd *pd, uint64_t guid, + struct ib_config *config) +{ + struct ib_qp_init_attr *init_attr; + struct ib_qp_attr *attr; + BOOLEAN ret; + int retval; + struct viport_config *viport_config = viport->config; + unsigned int cq_size = config->num_sends + config->num_recvs; + + + if (!ib_conn_uninitted(ib_conn)) { + IB_ERROR("IB Connection out of state for init (%d)\n", + ib_conn->state); + return FALSE; + } + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) + return FALSE; + + ib_conn->cq = ib_create_cq(viport_config->ibdev, vnic_ib_completion, + NULL, ib_conn, cq_size); + if (IS_ERR(ib_conn->cq)) { + ret = FALSE; + IB_ERROR("could not create CQ\n"); + goto out; + } + + ib_req_notify_cq(ib_conn->cq, IB_CQ_NEXT_COMP); + + init_attr->event_handler = ib_qp_event; + init_attr->cap.max_send_wr = config->num_sends; + init_attr->cap.max_recv_wr = config->num_recvs; + init_attr->cap.max_recv_sge = config->recv_scatter; + init_attr->cap.max_send_sge = config->send_gather; + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + init_attr->qp_type = IB_QPT_RC; + init_attr->send_cq = ib_conn->cq; + init_attr->recv_cq = ib_conn->cq; + + ib_conn->qp = ib_create_qp(pd, init_attr); + + if (IS_ERR(ib_conn->qp)) { + ret = FALSE; + IB_ERROR("could not create QP\n"); + ib_destroy_cq(ib_conn->cq); + goto out; + } + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + 
if (!attr) { + ret = FALSE; + goto out; + } + + retval = ib_find_cached_pkey(viport_config->ibdev, + viport_config->port, + be16_to_cpu(viport_config->path_info.path. + pkey), + &attr->pkey_index); + if (retval) { + ret = FALSE; + IB_ERROR("ib_find_cached_pkey() failed\n"); + goto freeattr; + } + + attr->qp_state = IB_QPS_INIT; + attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE; + attr->port_num = viport_config->port; + + retval = ib_modify_qp(ib_conn->qp, attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_ACCESS_FLAGS | IB_QP_PORT); + if (retval) { + ret = FALSE; + IB_ERROR("could not modify QP\n"); + ib_destroy_qp(ib_conn->qp); + ib_destroy_cq(ib_conn->cq); + goto freeattr; + } + + ib_conn->conn_lock = SPIN_LOCK_UNLOCKED; + ib_conn->state = IB_CONN_INITTED; + + ret = TRUE; +freeattr: + kfree(attr); +out: + kfree(init_attr); + return ret; +} + +BOOLEAN vnic_ib_post_recv(struct vnic_ib_conn * ib_conn, struct io * io) +{ +#ifdef CONFIG_INFINIBAND_VNIC_STATS + cycles_t post_time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + struct ib_recv_wr *bad_wr; + int ret; + unsigned long flags; + + IB_FUNCTION("vnic_ib_post_recv()\n"); + + spin_lock_irqsave(&ib_conn->conn_lock, flags); + + if (!ib_conn_initted(ib_conn) && !ib_conn_connected(ib_conn)) + goto post_fail; + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + io->type = RECV; + post_time = get_cycles(); + if (io->time != 0) { + ib_conn->statistics.recv_comp_time += post_time - io->time; + ib_conn->statistics.recv_comp_ios++; + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + ret = ib_post_recv(ib_conn->qp, &io->rwr, &bad_wr); + + if (ret) { + IB_ERROR("error in posting rcv wr\n"); + goto post_fail; + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + post_time = get_cycles() - post_time; + ib_conn->statistics.recv_post_time += post_time; + ib_conn->statistics.recv_post_ios++; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return TRUE; +post_fail: + spin_unlock_irqrestore(&ib_conn->conn_lock, 
flags); + return FALSE; + +} + +BOOLEAN vnic_ib_post_send(struct vnic_ib_conn * ib_conn, struct io * io) +{ +#ifdef CONFIG_INFINIBAND_VNIC_STATS + cycles_t post_time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + unsigned long flags; + struct ib_send_wr *bad_wr; + int ret; + + IB_FUNCTION("vnic_ib_post_send()\n"); + + spin_lock_irqsave(&ib_conn->conn_lock, flags); + + if (!ib_conn_connected(ib_conn)) { + IB_ERROR("IB Connection out of state for posting sends (%d)\n", + ib_conn->state); + goto send_post_fail; + } + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + io->time = post_time = get_cycles(); +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + + if (io->swr.opcode == IB_WR_RDMA_WRITE) + io->type = RDMA; + else + io->type = SEND; + + ret = ib_post_send(ib_conn->qp, &io->swr, &bad_wr); + + if (ret) + goto send_post_fail; + + +#ifdef CONFIG_INFINIBAND_VNIC_STATS + post_time = get_cycles() - post_time; + + if (io->swr.opcode == IB_WR_RDMA_WRITE) { + ib_conn->statistics.rdma_post_time += post_time; + ib_conn->statistics.rdma_post_ios++; + } else { + ib_conn->statistics.send_post_time += post_time; + ib_conn->statistics.send_post_ios++; + } +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return TRUE; +send_post_fail: + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return FALSE; +} diff --git a/drivers/infiniband/ulp/vnic/vnic_ib.h b/drivers/infiniband/ulp/vnic/vnic_ib.h new file mode 100644 index 0000000..002396f --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_ib.h @@ -0,0 +1,167 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_IB_H_INCLUDED +#define VNIC_IB_H_INCLUDED + +#include +#include +#include +#include +#include +#include + +#include "vnic_sys.h" +#include "vnic_netpath.h" +#define PFX "ib_vnic: " + +struct io; +typedef void (comp_routine_t) (struct io * io); + +enum ib_conn_state { + IB_CONN_UNINITTED = 0, + IB_CONN_INITTED, + IB_CONN_CONNECTING, + IB_CONN_CONNECTED, + IB_CONN_DISCONNECTED +}; + +struct vnic_ib_conn { + struct viport *viport; + struct ib_config *ib_config; + spinlock_t conn_lock; + enum ib_conn_state state; + struct ib_qp *qp; + struct ib_cq *cq; + struct ib_cm_id *cm_id; + struct completion done; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + struct { + cycles_t connection_time; + cycles_t rdma_post_time; + u32 rdma_post_ios; + cycles_t rdma_comp_time; + u32 rdma_comp_ios; + cycles_t send_post_time; + u32 send_post_ios; + cycles_t send_comp_time; + u32 send_comp_ios; + cycles_t recv_post_time; + u32 recv_post_ios; + cycles_t recv_comp_time; + u32 recv_comp_ios; + u32 num_ios; + u32 num_callbacks; + u32 max_ios; + } statistics; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ +}; + +struct ib_path_info { + struct ib_sa_path_rec path; + struct ib_sa_query *path_query; + int path_query_id; + int status; + struct completion done; +}; + +struct vnic_ib_device { + struct list_head dev_list; + struct ib_device *dev; + +}; + +struct vnic; + +struct vnic_ib_port { + struct vnic_ib_device *dev; + u8 port_num; + struct class_dev_info cdev_info; + struct list_head list; +}; + +struct io { + struct list_head list_ptrs; + struct viport *viport; + comp_routine_t *routine; + struct ib_recv_wr rwr; + struct ib_send_wr swr; +#ifdef CONFIG_INFINIBAND_VNIC_STATS + cycles_t time; +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ + enum { RECV, RDMA, SEND } type; +}; + +struct rdma_io { + struct io io; + struct ib_sge list[2]; + u16 index; + u16 len; + u8 *data; + dma_addr_t data_dma; + struct sk_buff *skb; + dma_addr_t skb_data_dma; + struct viport_trailer *trailer; + dma_addr_t 
trailer_dma; +}; + +struct send_io { + struct io io; + struct ib_sge list; + u8 *virtual_addr; +}; + +struct recv_io { + struct io io; + struct ib_sge list; + u8 *virtual_addr; +}; + +BOOLEAN vnic_ib_init(void); +void vnic_ib_cleanup(void); + +BOOLEAN vnic_ib_get_path(struct netpath *netpath, struct vnic * vnic); +BOOLEAN vnic_ib_conn_init(struct vnic_ib_conn *ib_conn, struct viport *viport, + struct ib_pd *pd, u64 guid, struct ib_config *config); + +BOOLEAN vnic_ib_post_recv(struct vnic_ib_conn *ib_conn, struct io *io); +BOOLEAN vnic_ib_post_send(struct vnic_ib_conn *ib_conn, struct io *io); +BOOLEAN vnic_ib_cm_connect(struct vnic_ib_conn *ib_conn); +int vnic_ib_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); + +#define ib_conn_uninitted(ib_conn) ((ib_conn)->state == IB_CONN_UNINITTED) +#define ib_conn_initted(ib_conn) ((ib_conn)->state == IB_CONN_INITTED) +#define ib_conn_connecting(ib_conn) ((ib_conn)->state == IB_CONN_CONNECTING) +#define ib_conn_connected(ib_conn) ((ib_conn)->state == IB_CONN_CONNECTED) +#define ib_conn_disconnected(ib_conn) ((ib_conn)->state == IB_CONN_DISCONNECTED) + +#endif /* VNIC_IB_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:09:08 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:39:08 +0530 Subject: [openib-general] [PATCH 7/10] Handling of various configurable parameters of the driver Message-ID: <4521BF3C.17740.4E457192@rkuchimanchi.silverstorm.com> Adds the files that handle various configurable parameters of the driver ---- configuration of virtual NIC, control, data connections to the VEx and general IB connection parameters. 
Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_config.c | 739 +++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_config.h | 215 ++++++++ 2 files changed, 954 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_config.c b/drivers/infiniband/ulp/vnic/vnic_config.c new file mode 100644 index 0000000..61db4ee --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_config.c @@ -0,0 +1,739 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include + +#include + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_trailer.h" + +#define CONFIG_PARAM(x) u32 x = 0xffffffff; +#define DEFAULT_PARAM(x, y) \ + do { \ + if (x == 0xffffffff) \ + x = y; \ + } while(0) + +#define boolean_range_check(x) __range_check(x, 0, 1, #x) +#define u32_zero_range_check(x) __range_check(x, 0, 0x7FFFFFFF, #x) +#define u32_range_check(x) __range_check(x, 1, 0x7FFFFFFF, #x) +#define u16_zero_range_check(x) __range_check(x, 0, 0xFFFF, #x) +#define u16_range_check(x) __range_check(x, 1, 0xFFFF, #x) +#define u8_zero_range_check(x) __range_check(x, 0, 0xFF, #x) +#define u8_range_check(x) __range_check(x, 1, 0xFF, #x) + +#define range_check(x, min, max) __range_check(x, min, max, #x) +#define less_or_equal_check(lo, hi) __less_or_equal_check(lo, hi, #lo, #hi) +#define less_than_check(lo, hi) __less_than_check(lo, hi, #lo, #hi) +#define power_of_2_check(num) __power_of_2_check(num, #num) + +CONFIG_PARAM(max_address_entries); +CONFIG_PARAM(min_address_entries); + +CONFIG_PARAM(min_mtu); +module_param(min_mtu, int, 0444); + +CONFIG_PARAM(max_mtu); +module_param(max_mtu, int, 0444); + +CONFIG_PARAM(host_recv_pool_entries); +module_param(host_recv_pool_entries, int, 0444); + +CONFIG_PARAM(min_host_pool_sz); +module_param(min_host_pool_sz, int, 0444); + +CONFIG_PARAM(min_eioc_pool_sz); +module_param(min_eioc_pool_sz, int, 0444); + +CONFIG_PARAM(max_eioc_pool_sz); +module_param(max_eioc_pool_sz, int, 0444); + +CONFIG_PARAM(min_host_kick_timeout); +module_param(min_host_kick_timeout, int, 0444); + +CONFIG_PARAM(max_host_kick_timeout); +module_param(max_host_kick_timeout, int, 0444); + +CONFIG_PARAM(min_host_kick_entries); +module_param(min_host_kick_entries, int, 0444); + +CONFIG_PARAM(max_host_kick_entries); +module_param(max_host_kick_entries, int, 0444); + +CONFIG_PARAM(min_host_kick_bytes); +module_param(min_host_kick_bytes, int, 0444); + +CONFIG_PARAM(max_host_kick_bytes); 
+module_param(max_host_kick_bytes, int, 0444); + +CONFIG_PARAM(min_host_update_sz); +module_param(min_host_update_sz, int, 0444); + +CONFIG_PARAM(max_host_update_sz); +module_param(max_host_update_sz, int, 0444); + +CONFIG_PARAM(min_eioc_update_sz); +module_param(min_eioc_update_sz, int, 0444); + +CONFIG_PARAM(max_eioc_update_sz); +module_param(max_eioc_update_sz, int, 0444); + +CONFIG_PARAM(notify_bundle_sz); +module_param(notify_bundle_sz, int, 0444); + +CONFIG_PARAM(viport_stats_interval); +CONFIG_PARAM(viport_hb_interval); +CONFIG_PARAM(viport_hb_timeout); +CONFIG_PARAM(control_rsp_timeout); +CONFIG_PARAM(control_req_retry_count); + +/* Infiniband connection values */ +CONFIG_PARAM(retry_count); +module_param(retry_count, int, 0444); + +MODULE_PARM_DESC(retry_count, + "number of errors that sender receives" + " before posting completion error. min:0 max:7"); + +CONFIG_PARAM(sa_path_rec_get_timeout); +module_param(sa_path_rec_get_timeout, int, 0444); +MODULE_PARM_DESC(sa_path_rec_get_timeout, + "Time out value in milliseconds to be used in" + " SA path record get queries"); + +CONFIG_PARAM(min_rnr_timer); + +CONFIG_PARAM(default_viports_per_netpath); +CONFIG_PARAM(max_viports_per_netpath); + +CONFIG_PARAM(default_pkey); +module_param(default_pkey, int, 0444); + +CONFIG_PARAM(default_no_path_timeout); +CONFIG_PARAM(default_primary_connect_timeout); +CONFIG_PARAM(default_primary_reconnect_timeout); +CONFIG_PARAM(default_primary_switch_timeout); +CONFIG_PARAM(default_prefer_primary); + +module_param(default_prefer_primary, int, 0444); +module_param(default_no_path_timeout, int, 0444); +module_param(default_primary_reconnect_timeout, int, 0444); +module_param(default_primary_switch_timeout, int, 0444); + +CONFIG_PARAM(use_rx_csum); +module_param(use_rx_csum, int, 0444); + +CONFIG_PARAM(use_tx_csum); +module_param(use_tx_csum, int, 0444); + +static void config_control_defaults(struct control_config *control_config, + struct path_param *params) +{ + int len; + char 
*dot; + __be64 sid; + + /* extracting the service id from the IOC guid */ + sid = 0x10LL << 56 | + 0x00LL << 48 | + 0x06LL << 40 | + 0x6aLL << 32 | + 0x00LL << 24 | + 0x00LL << 16 | + 0x00LL << 8 | ((be64_to_cpu(params->ioc_guid) >> 32) & 0xFF); + + control_config->ib_config.service_id = cpu_to_be64(sid); + + control_config->ib_config.conn_data.path_id = 0; + control_config->ib_config.conn_data.vnic_instance = params->instance; + control_config->ib_config.conn_data.path_num = 0; + dot = strchr(system_utsname.nodename, '.'); + if (dot != NULL) { + len = dot - system_utsname.nodename; + } else { + len = strlen(system_utsname.nodename); + } + memcpy(control_config->ib_config.conn_data.nodename, + system_utsname.nodename, len); + + control_config->ib_config.retry_count = retry_count; + control_config->ib_config.rnr_retry_count = retry_count; + control_config->ib_config.min_rnr_timer = min_rnr_timer; + + control_config->ib_config.num_recvs = 5; /* not configurable */ + control_config->ib_config.num_sends = 1; /* not configurable */ + control_config->ib_config.recv_scatter = 1; /* not configurable */ + control_config->ib_config.send_gather = 1; /* not configurable */ + + control_config->num_recvs = control_config->ib_config.num_recvs; + + control_config->vnic_instance = params->instance; + control_config->max_address_entries = max_address_entries; + control_config->min_address_entries = min_address_entries; + control_config->req_retry_count = control_req_retry_count; + control_config->rsp_timeout = CONV2JIFFIES(control_rsp_timeout); + + return; +} + +static void config_data_defaults(struct data_config *data_config, + struct path_param *params) +{ + __be64 sid; + + /* extracting the service id from the IOC guid */ + sid = 0x10LL << 56 | + 0x00LL << 48 | + 0x06LL << 40 | + 0x6aLL << 32 | + 0x00LL << 24 | + 0x00LL << 16 | + 0x01LL << 8 | ((be64_to_cpu(params->ioc_guid) >> 32) & 0xFF); + + data_config->ib_config.service_id = cpu_to_be64(sid); + + 
data_config->ib_config.conn_data.path_id = jiffies; /* random */ + data_config->ib_config.conn_data.vnic_instance = params->instance; + data_config->ib_config.conn_data.path_num = 0; + + data_config->ib_config.retry_count = retry_count; + data_config->ib_config.rnr_retry_count = retry_count; + data_config->ib_config.min_rnr_timer = min_rnr_timer; + + /* + * NOTE: the num_recvs size assumes that the EIOC could + * RDMA enough packets to fill all of the host recv + * pool entries, plus send a kick message after each + * packet, plus RDMA new buffers for the size of + * the EIOC recv buffer pool, plus send kick messages + * after each min_host_update_sz of new buffers all + * before the host can even pull off the first completed + * receive off the completion queue, and repost the + * receive. NOT LIKELY! + */ + data_config->ib_config.num_recvs = host_recv_pool_entries + + (max_eioc_pool_sz / min_host_update_sz); +#if defined(LIMIT_OUTSTANDING_SENDS) + data_config->ib_config.num_sends = (2 * notify_bundle_sz) + + (host_recv_pool_entries / min_eioc_update_sz) + 1; +#else /* !defined(LIMIT_OUTSTANDING_SENDS) */ + /* + * NOTE: the num_sends size assumes that the HOST could + * post RDMA sends for every single buffer in the eiocs + * receive pool, and allocate a full complement of + * receive buffers on the host, and RDMA free buffers + * every min_eioc_update_sz entries all before the HCA + * can complete a single RDMA transfer. VERY UNLIKELY, + * BUT NOT COMPLETELY IMPOSSIBLE IF THERE IS AN IB + * PROBLEM! 
+ */ + data_config->ib_config.num_sends = max_eioc_pool_sz + + (host_recv_pool_entries / min_eioc_update_sz) + 1; +#endif /* !defined(LIMIT_OUTSTANDING_SENDS) */ + + data_config->ib_config.recv_scatter = 1; /* not configurable */ + data_config->ib_config.send_gather = 2; /* not configurable */ + + data_config->num_recvs = data_config->ib_config.num_recvs; + data_config->path_id = data_config->ib_config.conn_data.path_id; + + data_config->host_min.size_recv_pool_entry = + BUFFER_SIZE(ETH_VLAN_HLEN + min_mtu); + data_config->host_max.size_recv_pool_entry = + BUFFER_SIZE(ETH_VLAN_HLEN + max_mtu); + data_config->eioc_min.size_recv_pool_entry = + BUFFER_SIZE(ETH_VLAN_HLEN + min_mtu); + data_config->eioc_max.size_recv_pool_entry = MAX_PARAM_VALUE; + + data_config->host_recv_pool_entries = host_recv_pool_entries; + + data_config->host_min.num_recv_pool_entries = min_host_pool_sz; + data_config->host_max.num_recv_pool_entries = MAX_PARAM_VALUE; + data_config->eioc_min.num_recv_pool_entries = min_eioc_pool_sz; + data_config->eioc_max.num_recv_pool_entries = max_eioc_pool_sz; + + data_config->host_min.timeout_before_kick = min_host_kick_timeout; + data_config->host_max.timeout_before_kick = max_host_kick_timeout; + data_config->eioc_min.timeout_before_kick = 0; + data_config->eioc_max.timeout_before_kick = MAX_PARAM_VALUE; + + data_config->host_min.num_recv_pool_entries_before_kick = + min_host_kick_entries; + data_config->host_max.num_recv_pool_entries_before_kick = + max_host_kick_entries; + data_config->eioc_min.num_recv_pool_entries_before_kick = 0; + data_config->eioc_max.num_recv_pool_entries_before_kick = + MAX_PARAM_VALUE; + + data_config->host_min.num_recv_pool_bytes_before_kick = + min_host_kick_bytes; + data_config->host_max.num_recv_pool_bytes_before_kick = + max_host_kick_bytes; + data_config->eioc_min.num_recv_pool_bytes_before_kick = 0; + data_config->eioc_max.num_recv_pool_bytes_before_kick = MAX_PARAM_VALUE; + + 
data_config->host_min.free_recv_pool_entries_per_update = + min_host_update_sz; + data_config->host_max.free_recv_pool_entries_per_update = + max_host_update_sz; + data_config->eioc_min.free_recv_pool_entries_per_update = + min_eioc_update_sz; + data_config->eioc_max.free_recv_pool_entries_per_update = + max_eioc_update_sz; + + data_config->notify_bundle = notify_bundle_sz; + + return; +} + +static void config_path_info_defaults(struct viport_config *config, + struct path_param *params) +{ + int i; + ib_get_cached_gid(config->ibdev, config->port, 0, + &config->path_info.path.sgid); + for (i = 0; i < 16; i++) { + config->path_info.path.dgid.raw[i] = params->dgid[i]; + } + config->path_info.path.pkey = params->pkey; + config->path_info.path.numb_path = 1; + config->sa_path_rec_get_timeout = sa_path_rec_get_timeout; + +} + +static BOOLEAN config_viport_defaults(struct viport_config *config, + struct path_param *params) +{ + config->ibdev = params->ibdev; + config->port = params->port; + config->guid = params->ioc_guid; + config->stats_interval = CONV2JIFFIES(viport_stats_interval); + config->hb_interval = CONV2JIFFIES(viport_hb_interval); + config->hb_timeout = CONV2USEC(viport_hb_timeout); + + config_path_info_defaults(config, params); + + config_control_defaults(&config->control_config, params); + config_data_defaults(&config->data_config, params); + return TRUE; +} + +static void config_vnic_defaults(struct vnic_config *config) +{ + config->no_path_timeout = CONV2JIFFIES(default_no_path_timeout); + config->primary_connect_timeout = + CONV2JIFFIES(default_primary_connect_timeout); + config->primary_reconnect_timeout = + CONV2JIFFIES(default_primary_reconnect_timeout); + config->primary_switch_timeout = + CONV2JIFFIES(default_primary_switch_timeout); + config->prefer_primary = default_prefer_primary; + config->use_rx_csum = use_rx_csum; + config->use_tx_csum = use_tx_csum; + return; +} + +static BOOLEAN config_is_valid(struct viport_config *config) +{ + /* TBD: */ + 
return TRUE; +} + +struct viport_config *config_alloc_viport(struct path_param *params) +{ + struct viport_config *config; + + config = + (struct viport_config *)kmalloc(sizeof(struct viport_config), + GFP_KERNEL); + if (!config) { + CONFIG_ERROR + ("couldn't allocate memory for struct viport_config\n"); + goto failure; + } + memset(config, '\0', sizeof(struct viport_config)); + + if (!config_viport_defaults(config, params)) { + goto failure; + } + + /* TBD: overrides go in here */ + + if (!config_is_valid(config)) { + CONFIG_ERROR("viport configuration is invalid\n"); + goto failure; + } + + return config; +failure: + if (config) { + kfree(config); + } + return NULL; +} + +void config_free_viport(struct viport_config *config) +{ + if (config) { + kfree(config); + } + return; +} + +struct vnic_config *config_alloc_vnic(void) +{ + struct vnic_config *config; + + config = + (struct vnic_config *)kmalloc(sizeof(struct vnic_config), + GFP_KERNEL); + if (!config) { + CONFIG_ERROR("couldn't allocate memory for" + " struct vnic_config\n"); + goto failure; + } + memset(config, '\0', sizeof(struct vnic_config)); + + config_vnic_defaults(config); + + /* TBD: vnic overrides here */ + + return config; +failure: + if (config) { + kfree(config); + } + return NULL; +} + +void config_free_vnic(struct vnic_config *config) +{ + if (config) { + kfree(config); + } + return; +} + +char *config_viport_name(struct viport_config *config) +{ + /* function only called by one thread, can return a static string */ + static char str[92]; + + sprintf(str, "GUID %llx instance %d", + ntoh64(config->guid), config->control_config.vnic_instance); + return str; +} + +static int __power_of_2_check(u32 num, char *param_name) +{ + if (!is_power_of2(num)) { + CONFIG_ERROR("param %s must be a power of 2\n", param_name); + return 0; + } + + return 1; +} + +static int __less_than_check(u32 lo, u32 hi, char *lo_name, char *hi_name) +{ + if (lo >= hi) { + CONFIG_ERROR("param %s must be less than %s\n", + 
lo_name, hi_name); + return 0; + } + + return 1; +} + +static int __less_or_equal_check(u32 lo, u32 hi, char *lo_name, char *hi_name) +{ + if (lo > hi) { + CONFIG_ERROR("param %s cannot be greater than %s \n", + lo_name, hi_name); + return 0; + } + + return 1; +} + +static int __range_check(u32 num, u32 min, u32 max, char *param_name) +{ + if ((num < min) || (num > max)) { + CONFIG_ERROR("param %s must be between %d and %d\n", + param_name, min, max); + return 0; + } + + return 1; +} + +void config_cleanup(void) +{ + /* nothing to do here */ + return; +} + +BOOLEAN config_start(void) +{ + DEFAULT_PARAM(max_address_entries, MAX_ADDRESS_ENTRIES); + DEFAULT_PARAM(min_address_entries, MIN_ADDRESS_ENTRIES); + DEFAULT_PARAM(min_mtu, MIN_MTU); + DEFAULT_PARAM(max_mtu, MAX_MTU); + DEFAULT_PARAM(host_recv_pool_entries, HOST_RECV_POOL_ENTRIES); + DEFAULT_PARAM(min_host_pool_sz, MIN_HOST_POOL_SZ); + DEFAULT_PARAM(min_eioc_pool_sz, MIN_EIOC_POOL_SZ); + DEFAULT_PARAM(max_eioc_pool_sz, MAX_EIOC_POOL_SZ); + DEFAULT_PARAM(min_host_kick_timeout, MIN_HOST_KICK_TIMEOUT); + DEFAULT_PARAM(max_host_kick_timeout, MAX_HOST_KICK_TIMEOUT); + DEFAULT_PARAM(min_host_kick_entries, MIN_HOST_KICK_ENTRIES); + DEFAULT_PARAM(max_host_kick_entries, MAX_HOST_KICK_ENTRIES); + DEFAULT_PARAM(min_host_kick_bytes, MIN_HOST_KICK_BYTES); + DEFAULT_PARAM(max_host_kick_bytes, MAX_HOST_KICK_BYTES); + DEFAULT_PARAM(min_host_update_sz, MIN_HOST_UPDATE_SZ); + DEFAULT_PARAM(max_host_update_sz, MAX_HOST_UPDATE_SZ); + DEFAULT_PARAM(min_eioc_update_sz, MIN_EIOC_UPDATE_SZ); + DEFAULT_PARAM(max_eioc_update_sz, MAX_EIOC_UPDATE_SZ); + DEFAULT_PARAM(notify_bundle_sz, NOTIFY_BUNDLE_SZ); + DEFAULT_PARAM(viport_stats_interval, VIPORT_STATS_INTERVAL); + DEFAULT_PARAM(viport_hb_interval, VIPORT_HEARTBEAT_INTERVAL); + DEFAULT_PARAM(viport_hb_timeout, VIPORT_HEARTBEAT_TIMEOUT); + DEFAULT_PARAM(control_rsp_timeout, CONTROL_RSP_TIMEOUT); + DEFAULT_PARAM(control_req_retry_count, CONTROL_REQ_RETRY_COUNT); + 
DEFAULT_PARAM(retry_count, RETRY_COUNT); + DEFAULT_PARAM(min_rnr_timer, MIN_RNR_TIMER); + DEFAULT_PARAM(sa_path_rec_get_timeout, SA_PATH_REC_GET_TIMEOUT); + DEFAULT_PARAM(default_viports_per_netpath, DEFAULT_VIPORTS_PER_NETPATH); + DEFAULT_PARAM(max_viports_per_netpath, MAX_VIPORTS_PER_NETPATH); + DEFAULT_PARAM(default_pkey, DEFAULT_PKEY); + DEFAULT_PARAM(default_no_path_timeout, DEFAULT_NO_PATH_TIMEOUT); + DEFAULT_PARAM(default_primary_connect_timeout, DEFAULT_PRI_CON_TIMEOUT); + DEFAULT_PARAM(default_primary_reconnect_timeout, + DEFAULT_PRI_RECON_TIMEOUT); + DEFAULT_PARAM(default_primary_switch_timeout, + DEFAULT_PRI_SWITCH_TIMEOUT); + DEFAULT_PARAM(default_prefer_primary, DEFAULT_PREFER_PRIMARY); + DEFAULT_PARAM(use_rx_csum, VNIC_USE_RX_CSUM); + DEFAULT_PARAM(use_tx_csum, VNIC_USE_TX_CSUM); + + if (! u32_range_check(max_address_entries)) + goto failure; + + if (! u32_range_check(min_address_entries)) + goto failure; + + if (! range_check(min_mtu, MIN_MTU, MAX_MTU)) + goto failure; + + if (! range_check(max_mtu, MIN_MTU, MAX_MTU)) + goto failure; + + if (! u32_range_check(host_recv_pool_entries)) + goto failure; + + if (! u32_range_check(min_host_pool_sz)) + goto failure; + + if (! u32_range_check(min_eioc_pool_sz)) + goto failure; + + if (! u32_range_check(max_eioc_pool_sz)) + goto failure; + + if (! u32_zero_range_check(min_host_kick_timeout)) + goto failure; + + if (! u32_zero_range_check(max_host_kick_timeout)) + goto failure; + + if (! u32_zero_range_check(min_host_kick_entries)) + goto failure; + + if (! u32_zero_range_check(max_host_kick_entries)) + goto failure; + + if (! u32_zero_range_check(min_host_kick_bytes)) + goto failure; + + if (! u32_zero_range_check(max_host_kick_bytes)) + goto failure; + + if (! u32_range_check(min_host_update_sz)) + goto failure; + + if (! u32_range_check(max_host_update_sz)) + goto failure; + + if (! u32_range_check(min_eioc_update_sz)) + goto failure; + + if (! u32_range_check(max_eioc_update_sz)) + goto failure; + + if (! 
u8_range_check(notify_bundle_sz)) + goto failure; + + if (! u32_zero_range_check(viport_stats_interval)) + goto failure; + + if (! u32_zero_range_check(viport_hb_interval)) + goto failure; + + if (! u32_zero_range_check(viport_hb_timeout)) + goto failure; + + if (! u32_range_check(control_rsp_timeout)) + goto failure; + + if (! u8_range_check(control_req_retry_count)) + goto failure; + + if (! range_check(sa_path_rec_get_timeout, + MIN_SA_TIMEOUT, MAX_SA_TIMEOUT)) + goto failure; + + if (! range_check(retry_count, 0, 7)) + goto failure; + + if (! range_check(min_rnr_timer, 0, 31)) + goto failure; + + if (! u32_range_check(default_viports_per_netpath)) + goto failure; + + if (! u8_range_check(max_viports_per_netpath)) + goto failure; + + if (! u16_zero_range_check(default_pkey)) + goto failure; + + if (! u32_range_check(default_no_path_timeout)) + goto failure; + + if (! u32_range_check(default_primary_connect_timeout)) + goto failure; + + if (! u32_range_check(default_primary_reconnect_timeout)) + goto failure; + + if (! u32_range_check(default_primary_switch_timeout)) + goto failure; + + if (! boolean_range_check(default_prefer_primary)) + goto failure; + + if (! boolean_range_check(use_rx_csum)) + goto failure; + + if (! boolean_range_check(use_tx_csum)) + goto failure; + + if (! less_or_equal_check(min_address_entries, max_address_entries)) + goto failure; + + if (! less_or_equal_check(min_mtu, max_mtu)) + goto failure; + + if (! less_or_equal_check(min_host_pool_sz, host_recv_pool_entries)) + goto failure; + + if (! power_of_2_check(host_recv_pool_entries)) + goto failure; + + if (! power_of_2_check(min_host_pool_sz)) + goto failure; + + if (! power_of_2_check(notify_bundle_sz)) + goto failure; + + if (! less_than_check(notify_bundle_sz, min_eioc_pool_sz)) + goto failure; + + if (! less_or_equal_check(min_eioc_pool_sz, max_eioc_pool_sz)) + goto failure; + + if (! power_of_2_check(min_eioc_pool_sz)) + goto failure; + + if (! 
power_of_2_check(max_eioc_pool_sz)) + goto failure; + + if (! less_or_equal_check(min_host_kick_timeout, max_host_kick_timeout)) + goto failure; + + if (! less_or_equal_check(min_host_kick_entries, max_host_kick_entries)) + goto failure; + + if (! less_or_equal_check(min_host_kick_bytes, max_host_kick_bytes)) + goto failure; + + if (! less_or_equal_check(min_host_update_sz, max_host_update_sz)) + goto failure; + + if (! power_of_2_check(min_host_update_sz)) + goto failure; + + if (! power_of_2_check(max_host_update_sz)) + goto failure; + + if (! less_than_check(min_host_update_sz, min_host_pool_sz)) + goto failure; + + if (! less_than_check(max_host_update_sz, host_recv_pool_entries)) + goto failure; + + if (! less_or_equal_check(min_eioc_update_sz, max_eioc_update_sz)) + goto failure; + + if (! power_of_2_check(min_eioc_update_sz)) + goto failure; + + if (! power_of_2_check(max_eioc_update_sz)) + goto failure; + + if (! less_than_check(min_eioc_update_sz, min_eioc_pool_sz)) + goto failure; + + if (! less_than_check(max_eioc_update_sz, max_eioc_pool_sz)) + goto failure; + + if (! less_or_equal_check(default_viports_per_netpath, + max_viports_per_netpath)) + goto failure; + + return TRUE; +failure: + return FALSE; +} diff --git a/drivers/infiniband/ulp/vnic/vnic_config.h b/drivers/infiniband/ulp/vnic/vnic_config.h new file mode 100644 index 0000000..88b5c44 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_config.h @@ -0,0 +1,215 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_CONFIG_H_INCLUDED +#define VNIC_CONFIG_H_INCLUDED + +#include +#include + +#include "vnic_control.h" +#include "vnic_ib.h" + +/* these are hard, compile time limits. 
+ * lower runtime overrides may be in effect + */ + +#define VNIC_CLASS_SUBCLASS 0x2000066A +#define VNIC_PROTOCOL 0 +#define VNIC_PROT_VERSION 1 + +#define MAX_ADDRESS_ENTRIES 64 /* TBD: arbitrary */ +#define MIN_ADDRESS_ENTRIES 16 /* TBD: arbitrary */ + +#define MIN_MTU 1500 /* minimum negotiated MTU size */ +#define MAX_MTU 9500 /* jumbo frame */ +#define ETH_VLAN_HLEN 18 + +#define HOST_RECV_POOL_ENTRIES 512 /* TBD: arbitrary */ +#define MIN_HOST_POOL_SZ 64 /* TBD: arbitrary */ +#define MIN_EIOC_POOL_SZ 64 /* TBD: arbitrary */ +#define MAX_EIOC_POOL_SZ 256 /* TBD: arbitrary */ + +#define MIN_HOST_KICK_TIMEOUT 10 /* TBD: arbitrary */ +#define MAX_HOST_KICK_TIMEOUT 100 /* in u_sec */ + +#define MIN_HOST_KICK_ENTRIES 1 /* TBD: arbitrary */ +#define MAX_HOST_KICK_ENTRIES 128 /* TBD: arbitrary */ + +#define MIN_HOST_KICK_BYTES 0 +#define MAX_HOST_KICK_BYTES 5000 + +#define MIN_HOST_UPDATE_SZ 8 /* TBD: arbitrary */ +#define MAX_HOST_UPDATE_SZ 32 /* TBD: arbitrary */ +#define MIN_EIOC_UPDATE_SZ 8 /* TBD: arbitrary */ +#define MAX_EIOC_UPDATE_SZ 32 /* TBD: arbitrary */ + +#define NOTIFY_BUNDLE_SZ 32 + +#define MAX_PARAM_VALUE 0x40000000 + +#define DEFAULT_VIPORTS_PER_NETPATH 1 +#define MAX_VIPORTS_PER_NETPATH 1 + +#define VNIC_USE_RX_CSUM TRUE +#define VNIC_USE_TX_CSUM TRUE +#define DEFAULT_NO_PATH_TIMEOUT 1000 /* TBD: arbitrary */ +#define DEFAULT_PRI_CON_TIMEOUT 1000 /* TBD: arbitrary */ +#define DEFAULT_PRI_RECON_TIMEOUT 1000 /* TBD: arbitrary */ +#define DEFAULT_PRI_SWITCH_TIMEOUT 1000 /* TBD: arbitrary */ +#define DEFAULT_PREFER_PRIMARY FALSE + +#define VIPORT_STATS_INTERVAL 50 /* .5 sec */ +#define VIPORT_HEARTBEAT_INTERVAL 100 /* 1 second */ +#define VIPORT_HEARTBEAT_TIMEOUT 6400 /* 64 sec */ +#define CONTROL_REQ_RETRY_COUNT 4 +#define CONTROL_RSP_TIMEOUT 100 /* 1 sec */ + +/* infiniband connection parameters */ +#define RETRY_COUNT 3 +#define MIN_RNR_TIMER 22 /* 20 ms */ +#define DEFAULT_PKEY 0 /* pkey table index */ + +#define SA_PATH_REC_GET_TIMEOUT 1000 /* 
1000 ms */ +#define MIN_SA_TIMEOUT 100 /* 100 ms */ +#define MAX_SA_TIMEOUT 20000 /* 20s */ + +struct path_param { + __be64 ioc_guid; + u8 port; + u8 instance; + struct ib_device *ibdev; + struct vnic_ib_port *ibport; + char name[IFNAMSIZ]; + u8 dgid[16]; + __be16 pkey; + u64 rx_csum; + u64 tx_csum; + u64 heartbeat; +}; + +struct ib_config { + __be64 service_id; + struct vnic_connection_data conn_data; + u32 retry_count; + u32 rnr_retry_count; + u8 min_rnr_timer; + u32 num_sends; + u32 num_recvs; + u32 recv_scatter; /* 1 */ + u32 send_gather; /* 1 or 2 */ + u32 overrides; +}; + +struct control_config { + struct ib_config ib_config; + u32 num_recvs; + u8 vnic_instance; + u16 max_address_entries; + u16 min_address_entries; + u32 rsp_timeout; + u8 req_retry_count; + u32 overrides; +}; + +struct data_config { + struct ib_config ib_config; + u64 path_id; + u32 num_recvs; + u32 host_recv_pool_entries; + struct vnic_recv_pool_config host_min; + struct vnic_recv_pool_config host_max; + struct vnic_recv_pool_config eioc_min; + struct vnic_recv_pool_config eioc_max; + u32 notify_bundle; + u32 overrides; +}; + +struct viport_config { + struct viport *viport; + struct control_config control_config; + struct data_config data_config; + struct ib_path_info path_info; + u32 sa_path_rec_get_timeout; + struct ib_device *ibdev; + u32 port; + u32 stats_interval; + u32 hb_interval; + u32 hb_timeout; + u64 port_guid; + u64 guid; + size_t path_idx; + char ioc_string[512 / 8 + 1]; +#define HB_INTERVAL_OVERRIDE 0x1 +#define GUID_OVERRIDE 0x2 +#define STRING_OVERRIDE 0x4 +#define HCA_OVERRIDE 0x8 +#define PORT_OVERRIDE 0x10 +#define PORTGUID_OVERRIDE 0x20 + u32 overrides; +}; + +/* + * primary_connect_timeout - if the secondary connects first, how long do we + * give the primary? + * primary_reconnect_timeout - same as above, but used when recovering when + * both paths fail + * primary_switch_timeout - how long do we wait before switching to the + * primary when it comes back? 
+ */ +struct vnic_config { + struct vnic *vnic; + char name[IFNAMSIZ]; + u32 no_path_timeout; + u32 primary_connect_timeout; + u32 primary_reconnect_timeout; + u32 primary_switch_timeout; + int prefer_primary; + BOOLEAN use_rx_csum; + BOOLEAN use_tx_csum; +#define USE_RX_CSUM_OVERRIDE 0x1 +#define USE_TX_CSUM_OVERRIDE 0x2 + u32 overrides; +}; + +BOOLEAN config_start(void); +void config_cleanup(void); + +struct viport_config *config_alloc_viport(struct path_param *params); +void config_free_viport(struct viport_config *config); + +struct vnic_config *config_alloc_vnic(void); +void config_free_vnic(struct vnic_config *config); + +char *config_viport_name(struct viport_config *config); + +#endif /* VNIC_CONFIG_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:10:27 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:40:27 +0530 Subject: [openib-general] [PATCH 8/10] sysfs interface implementation Message-ID: <4521BF8B.3021.4E46A8AA@rkuchimanchi.silverstorm.com> Adds the files that implement the sysfs interface of the driver. Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_sys.c | 1118 ++++++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/vnic_sys.h | 51 + 2 files changed, 1169 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_sys.c b/drivers/infiniband/ulp/vnic/vnic_sys.c new file mode 100644 index 0000000..052783e --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_sys.c @@ -0,0 +1,1118 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#ifdef CONFIG_INFINIBAND_VNIC_STATS +#include +#endif + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_ib.h" +#include "vnic_viport.h" +#include "vnic_main.h" + +extern struct list_head vnic_list; + +/* + * target eiocs are added by writing + * + * ioc_guid=,dgid=,pkey=,name= + * to the create_primary sysfs attribute. 
+ */ +enum { + VNIC_OPT_ERR = 0, + VNIC_OPT_IOC_GUID = 1 << 0, + VNIC_OPT_DGID = 1 << 1, + VNIC_OPT_PKEY = 1 << 2, + VNIC_OPT_NAME = 1 << 3, + VNIC_OPT_INSTANCE = 1 << 4, + VNIC_OPT_RXCSUM = 1 << 5, + VNIC_OPT_TXCSUM = 1 << 6, + VNIC_OPT_HEARTBEAT = 1 << 7, + VNIC_OPT_ALL = (VNIC_OPT_IOC_GUID | + VNIC_OPT_DGID | VNIC_OPT_NAME | VNIC_OPT_PKEY), +}; + +static match_table_t vnic_opt_tokens = { + {VNIC_OPT_IOC_GUID, "ioc_guid=%s"}, + {VNIC_OPT_DGID, "dgid=%s"}, + {VNIC_OPT_PKEY, "pkey=%x"}, + {VNIC_OPT_NAME, "name=%s"}, + {VNIC_OPT_INSTANCE, "instance=%d"}, + {VNIC_OPT_RXCSUM, "rx_csum=%s"}, + {VNIC_OPT_TXCSUM, "tx_csum=%s"}, + {VNIC_OPT_HEARTBEAT, "heartbeat=%d"}, + {VNIC_OPT_ERR, NULL} +}; + +static void vnic_release_class_dev(struct class_device *class_dev) +{ + struct class_dev_info *cdev_info = + container_of(class_dev, struct class_dev_info, class_dev); + + complete(&cdev_info->released); + +} + +struct class vnic_class = { + .name = "infiniband_vnic", + .release = vnic_release_class_dev +}; + +struct class_dev_info interface_cdev; + +static int vnic_parse_options(const char *buf, struct path_param *param) +{ + char *options, *sep_opt; + char *p; + char dgid[3]; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + sep_opt = options; + while ((p = strsep(&sep_opt, ",")) != NULL) { + if (!*p) + continue; + + token = match_token(p, vnic_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case VNIC_OPT_IOC_GUID: + p = match_strdup(args); + param->ioc_guid = cpu_to_be64(simple_strtoull(p, NULL, + 16)); + kfree(p); + break; + + case VNIC_OPT_DGID: + p = match_strdup(args); + if (strlen(p) != 32) { + printk(KERN_WARNING PFX + "bad dest GID parameter '%s'\n", p); + kfree(p); + goto out; + } + + for (i = 0; i < 16; ++i) { + strlcpy(dgid, p + i * 2, 3); + param->dgid[i] = simple_strtoul(dgid, NULL, 16); + + } + kfree(p); + break; + + case 
VNIC_OPT_PKEY: + if (match_hex(args, &token)) { + printk(KERN_WARNING PFX + "bad P_key parameter '%s'\n", p); + goto out; + } + param->pkey = cpu_to_be16(token); + break; + + case VNIC_OPT_NAME: + p = match_strdup(args); + if (strlen(p) >= IFNAMSIZ) { + printk(KERN_WARNING PFX + "interface name parameter too long\n"); + kfree(p); + goto out; + } + strcpy(param->name, p); + kfree(p); + break; + case VNIC_OPT_INSTANCE: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX + "bad instance parameter '%s'\n", p); + goto out; + } + + if (token > 255 || token < 0) { + printk(KERN_WARNING PFX + "instance parameter must be" + " >= 0 and <= 255\n"); + goto out; + } + + param->instance = token; + break; + case VNIC_OPT_RXCSUM: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->rx_csum = 1; + else if (!strncmp(p, "false", 5)) + param->rx_csum = 0; + else { + printk(KERN_WARNING PFX + "bad rx_csum parameter." + " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + case VNIC_OPT_TXCSUM: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->tx_csum = 1; + else if (!strncmp(p, "false", 5)) + param->tx_csum = 0; + else { + printk(KERN_WARNING PFX + "bad tx_csum parameter." 
+ " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + case VNIC_OPT_HEARTBEAT: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX + "bad heartbeat parameter '%s'\n", p); + goto out; + } + + if (token > 6000 || token < 0) { + printk(KERN_WARNING PFX + "heartbeat parameter must be" + " >= 0 and <= 6000\n"); + goto out; + } + param->heartbeat = token; + break; + default: + printk(KERN_WARNING PFX + "unknown parameter or missing value " + "'%s' in target creation request\n", p); + goto out; + } + + } + + if ((opt_mask & VNIC_OPT_ALL) == VNIC_OPT_ALL) + ret = 0; + else + for (i = 0; i < ARRAY_SIZE(vnic_opt_tokens); ++i) + if ((vnic_opt_tokens[i].token & VNIC_OPT_ALL) && + !(vnic_opt_tokens[i].token & opt_mask)) + printk(KERN_WARNING PFX + "target creation request is " + "missing parameter '%s'\n", + vnic_opt_tokens[i].pattern); + +out: + kfree(options); + return ret; + +} + +static ssize_t show_vnic_state(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, class_dev_info); + switch (vnic->state) { + case VNIC_UNINITIALIZED: + return sprintf(buf, "VNIC_UNINITIALIZED\n"); + case VNIC_REGISTERED: + return sprintf(buf, "VNIC_REGISTERED\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + } + +} + +static CLASS_DEVICE_ATTR(vnic_state, S_IRUGO, show_vnic_state, NULL); + +static ssize_t show_rx_csum(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, class_dev_info); + + if (vnic->config->use_rx_csum) + return sprintf(buf, "true\n"); + else + return sprintf(buf, "false\n"); +} + +static CLASS_DEVICE_ATTR(rx_csum, S_IRUGO, show_rx_csum, NULL); + +static ssize_t show_tx_csum(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + 
container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, class_dev_info); + + if (vnic->config->use_tx_csum) + return sprintf(buf, "true\n"); + else + return sprintf(buf, "false\n"); +} + +static CLASS_DEVICE_ATTR(tx_csum, S_IRUGO, show_tx_csum, NULL); + +static ssize_t show_current_path(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, class_dev_info); + + if (vnic->current_path == &vnic->primary_path) + return sprintf(buf, "primary path\n"); + else if (vnic->current_path == &vnic->secondary_path) + return sprintf(buf, "secondary path\n"); + else + return sprintf(buf, "none\n"); + +} + +#ifdef CONFIG_INFINIBAND_VNIC_STATS +u32 clock_rate = 800; +module_param(clock_rate, int, 0444); + +#define CYCLES_TO_NANOSEC(x) \ +{ \ + x *= 1000; \ + do_div(x, clock_rate); \ +} + +/* + * TODO: Statistics reporting for control path, data path, + * RDMA times, IOs etc + * + */ +static int avg_ticks_as_time(cycles_t ticks, u32 count, char *buffer) +{ + unsigned long long average = ticks; + unsigned long long remainder = 0; + + if (count == 0) { + return sprintf(buffer, "[NA]\n"); + } else if (count != 1) { + do_div(average, count); + } + + CYCLES_TO_NANOSEC(average); + + if (average > 1000000000) { + remainder = average; + do_div(average, 1000000000); + remainder -= average * 1000000000; + do_div(remainder, 1000000); + if (average > 60) { + u32 days, hours, minutes; + + days = (u32)average / ((60 * 60) *24); + average -= days * ((60 * 60) * 24); + hours = (u32)average / (60 * 60); + average -= hours * (60 * 60); + minutes = (u32)average / 60; + average -= minutes * 60; + if (days != 0) { + return sprintf(buffer, "%d days, %d:%02d\n", + days, hours, minutes); + } else { + return sprintf(buffer, "%d:%02d\n", + hours, minutes); + } + } else { + return sprintf(buffer, 
"%"PRId64".%03"PRId64" sec\n", + average, remainder); + } + } else if (average > 1000000) { + remainder = average; + do_div(average, 1000000); + remainder -= average * 1000000; + do_div(remainder, 1000); + return sprintf(buffer, "%"PRId64".%03"PRId64" msec\n", + average,remainder); + } else if (average > 1000) { + remainder = average; + do_div(average, 1000); + remainder -= average * 1000; + return sprintf(buffer, "%"PRId64".%03"PRId64" usec\n", + average,remainder); + } else { + return sprintf(buffer, "%"PRId64" nanosec\n",average); + } + + return 0; +} + +static ssize_t show_lifetime(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time = get_cycles() - vnic->statistics.start_time; + + return avg_ticks_as_time(time, 1, buf); +} + +static CLASS_DEVICE_ATTR(lifetime, S_IRUGO, show_lifetime, NULL); + +static ssize_t show_conntime(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + if (vnic->statistics.conn_time) + return avg_ticks_as_time(vnic->statistics.conn_time, + 1, buf); + return 0; +} + +static CLASS_DEVICE_ATTR(connection_time, S_IRUGO, show_conntime, NULL); + +static ssize_t show_disconnects(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + u32 num; + + if (vnic->statistics.disconn_ref) + num = vnic->statistics.disconn_num + 1; + else + num = vnic->statistics.disconn_num; + + return sprintf(buf, "%d\n", num); +} + +static CLASS_DEVICE_ATTR(disconnects, S_IRUGO, show_disconnects, NULL); + +static ssize_t show_avg_disconn_time(struct class_device *class_dev, char *buf) +{ + struct 
class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time; + u32 num; + + if (vnic->statistics.disconn_ref) { + time = vnic->statistics.disconn_time + + get_cycles() - vnic->statistics.disconn_ref; + num = vnic->statistics.disconn_num + 1; + } + else { + time = vnic->statistics.disconn_time; + num = vnic->statistics.disconn_num; + } + + return avg_ticks_as_time(time, num, buf); +} + +static CLASS_DEVICE_ATTR(avg_disconn_time, S_IRUGO, show_avg_disconn_time, NULL); + +static ssize_t show_carrier_losses(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + u32 num; + + if (vnic->statistics.carrier_ref) + num = vnic->statistics.carrier_off_num + 1; + else + num = vnic->statistics.carrier_off_num; + + return sprintf(buf, "%d\n", num); +} + +static CLASS_DEVICE_ATTR(carrier_losses, S_IRUGO, show_carrier_losses, NULL); + +static ssize_t show_avg_carr_loss_time(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time; + u32 num; + + if (vnic->statistics.carrier_ref) { + time = vnic->statistics.carrier_off_time + + get_cycles() - vnic->statistics.carrier_ref; + num = vnic->statistics.carrier_off_num + 1; + } + else { + time = vnic->statistics.carrier_off_time; + num = vnic->statistics.carrier_off_num; + } + + return avg_ticks_as_time(time, num, buf); +} + +static CLASS_DEVICE_ATTR(avg_carrier_loss_time, S_IRUGO, + show_avg_carr_loss_time, NULL); + +static ssize_t show_avg_recv_time(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, 
struct vnic, stat_info); + + return avg_ticks_as_time(vnic->statistics.recv_time, + vnic->statistics.recv_num, + buf); +} + +static CLASS_DEVICE_ATTR(avg_recv_time, S_IRUGO, show_avg_recv_time, NULL); + +static ssize_t show_recvs(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.recv_num); +} + +static CLASS_DEVICE_ATTR(recvs, S_IRUGO, show_recvs, NULL); + + +static ssize_t show_avg_xmit_time(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return avg_ticks_as_time(vnic->statistics.xmit_time, + vnic->statistics.xmit_num, + buf); +} + +static CLASS_DEVICE_ATTR(avg_xmit_time, S_IRUGO, show_avg_xmit_time, NULL); + +static ssize_t show_xmits(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.xmit_num); +} + +static CLASS_DEVICE_ATTR(xmits, S_IRUGO, show_xmits, NULL); + +static ssize_t show_failed_xmits(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.xmit_fail); +} + +static CLASS_DEVICE_ATTR(failed_xmits, S_IRUGO, show_failed_xmits, NULL); + + +static int setup_vnic_stats_files(struct vnic *vnic) +{ + + init_completion(&vnic->stat_info.released); + vnic->stat_info.class_dev.class = &vnic_class; + vnic->stat_info.class_dev.parent = &vnic->class_dev_info.class_dev; + snprintf(vnic->stat_info.class_dev.class_id, 
BUS_ID_SIZE, + "stats"); + + if (class_device_register(&vnic->stat_info.class_dev)) { + SYS_ERROR("create_vnic: error in registering" + " stat class dev\n"); + goto out; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_lifetime)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_xmits)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_avg_xmit_time)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_failed_xmits)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_recvs)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_avg_recv_time)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_connection_time)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_disconnects)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_avg_disconn_time)) { + SYS_ERROR("create_vnic: error in creating " + "stats attr file\n"); + goto err_file; + } + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_carrier_losses)) { + SYS_ERROR("create_vnic: error in creating " + 
"stats attr file\n"); + goto err_file; + } + + if (class_device_create_file(&vnic->stat_info.class_dev, + &class_device_attr_avg_carrier_loss_time)) { + SYS_ERROR("create_vnic: error in creating" + "stats attr file\n"); + goto err_file; + } + return 0; +err_file: + class_device_unregister(&vnic->stat_info.class_dev); + wait_for_completion(&vnic->stat_info.released); +out: + return -1; +} + +#endif /*CONFIG_INFINIBAND_VNIC_STATS*/ + + +static CLASS_DEVICE_ATTR(current_path, S_IRUGO, show_current_path, NULL); + +static int create_netpath(struct netpath *n_pdest, struct path_param *p_params) +{ + struct viport_config *viport_config; + struct viport *viport; + struct vnic *vnic; + struct list_head *ptr; + + SYS_INFO(KERN_INFO + "create_netpath: port=%d, instance=%d, heartbeat = %d\n", + (int)p_params->port, (int)p_params->instance, + (int)p_params->heartbeat); + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (vnic->primary_path.viport) { + viport_config = vnic->primary_path.viport->config; + if ((viport_config->guid == p_params->ioc_guid) + && (viport_config->control_config.vnic_instance == + p_params->instance)) { + SYS_ERROR("GUID %llx," + " INSTANCE %d already in use\n", + p_params->ioc_guid, + p_params->instance); + return 0; + } + } + if (vnic->secondary_path.viport) { + viport_config = vnic->secondary_path.viport->config; + if ((viport_config->guid == p_params->ioc_guid) + && (viport_config->control_config.vnic_instance == + p_params->instance)) { + SYS_ERROR("GUID %llx," + " INSTANCE %d already in use\n", + p_params->ioc_guid, + p_params->instance); + return 0; + } + } + } + + viport_config = config_alloc_viport(p_params); + if (!viport_config) { + SYS_ERROR("create_netpath: failed creating viport config\n"); + return 0; + } + + if (p_params->heartbeat != MAXU64) { + viport_config->hb_interval = p_params->heartbeat; + /* 1/100s of sec */ + viport_config->hb_timeout = (p_params->heartbeat << 6) + * 10000; /* u_sec */ + } 
+ + viport_config->path_idx = 0; + + viport = viport_allocate(viport_config); + if (!viport) { + SYS_ERROR("create_netpath: failed creating viport\n"); + config_free_viport(viport_config); + return 0; + } + if (!netpath_add_path(n_pdest, viport)) { + SYS_ERROR("create_netpath: failed associating" + " viport with vnic\n"); + viport_free(viport); + return 0; + } + netpath_disconnected(n_pdest, viport); + return 1; +} + +struct vnic *create_vnic(struct path_param *param) +{ + struct vnic_config *vnic_config; + struct vnic *vnic; + struct list_head *ptr; + + SYS_INFO("create_vnic: name = %s\n", param->name); + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, param->name)) { + SYS_ERROR("vnic %s already exists\n", + param->name); + return NULL; + } + } + + vnic_config = config_alloc_vnic(); + if (!vnic_config) { + SYS_ERROR("create_vnic: failed creating vnic config\n"); + return NULL; + } + if (param->rx_csum != MAXU64) { + vnic_config->overrides |= USE_RX_CSUM_OVERRIDE; + vnic_config->use_rx_csum = param->rx_csum ? TRUE : FALSE; + } + if (param->tx_csum != MAXU64) { + vnic_config->overrides |= USE_TX_CSUM_OVERRIDE; + vnic_config->use_tx_csum = param->tx_csum ? 
TRUE : FALSE; + } + strcpy(vnic_config->name, param->name); + vnic = vnic_allocate(vnic_config); + if (!vnic) { + SYS_ERROR("create_vnic: failed creating vnic\n"); + config_free_vnic(vnic_config); + return NULL; + } + + init_completion(&vnic->class_dev_info.released); + + vnic->class_dev_info.class_dev.class = &vnic_class; + vnic->class_dev_info.class_dev.parent = &interface_cdev.class_dev; + snprintf(vnic->class_dev_info.class_dev.class_id, BUS_ID_SIZE, + vnic_config->name); + + if (class_device_register(&vnic->class_dev_info.class_dev)) { + SYS_ERROR("create_vnic: error in registering" + " vnic class dev\n"); + goto free_vnic; + } + + if (class_device_create_file(&vnic->class_dev_info.class_dev, + &class_device_attr_vnic_state)) { + SYS_ERROR("create_vnic: error in creating" + "vnic_state attr file\n"); + goto err_file; + } + + if (class_device_create_file(&vnic->class_dev_info.class_dev, + &class_device_attr_rx_csum)) { + SYS_ERROR("create_vnic: error in creating" + "rx_csum attr file\n"); + goto err_file; + } + + if (class_device_create_file(&vnic->class_dev_info.class_dev, + &class_device_attr_tx_csum)) { + SYS_ERROR("create_vnic: error in creating" + "tx_csum attr file\n"); + goto err_file; + } + + if (class_device_create_file(&vnic->class_dev_info.class_dev, + &class_device_attr_current_path)) { + SYS_ERROR("create_vnic: error in creating" + "current_path attr file\n"); + goto err_file; + } +#ifdef CONFIG_INFINIBAND_VNIC_STATS + if (setup_vnic_stats_files(vnic)) + goto err_file; +#endif /*CONFIG_INFINIBAND_VNIC_STATS*/ + return vnic; +err_file: + class_device_unregister(&vnic->class_dev_info.class_dev); + wait_for_completion(&vnic->class_dev_info.released); +free_vnic: + vnic_free(vnic); + return NULL; +} + +ssize_t vnic_delete(struct class_device * class_dev, + const char *buf, size_t count) +{ + struct vnic *vnic; + struct list_head *ptr; + int ret = -EINVAL; + + if (count > IFNAMSIZ) { + printk(KERN_WARNING PFX "invalid vnic interface name\n"); + return 
ret; + } + + SYS_INFO("vnic_delete: name = %s\n", buf); + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, buf)) { + vnic_free(vnic); + return count; + } + } + + printk(KERN_WARNING PFX "vnic interface '%s' does not exist\n", buf); + return ret; +} + +static ssize_t show_viport_state(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct netpath *path = + container_of(info, struct netpath, class_dev_info); + switch (path->viport->state) { + case VIPORT_DISCONNECTED: + return sprintf(buf, "VIPORT_DISCONNECTED\n"); + case VIPORT_CONNECTED: + return sprintf(buf, "VIPORT_CONNECTED\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + } + +} + +static CLASS_DEVICE_ATTR(viport_state, S_IRUGO, show_viport_state, NULL); + +static ssize_t show_link_state(struct class_device *class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + struct netpath *path = + container_of(info, struct netpath, class_dev_info); + + switch (path->viport->link_state) { + case LINK_UNINITIALIZED: + return sprintf(buf, "LINK_UNINITIALIZED\n"); + case LINK_INITIALIZE: + return sprintf(buf, "LINK_INITIALIZE\n"); + case LINK_INITIALIZECONTROL: + return sprintf(buf, "LINK_INITIALIZECONTROL\n"); + case LINK_INITIALIZEDATA: + return sprintf(buf, "LINK_INITIALIZEDATA\n"); + case LINK_CONTROLCONNECT: + return sprintf(buf, "LINK_CONTROLCONNECT\n"); + case LINK_CONTROLCONNECTWAIT: + return sprintf(buf, "LINK_CONTROLCONNECTWAIT\n"); + case LINK_INITVNICREQ: + return sprintf(buf, "LINK_INITVNICREQ\n"); + case LINK_INITVNICRSP: + return sprintf(buf, "LINK_INITVNICRSP\n"); + case LINK_BEGINDATAPATH: + return sprintf(buf, "LINK_BEGINDATAPATH\n"); + case LINK_CONFIGDATAPATHREQ: + return sprintf(buf, "LINK_CONFIGDATAPATHREQ\n"); + case LINK_CONFIGDATAPATHRSP: + return sprintf(buf, 
"LINK_CONFIGDATAPATHRSP\n"); + case LINK_DATACONNECT: + return sprintf(buf, "LINK_DATACONNECT\n"); + case LINK_DATACONNECTWAIT: + return sprintf(buf, "LINK_DATACONNECTWAIT\n"); + case LINK_XCHGPOOLREQ: + return sprintf(buf, "LINK_XCHGPOOLREQ\n"); + case LINK_XCHGPOOLRSP: + return sprintf(buf, "LINK_XCHGPOOLRSP\n"); + case LINK_INITIALIZED: + return sprintf(buf, "LINK_INITIALIZED\n"); + case LINK_IDLE: + return sprintf(buf, "LINK_IDLE\n"); + case LINK_IDLING: + return sprintf(buf, "LINK_IDLING\n"); + case LINK_CONFIGLINKREQ: + return sprintf(buf, "LINK_CONFIGLINKREQ\n"); + case LINK_CONFIGLINKRSP: + return sprintf(buf, "LINK_CONFIGLINKRSP\n"); + case LINK_CONFIGADDRSREQ: + return sprintf(buf, "LINK_CONFIGADDRSREQ\n"); + case LINK_CONFIGADDRSRSP: + return sprintf(buf, "LINK_CONFIGADDRSRSP\n"); + case LINK_REPORTSTATREQ: + return sprintf(buf, "LINK_REPORTSTATREQ\n"); + case LINK_REPORTSTATRSP: + return sprintf(buf, "LINK_REPORTSTATRSP\n"); + case LINK_HEARTBEATREQ: + return sprintf(buf, "LINK_HEARTBEATREQ\n"); + case LINK_HEARTBEATRSP: + return sprintf(buf, "LINK_HEARTBEATRSP\n"); + case LINK_RESET: + return sprintf(buf, "LINK_RESET\n"); + case LINK_RESETRSP: + return sprintf(buf, "LINK_RESETRSP\n"); + case LINK_RESETCONTROL: + return sprintf(buf, "LINK_RESETCONTROL\n"); + case LINK_RESETCONTROLRSP: + return sprintf(buf, "LINK_RESETCONTROLRSP\n"); + case LINK_DATADISCONNECT: + return sprintf(buf, "LINK_DATADISCONNECT\n"); + case LINK_CONTROLDISCONNECT: + return sprintf(buf, "LINK_CONTROLDISCONNECT\n"); + case LINK_CLEANUPDATA: + return sprintf(buf, "LINK_CLEANUPDATA\n"); + case LINK_CLEANUPCONTROL: + return sprintf(buf, "LINK_CLEANUPCONTROL\n"); + case LINK_DISCONNECTED: + return sprintf(buf, "LINK_DISCONNECTED\n"); + case LINK_RETRYWAIT: + return sprintf(buf, "LINK_RETRYWAIT\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + + } + +} +static CLASS_DEVICE_ATTR(link_state, S_IRUGO, show_link_state, NULL); + +static ssize_t show_heartbeat(struct class_device 
*class_dev, char *buf) +{ + struct class_dev_info *info = + container_of(class_dev, struct class_dev_info, class_dev); + + struct netpath *path = + container_of(info, struct netpath, class_dev_info); + + /*hb_inteval is in jiffies, convert it back to 1/100ths of a second */ + return sprintf(buf, "%d\n", + (path->viport->config->hb_interval * 100) / HZ); +} + +static CLASS_DEVICE_ATTR(heartbeat, S_IRUGO, show_heartbeat, NULL); + +static int setup_path_class_files(struct netpath *path, char *name) +{ + init_completion(&path->class_dev_info.released); + + path->class_dev_info.class_dev.class = &vnic_class; + path->class_dev_info.class_dev.parent = + &path->parent->class_dev_info.class_dev; + snprintf(path->class_dev_info.class_dev.class_id, BUS_ID_SIZE, name); + + if (class_device_register(&path->class_dev_info.class_dev)) { + SYS_ERROR("error in registering path class dev\n"); + goto out; + } + + if (class_device_create_file(&path->class_dev_info.class_dev, + &class_device_attr_viport_state)) { + + SYS_ERROR("error in creating viport state file\n"); + goto err_file; + } + + if (class_device_create_file(&path->class_dev_info.class_dev, + &class_device_attr_link_state)) { + + SYS_ERROR("error in creating link state file\n"); + goto err_file; + } + + if (class_device_create_file(&path->class_dev_info.class_dev, + &class_device_attr_heartbeat)) { + + SYS_ERROR("create_vnic: error in creating" + "heartbeat attr file\n"); + goto err_file; + } + + return 0; + +err_file: + class_device_unregister(&path->class_dev_info.class_dev); + wait_for_completion(&path->class_dev_info.released); +out: + return -1; + +} + +ssize_t vnic_create_primary(struct class_device * class_dev, + const char *buf, size_t count) +{ + struct class_dev_info *cdev = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic_ib_port *target = + container_of(cdev, struct vnic_ib_port, cdev_info); + + struct path_param param; + int ret = -EINVAL; + struct vnic *vnic; + + param.instance = 0; 
+ param.rx_csum = MAXU64; + param.tx_csum = MAXU64; + param.heartbeat = MAXU64; + + ret = vnic_parse_options(buf, &param); + + if (ret) + goto out; + + param.ibdev = target->dev->dev; + param.ibport = target; + param.port = target->port_num; + + vnic = create_vnic(&param); + if (!vnic) { + printk(KERN_ERR PFX "creating vnic failed\n"); + ret = -EINVAL; + goto out; + } + + if (!create_netpath(&vnic->primary_path, &param)) { + printk(KERN_ERR PFX "creating primary netpath failed\n"); + goto free_vnic; + } + + if (setup_path_class_files(&vnic->primary_path, "primary_path")) + goto free_vnic; + + if (vnic && !vnic->primary_path.viport) { + printk(KERN_ERR PFX "no valid netpaths\n"); + goto free_vnic; + } + + return count; + +free_vnic: + vnic_free(vnic); + ret = -EINVAL; +out: + return ret; +} + +ssize_t vnic_create_secondary(struct class_device * class_dev, + const char *buf, size_t count) +{ + struct class_dev_info *cdev = + container_of(class_dev, struct class_dev_info, class_dev); + struct vnic_ib_port *target = + container_of(cdev, struct vnic_ib_port, cdev_info); + + struct path_param param; + struct vnic *vnic; + int ret = -EINVAL; + struct list_head *ptr; + int found = 0; + + param.instance = 0; + param.rx_csum = MAXU64; + param.tx_csum = MAXU64; + param.heartbeat = MAXU64; + + ret = vnic_parse_options(buf, &param); + + if (ret) + goto out; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strncmp(vnic->config->name, param.name, IFNAMSIZ)) { + found = 1; + break; + } + } + + if (!found) { + printk(KERN_ERR PFX + "primary connection with name '%s' does not exist\n", + param.name); + ret = -EINVAL; + goto out; + } + + param.ibdev = target->dev->dev; + param.ibport = target; + param.port = target->port_num; + + if (!create_netpath(&vnic->secondary_path, &param)) { + printk(KERN_ERR PFX "creating secondary netpath failed\n"); + ret = -EINVAL; + goto out; + } + + if (setup_path_class_files(&vnic->secondary_path, "secondary_path")) + goto
free_vnic; + + return count; + +free_vnic: + vnic_free(vnic); + ret = -EINVAL; +out: + return ret; +} diff --git a/drivers/infiniband/ulp/vnic/vnic_sys.h b/drivers/infiniband/ulp/vnic/vnic_sys.h new file mode 100644 index 0000000..c4c961e --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_sys.h @@ -0,0 +1,51 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_SYS_H_INCLUDED +#define VNIC_SYS_H_INCLUDED + +struct class_dev_info { + struct class_device class_dev; + struct completion released; +}; + +extern struct class vnic_class; +extern struct class_dev_info interface_cdev; +extern ssize_t vnic_create_primary(struct class_device *class_dev, + const char *buf, size_t count); + +extern ssize_t vnic_create_secondary(struct class_device *class_dev, + const char *buf, size_t count); + +extern ssize_t vnic_delete(struct class_device *class_dev, + const char *buf, size_t count); +#endif /*VNIC_SYS_H_INCLUDED*/ From rkuchimanchi at silverstorm.com Mon Oct 2 13:11:39 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:41:39 +0530 Subject: [openib-general] [PATCH 9/10] Driver utility file - implements various utility macros Message-ID: <4521BFD3.5876.4E47C073@rkuchimanchi.silverstorm.com> Adds the driver utility file. This file contains utility macros for debugging etc Signed-off-by: Ramachandra K --- drivers/infiniband/ulp/vnic/vnic_util.h | 286 +++++++++++++++++++++++++++++++ 1 files changed, 286 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/vnic/vnic_util.h b/drivers/infiniband/ulp/vnic/vnic_util.h new file mode 100644 index 0000000..ca35fa0 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/vnic_util.h @@ -0,0 +1,286 @@ +/* + * Copyright (c) 2006 SilverStorm Technologies Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_UTIL_H_INCLUDED +#define VNIC_UTIL_H_INCLUDED + +#define MODULE_NAME "VNIC" + +extern u32 vnic_debug; + +#define DEBUG_IB_INFO 0x00000001 +#define DEBUG_IB_FUNCTION 0x00000002 +#define DEBUG_IB_FSTATUS 0x00000004 +#define DEBUG_IB_ASSERTS 0x00000008 +#define DEBUG_CONTROL_INFO 0x00000010 +#define DEBUG_CONTROL_FUNCTION 0x00000020 +#define DEBUG_CONTROL_PACKET 0x00000040 +#define DEBUG_CONFIG_INFO 0x00000100 +#define DEBUG_DATA_INFO 0x00001000 +#define DEBUG_DATA_FUNCTION 0x00002000 +#define DEBUG_NETPATH_INFO 0x00010000 +#define DEBUG_VIPORT_INFO 0x00100000 +#define DEBUG_VIPORT_FUNCTION 0x00200000 +#define DEBUG_LINK_STATE 0x00400000 +#define DEBUG_VNIC_INFO 0x01000000 +#define DEBUG_VNIC_FUNCTION 0x02000000 +#define DEBUG_SYS_INFO 0x10000000 +#define DEBUG_SYS_VERBOSE 0x40000000 + +#ifdef CONFIG_INFINIBAND_VNIC_DEBUG +#define PRINT(level, x, fmt, arg...) \ + printk(level "%s: %s: %s, line %d: " fmt, \ + MODULE_NAME, x, __FILE__, __LINE__, ##arg) + +#define PRINT_CONDITIONAL(level, x, condition, fmt, arg...) \ + do { \ + if (condition) \ + printk(level "%s: %s: %s, line %d: " fmt, \ + MODULE_NAME, x, __FILE__, __LINE__, \ + ##arg); \ + } while(0) +#else +#define PRINT(level, x, fmt, arg...) \ + printk( level "%s: " fmt, MODULE_NAME, ##arg) + +#define PRINT_CONDITIONAL(level, x, condition, fmt, arg...) \ + do { \ + if (condition) \ + printk(level "%s: %s: " fmt, \ + MODULE_NAME, x, ##arg); \ + } while(0) +#endif /*CONFIG_INFINIBAND_VNIC_DEBUG*/ + +#define IB_PRINT(fmt, arg...) PRINT(KERN_INFO, "IB", fmt, ##arg) +#define IB_ERROR(fmt, arg...) PRINT(KERN_ERR, "IB", fmt, ##arg) + +#define IB_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "IB", \ + (vnic_debug & DEBUG_IB_FUNCTION), \ + fmt, ##arg) + +#define IB_INFO(fmt, arg...) 
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "IB", \ + (vnic_debug & DEBUG_IB_INFO), \ + fmt, ##arg) + +#define IB_ASSERT(x) \ + do { \ + if ((vnic_debug & DEBUG_IB_ASSERTS) && !(x)) \ + panic("%s assertion failed, file: %s," \ + " line %d: ", \ + MODULE_NAME,__FILE__,__LINE__); \ + } while(0) + +#define CONTROL_PRINT(fmt, arg...) PRINT(KERN_INFO, "CONTROL", fmt, ##arg) +#define CONTROL_ERROR(fmt, arg...) PRINT(KERN_ERR, "CONTROL", fmt, ##arg) + +#define CONTROL_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONTROL", \ + (vnic_debug & DEBUG_CONTROL_INFO), \ + fmt, ##arg) + +#define CONTROL_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONTROL", \ + (vnic_debug & DEBUG_CONTROL_FUNCTION), \ + fmt, ##arg) + +#define CONTROL_PACKET(pkt) \ + do { \ + if (vnic_debug & DEBUG_CONTROL_PACKET) \ + control_log_control_packet(pkt); \ + } while(0) + +#define CONFIG_PRINT(fmt, arg...) PRINT(KERN_INFO, "CONFIG", fmt, ##arg) +#define CONFIG_ERROR(fmt, arg...) PRINT(KERN_ERR, "CONFIG", fmt, ##arg) + +#define CONFIG_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONFIG", \ + (vnic_debug & DEBUG_CONFIG_INFO), \ + fmt, ##arg) + +#define DATA_PRINT(fmt, arg...) PRINT(KERN_INFO, "DATA", fmt, ##arg) +#define DATA_ERROR(fmt, arg...) PRINT(KERN_ERR, "DATA", fmt, ##arg) + +#define DATA_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "DATA", \ + (vnic_debug & DEBUG_DATA_INFO), \ + fmt, ##arg) + +#define DATA_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "DATA", \ + (vnic_debug & DEBUG_DATA_FUNCTION), \ + fmt, ##arg) + +#define NETPATH_PRINT(fmt, arg...) PRINT(KERN_INFO, "NETPATH", fmt, ##arg) +#define NETPATH_ERROR(fmt, arg...) PRINT(KERN_ERR, "NETPATH", fmt, ##arg) + +#define NETPATH_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NETPATH", \ + (vnic_debug & DEBUG_NETPATH_INFO), \ + fmt, ##arg) + +#define VIPORT_PRINT(fmt, arg...) PRINT(KERN_INFO, "VIPORT", fmt, ##arg) +#define VIPORT_ERROR(fmt, arg...)
PRINT(KERN_ERR, "VIPORT", fmt, ##arg) + +#define VIPORT_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "VIPORT", \ + (vnic_debug & DEBUG_VIPORT_INFO), \ + fmt, ##arg) + +#define VIPORT_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "VIPORT", \ + (vnic_debug & DEBUG_VIPORT_FUNCTION), \ + fmt, ##arg) + +#define LINK_STATE(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "LINK", \ + (vnic_debug & DEBUG_LINK_STATE), \ + fmt, ##arg) + +#define VNIC_PRINT(fmt, arg...) PRINT(KERN_INFO, "NIC", fmt, ##arg) +#define VNIC_ERROR(fmt, arg...) PRINT(KERN_ERR, "NIC", fmt, ##arg) +#define VNIC_INIT(fmt, arg...) PRINT(KERN_INFO, "NIC", fmt, ##arg) + +#define VNIC_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NIC", \ + (vnic_debug & DEBUG_VNIC_INFO), \ + fmt, ##arg) + +#define VNIC_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NIC", \ + (vnic_debug & DEBUG_VNIC_FUNCTION), \ + fmt, ##arg) + +#define SYS_PRINT(fmt, arg...) PRINT(KERN_INFO, "SYS", fmt, ##arg) +#define SYS_ERROR(fmt, arg...) PRINT(KERN_ERR, "SYS", fmt, ##arg) + +#define SYS_INFO(fmt, arg...) 
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "SYS", \ + (vnic_debug & DEBUG_SYS_INFO), \ + fmt, ##arg) + +#define hton8(x) (x) +#define hton16(x) __cpu_to_be16(x) +#define hton32(x) __cpu_to_be32(x) +#define hton64(x) __cpu_to_be64(x) + +#define ntoh8(x) (x) +#define ntoh16(x) __be16_to_cpu(x) +#define ntoh32(x) __be32_to_cpu(x) +#define ntoh64(x) __be64_to_cpu(x) + +#define get_sksport(sk) inet_sk(sk)->sport +#define get_skdport(sk) inet_sk(sk)->dport + +#define is_power_of2(value) (((value) & ((value - 1))) == 0) + +typedef unsigned long uintn; /* __WORDSIZE/pointer sized integer */ + +/* round down value to align, align must be a power of 2 */ +#ifndef ROUNDDOWNP2 +#define ROUNDDOWNP2(val, align) \ + (((uintn)(val)) & (~((uintn)(align)-1))) +#endif +/* round up value to align, align must be a power of 2 */ +#ifndef ROUNDUPP2 +#define ROUNDUPP2(val, align) \ + (((uintn)(val) + (uintn)(align) - 1) & (~((uintn)(align)-1))) +#endif + +#define BOOLEAN u8 +#define TRUE 1 +#define FALSE 0 + +#define VNIC_MAJORVERSION 1 +#define VNIC_MINORVERSION 1 + +#define MAXU32 0xffffffff +#define MAXU64 ((u64)(~0ULL)) + +#if BITS_PER_LONG == 64 +#define PTR64(what) ((u64)(what)) +#define PTR(what) ((void *)(u64)(what)) +#elif BITS_PER_LONG == 32 +#define PTR64(what) ((u64)(u32)(what)) +#define PTR(what) ((void *)(u32)(what)) +#else +#error "BITS_PER_LONG not 32 nor 64" +#endif + +#if BITS_PER_LONG == 64 +#ifdef __ia64__ +#define __PRI64_PREFIX "l" +#else +#define __PRI64_PREFIX "ll" +#endif +#define PRISZT "lu" +#elif BITS_PER_LONG == 32 +#define __PRI64_PREFIX "L" +#define PRISZT "u" +#else +#error "BITS_PER_LONG not 64 nor 32" +#endif +#define __PRIN_PREFIX "l" +#define PRId64 __PRI64_PREFIX"d" +#define PRIo64 __PRI64_PREFIX"o" +#define PRIu64 __PRI64_PREFIX"u" +#define PRIx64 __PRI64_PREFIX"x" +#define PRIX64 __PRI64_PREFIX"X" +#define PRIdN __PRIN_PREFIX"d" +#define PRIoN __PRIN_PREFIX"o" +#define PRIuN __PRIN_PREFIX"u" +#define PRIxN __PRIN_PREFIX"x" + +/* source time is 100ths of a 
sec */ +#define CONV2JIFFIES(time) (((time) * HZ) / 100) +#define CONV2USEC(time) ((time) * 10000) + +#ifndef min +#define min(a,b) ((a)<(b)?(a):(b)) +#endif + +#endif /* VNIC_UTIL_H_INCLUDED */ From rkuchimanchi at silverstorm.com Mon Oct 2 13:12:44 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:42:44 +0530 Subject: [openib-general] [PATCH 10/10] Driver Kconfig/Makefile. Modifications to toplevel Kconfig/Makefile Message-ID: <4521C014.14373.4E48BCA6@rkuchimanchi.silverstorm.com> Adds the Kconfig and Makefile for the driver. Modifies the top level Infiniband Kconfig and Makefile to include VNIC. Signed-off-by: Ramachandra K --- drivers/infiniband/Kconfig | 2 ++ drivers/infiniband/Makefile | 1 + drivers/infiniband/ulp/vnic/Kconfig | 28 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/vnic/Makefile | 11 +++++++++++ 4 files changed, 42 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index 9edface..5676c6a 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -45,4 +45,6 @@ source "drivers/infiniband/ulp/srp/Kconf source "drivers/infiniband/ulp/iser/Kconfig" +source "drivers/infiniband/ulp/vnic/Kconfig" + endmenu diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index 2b5d109..5407878 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_INFINIBAND_AMSO1100) += hw/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ +obj-$(CONFIG_INFINIBAND_VNIC) += ulp/vnic/ diff --git a/drivers/infiniband/ulp/vnic/Kconfig b/drivers/infiniband/ulp/vnic/Kconfig new file mode 100644 index 0000000..3be14ff --- /dev/null +++ b/drivers/infiniband/ulp/vnic/Kconfig @@ -0,0 +1,28 @@ +config INFINIBAND_VNIC + tristate "VNIC - Support for SilverStorm Virtual Ethernet I/O Controller" + depends on INFINIBAND && NETDEVICES && INET + 
---help--- + Support for the SilverStorm Virtual Ethernet I/O Controller + (VEx). In conjunction with the VEx, this provides virtual + ethernet interfaces and transports ethernet packets over + InfiniBand so that you can communicate with Ethernet networks + using your IB device. + +config INFINIBAND_VNIC_DEBUG + bool "VNIC Verbose debugging" + depends on INFINIBAND_VNIC + default n + ---help--- + This option causes verbose debugging code to be compiled + into the VNIC driver. The output can be turned on via the + vnic_debug module parameter. + +config INFINIBAND_VNIC_STATS + bool "VNIC Statistics" + depends on INFINIBAND_VNIC + default n + ---help--- + This option compiles statistics collecting code into the + data path of the VNIC driver to help in profiling and fine + tuning. This adds some overhead in the interest of gathering + data. diff --git a/drivers/infiniband/ulp/vnic/Makefile b/drivers/infiniband/ulp/vnic/Makefile new file mode 100644 index 0000000..253d167 --- /dev/null +++ b/drivers/infiniband/ulp/vnic/Makefile @@ -0,0 +1,11 @@ +obj-$(CONFIG_INFINIBAND_VNIC) += ib_vnic.o + +ib_vnic-y := vnic_main.o \ + vnic_ib.o \ + vnic_viport.o \ + vnic_control.o \ + vnic_data.o \ + vnic_netpath.o \ + vnic_config.o \ + vnic_sys.o + From rdreier at cisco.com Mon Oct 2 13:18:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:18:08 -0700 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <4521BCB1.524.4E3B826B@rkuchimanchi.silverstorm.com> ( Ramachandra K.'s message of "Tue, 03 Oct 2006 01:28:17 +0530") References: <4521BCB1.524.4E3B826B@rkuchimanchi.silverstorm.com> Message-ID: Ramachandra> This patch series is intended for your infiniband.git Ramachandra> for-2.6.19 branch. It also has been tested against Ramachandra> the for-2.6.20 branch. 
Well, no way is this going to be merged into 2.6.19 at this stage in the release cycle (the merge window is closing in a few days and this has never been reviewed at all). Also, you're going to want to cross-post this to lkml and netdev as well so that people subscribed there can review it. - R From sweitzen at cisco.com Mon Oct 2 13:22:07 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 2 Oct 2006 13:22:07 -0700 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) Message-ID: Is this communication protocol documented anywhere? How does this feature compare to IPoIB and SDP? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Ramachandra K > Sent: Monday, October 02, 2006 12:58 PM > To: Roland Dreier (rdreier) > Cc: rkuchimanchi at silverstorm.com; openib-General > Subject: [openib-general] [PATCH 0/10] [RFC] Support for > SilverStorm Virtual Ethernet I/O controller (VEx) > > Hi Roland, > > This patch series adds support for the SilverStorm Virtual > Ethernet I/O > Controllers (VEx) by adding a new kernel level driver. > > This kernel driver: > > 1. Communicates with the VEx on the SilverStorm fabric > switches/directors using > SilverStorm's native protocol > 2. Presents a standard Ethernet NIC interface to the system > 3. Uses IB reliable connection semantics > 4. Is tuned for high performance and throughput > > The SilverStorm VEx and the associated communication protocol > is in wide use > amongst users of SilverStorm IB fabric solutions. > > This patch series is intended for your infiniband.git > for-2.6.19 branch. It > also has been tested against the for-2.6.20 branch.
> > Signed-off-by: Ramachandra K > > Regards, > Ram > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rkuchimanchi at silverstorm.com Mon Oct 2 13:27:24 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 01:57:24 +0530 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: References: <4521BCB1.524.4E3B826B@rkuchimanchi.silverstorm.com> Message-ID: <4521762C.6000007@silverstorm.com> Roland Dreier wrote: > Ramachandra> This patch series is intended for your infiniband.git > Ramachandra> for-2.6.19 branch. It also has been tested against > Ramachandra> the for-2.6.20 branch. > >Well, no way is this going to be merged into 2.6.19 at this stage in >the release cycle (the merge window is closing in a few days and this >has never been reviewed at all). > > In that case, can you please consider this for the for-2.6.20 branch ? Regards, Ram From rdreier at cisco.com Mon Oct 2 13:26:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:26:47 -0700 Subject: [openib-general] [PATCH 1/10] Driver Main files - netdev functions and corresponding state maintenance In-Reply-To: <4521BDD9.27185.4E400870@rkuchimanchi.silverstorm.com> ( Ramachandra K.'s message of "Tue, 03 Oct 2006 01:33:13 +0530") References: <4521BDD9.27185.4E400870@rkuchimanchi.silverstorm.com> Message-ID: > +#ifdef CONFIG_INFINIBAND_VNIC_STATS > + extern cycles_t recv_ref; > +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ put this declaration in a header file somewhere, not inside a function in a .c file. Also is it really worth having CONFIG_INFINIBAND_VNIC_STATS? Who would use it? Or would anyone turn it off? 
All the #ifdefs make the code much harder to read so I think you need to figure out a better way to make it conditional if you really want it to be configurable. > + /* TBD */ > + /* TBD */ Umm... > +static BOOLEAN vnic_npevent_register(struct vnic *vnic, struct netpath *netpath) What do you gain from having this shouting "BOOLEAN" type? From HNGUYEN at de.ibm.com Mon Oct 2 13:29:16 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 2 Oct 2006 22:29:16 +0200 Subject: [openib-general] [PATCH 2.6.19-rc1] ehca: fix ehca_probe if module loaded after ib_ipoib In-Reply-To: Message-ID: > Looks OK but your mailer mangled the patch. Please resend in a form > that can be applied... > please send unrelated changes as separate patches. > So this should come as two patches -- one to fix the device > registration, and one to change your debug formatting. ok, will resend those two patches soon. From rdreier at cisco.com Mon Oct 2 13:28:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:28:32 -0700 Subject: [openib-general] [PATCH 2/10] Driver netpath files - abstraction of connection to VEx In-Reply-To: <4521BE03.25916.4E40AAF9@rkuchimanchi.silverstorm.com> ( Ramachandra K.'s message of "Tue, 03 Oct 2006 01:33:55 +0530") References: <4521BE03.25916.4E40AAF9@rkuchimanchi.silverstorm.com> Message-ID: > + if (netpath->timer_state == NETPATH_TS_ACTIVE) { > + del_timer_sync(&netpath->timer); > + } kernel style is just to do if (netpath->timer_state == NETPATH_TS_ACTIVE) del_timer_sync(&netpath->timer); this could be fixed many places. > +void netpath_connected(struct netpath *netpath, struct viport *viport) > +{ > + vnic_connected(netpath->parent, netpath); > + return; > +} > + > +void netpath_disconnected(struct netpath *netpath, struct viport *viport) > +{ > + vnic_disconnected(netpath->parent, netpath); > + return; > +} what do the return; statement accomplish here? In fact what do these wrappers accomplish? - R. 
From rdreier at cisco.com Mon Oct 2 13:29:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:29:59 -0700 Subject: [openib-general] [PATCH 3/10] Driver viport files - implementation of communication protocol with VEx In-Reply-To: <4521BE44.28605.4E41A836@rkuchimanchi.silverstorm.com> ( Ramachandra K.'s message of "Tue, 03 Oct 2006 01:35:00 +0530") References: <4521BE44.28605.4E41A836@rkuchimanchi.silverstorm.com> Message-ID: > + viport = (struct viport *)kmalloc(sizeof(struct viport), GFP_KERNEL); > + memset(viport, 0, sizeof(struct viport)); cast from void * is not necessary. memset can be replaced by just using kzalloc(). - R. From robert.j.woodruff at intel.com Mon Oct 2 13:33:01 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 2 Oct 2006 13:33:01 -0700 Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA Message-ID: Hi Roland/Michael, One of my coworkers in Champaign is seeing a performance issue with the latest SVN driver and the OFED 1.1 Mellanox driver on certain platforms. On the older SVN somewhere around 7500 the Mellanox driver did not save and restore certain PCI registers before a reset. Somewhere around SVN 8000 a patch was added to save and restore these registers. However, on our Alcolu platform this patch causes the MaxReadReq to be set to 128 bytes (rather than 512), which limits bandwidth to 650 MBytes/sec. If I remove the save/restore of these registers (attached patch), the bandwidth is back to where we would expect it, 1250 MBytes/sec. Is there some problem with this patch or do you think it is some BIOS issue in the platform? woody -------------- next part -------------- A non-text attachment was scrubbed...
Name: pci_regs.patch Type: application/octet-stream Size: 3100 bytes Desc: pci_regs.patch URL: From hnguyen at de.ibm.com Mon Oct 2 13:32:49 2006 From: hnguyen at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 2 Oct 2006 22:32:49 +0200 Subject: [openib-general] [PATCH 2.6.19-rc1 1/2] ehca: fix ehca device registration Message-ID: <200610022232.49540.hnguyen@de.ibm.com> Hi Roland! Below is a patch of ehca, which fixes a bug (crash) that occured when ib_ehca is loaded after ib_ipoib. This patch initializes struct ehca_shca with struct device*, then creates internal resources and finally registers the ehca IB device. And that is the proper sequence to do. Thanks! Nam Nguyen Signed-off-by: Hoang-Nam Nguyen --- ehca_main.c | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_main.c 2006-10-02 22:08:57.000000000 +0200 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_main.c 2006-10-02 18:29:53.000000000 +0200 @@ -49,7 +49,7 @@ MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0016"); +MODULE_VERSION("SVNEHCA_0017"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -239,7 +239,7 @@ init_node_guid1: return ret; } -int ehca_register_device(struct ehca_shca *shca) +int ehca_init_device(struct ehca_shca *shca) { int ret; @@ -317,11 +317,6 @@ int ehca_register_device(struct ehca_shc /* shca->ib_device.process_mad = ehca_process_mad; */ shca->ib_device.mmap = ehca_mmap; - ret = ib_register_device(&shca->ib_device); - if (ret) - ehca_err(&shca->ib_device, - "ib_register_device() failed ret=%x", ret); - return ret; } @@ -561,9 +556,9 @@ static int __devinit ehca_probe(struct i goto probe1; } - ret = ehca_register_device(shca); + ret = ehca_init_device(shca); if (ret) 
{ - ehca_gen_err("Cannot register Infiniband device"); + ehca_gen_err("Cannot init ehca device struct"); goto probe1; } @@ -571,7 +566,7 @@ static int __devinit ehca_probe(struct i ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); if (ret) { ehca_err(&shca->ib_device, "Cannot create EQ."); - goto probe2; + goto probe1; } ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513); @@ -600,6 +595,13 @@ static int __devinit ehca_probe(struct i goto probe5; } + ret = ib_register_device(&shca->ib_device); + if (ret) { + ehca_err(&shca->ib_device, + "ib_register_device() failed ret=%x", ret); + goto probe6; + } + /* create AQP1 for port 1 */ if (ehca_open_aqp1 == 1) { shca->sport[0].port_state = IB_PORT_DOWN; @@ -607,7 +609,7 @@ static int __devinit ehca_probe(struct i if (ret) { ehca_err(&shca->ib_device, "Cannot create AQP1 for port 1."); - goto probe6; + goto probe7; } } @@ -618,7 +620,7 @@ static int __devinit ehca_probe(struct i if (ret) { ehca_err(&shca->ib_device, "Cannot create AQP1 for port 2."); - goto probe7; + goto probe8; } } @@ -630,12 +632,15 @@ static int __devinit ehca_probe(struct i return 0; -probe7: +probe8: ret = ehca_destroy_aqp1(&shca->sport[0]); if (ret) ehca_err(&shca->ib_device, "Cannot destroy AQP1 for port 1. ret=%x", ret); +probe7: + ib_unregister_device(&shca->ib_device); + probe6: ret = ehca_dereg_internal_maxmr(shca); if (ret) @@ -660,9 +665,6 @@ probe3: ehca_err(&shca->ib_device, "Cannot destroy EQ. 
ret=%x", ret); -probe2: - ib_unregister_device(&shca->ib_device); - probe1: ib_dealloc_device(&shca->ib_device); @@ -750,7 +752,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0016)\n"); + "(Rel.: SVNEHCA_0017)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock);
From rdreier at cisco.com Mon Oct 2 13:36:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:36:46 -0700 Subject: [openib-general] [PATCH 9/10] Driver utility file - implements various utility macros In-Reply-To: <4521BFD3.5876.4E47C073@rkuchimanchi.silverstorm.com> ( Ramachandra K.'s message of "Tue, 03 Oct 2006 01:41:39 +0530") References: <4521BFD3.5876.4E47C073@rkuchimanchi.silverstorm.com> Message-ID: > +#define hton8(x) (x) > +#define hton16(x) __cpu_to_be16(x) > +#define hton32(x) __cpu_to_be32(x) > +#define hton64(x) __cpu_to_be64(x) > + > +#define ntoh8(x) (x) > +#define ntoh16(x) __be16_to_cpu(x) > +#define ntoh32(x) __be32_to_cpu(x) > +#define ntoh64(x) __be64_to_cpu(x) Please just use the standard cpu_to_beXX / beXX_to_cpu functions directly (without the __). > +#define is_power_of2(value) (((value) & ((value - 1))) == 0) > + > +typedef unsigned long uintn; /* __WORDSIZE/pointer sized integer */ > + > +/* round down value to align, align must be a power of 2 */ > +#ifndef ROUNDDOWNP2 > +#define ROUNDDOWNP2(val, align) \ > + (((uintn)(val)) & (~((uintn)(align)-1))) > +#endif > +/* round up value to align, align must be a power of 2 */ > +#ifndef ROUNDUPP2 > +#define ROUNDUPP2(val, align) \ > + (((uintn)(val) + (uintn)(align) - 1) & (~((uintn)(align)-1))) > +#endif If you need this stuff it should probably go in some common kernel include.
> +#if BITS_PER_LONG == 64 > +#define PTR64(what) ((u64)(what)) > +#define PTR(what) ((void *)(u64)(what)) > +#elif BITS_PER_LONG == 32 > +#define PTR64(what) ((u64)(u32)(what)) > +#define PTR(what) ((void *)(u32)(what)) > +#else > +#error "BITS_PER_LONG not 32 nor 64" > +#endif umm.. what the heck is this trying to do? If you want to cast a pointer to an integer, just use 'unsigned long' to hold it. > +#endif > +#define __PRIN_PREFIX "l" > +#define PRId64 __PRI64_PREFIX"d" > +#define PRIo64 __PRI64_PREFIX"o" > +#define PRIu64 __PRI64_PREFIX"u" > +#define PRIx64 __PRI64_PREFIX"x" > +#define PRIX64 __PRI64_PREFIX"X" > +#define PRIdN __PRIN_PREFIX"d" > +#define PRIoN __PRIN_PREFIX"o" > +#define PRIuN __PRIN_PREFIX"u" > +#define PRIxN __PRIN_PREFIX"x" kernel style is just to use "%llx" or whatever for printing 64-bit values, and cast them to unsigned long long to avoid warnings about printf formats. > +/* source time is 100ths of a sec */ > +#define CONV2JIFFIES(time) (((time) * HZ) / 100) > +#define CONV2USEC(time) ((time) * 10000) Why are you using such a wacky unit? This looks really error-prone -- the conversions in should be good enough I think. > +#ifndef min > +#define min(a,b) ((a)<(b)?(a):(b)) > +#endif Unneeded since the kernel _does_ have a better definition of min() (type-safe, evaluations parameters only once, etc) - R. From hnguyen at de.ibm.com Mon Oct 2 13:33:50 2006 From: hnguyen at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 2 Oct 2006 22:33:50 +0200 Subject: [openib-general] [PATCH 2.6.19-rc1 2/2] ehca: improved ehca debug format Message-ID: <200610022233.50497.hnguyen@de.ibm.com> Hi, here is the 2nd patch of ehca with a small format improvement in ehca debug function. Thanks! 
Nam Nguyen Signed-off-by: Hoang-Nam Nguyen --- ehca_tools.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_tools.h infiniband_work/drivers/infiniband/hw/ehca/ehca_tools.h --- infiniband_orig/drivers/infiniband/hw/ehca/ehca_tools.h 2006-10-02 22:08:57.000000000 +0200 +++ infiniband_work/drivers/infiniband/hw/ehca/ehca_tools.h 2006-10-02 18:29:53.000000000 +0200 @@ -117,7 +117,7 @@ extern int ehca_debug_level; unsigned int l = (unsigned int)(len); \ unsigned char *deb = (unsigned char*)(adr); \ for (x = 0; x < l; x += 16) { \ - printk("EHCA_DMP:%s" format \ + printk("EHCA_DMP:%s " format \ " adr=%p ofs=%04x %016lx %016lx\n", \ __FUNCTION__, ##args, deb, x, \ *((u64 *)&deb[0]), *((u64 *)&deb[8])); \
From rdreier at cisco.com Mon Oct 2 13:38:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:38:35 -0700 Subject: [openib-general] [PATCH 7/10] Handling of various configurable parameters of the driver In-Reply-To: <4521BF3C.17740.4E457192@rkuchimanchi.silverstorm.com> ( Ramachandra K.'s message of "Tue, 03 Oct 2006 01:39:08 +0530") References: <4521BF3C.17740.4E457192@rkuchimanchi.silverstorm.com> Message-ID: > + sid = 0x10LL << 56 | > + 0x00LL << 48 | > + 0x06LL << 40 | > + 0x6aLL << 32 | > +
0x00LL << 24 | > + 0x00LL << 16 | > + 0x00LL << 8 | ((be64_to_cpu(params->ioc_guid) >> 32) & 0xFF); What is this magic number code doing?? Wouldn't it be clearer just to use the constant 0x1000066a00000000 rather than making it by hand? What does that value mean? - R. From rdreier at cisco.com Mon Oct 2 13:40:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:40:18 -0700 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <4521762C.6000007@silverstorm.com> (Ramachandra K.'s message of "Tue, 03 Oct 2006 01:57:24 +0530") References: <4521BCB1.524.4E3B826B@rkuchimanchi.silverstorm.com> <4521762C.6000007@silverstorm.com> Message-ID: Ramachandra> In that case, can you please consider this for the Ramachandra> for-2.6.20 branch ? I'm happy to keep this in a vex branch or something like that, but as the emails I just sent show, this is not ready for merging yet (which is to be expected -- it's never been reviewed). I think Scott's question about protocol documentation is a good one. And also as I said this needs to be sent to lkml and netdev for full review by everyone. - R. From rdreier at cisco.com Mon Oct 2 13:44:53 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 13:44:53 -0700 Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA In-Reply-To: (Robert J. Woodruff's message of "Mon, 2 Oct 2006 13:33:01 -0700") References: Message-ID: Does using the "tune_pci=1" module option for ib_mthca bring the performance back up? The reason the driver was changed to work this way is that presumably the BIOS is setting the PCI configuration as it does for a reason. So you might want investigate why the BIOS sets MaxReadReq down to 128 in the first place. 
(removing the save/restore across reset lets the HCA pick a new default for all the settings, but may cause problems by getting rid of BIOS settings, which we assume were done for a reason). However tune_pci=1 will make the driver override this setting if you really know what you're doing. - R. From bos at pathscale.com Mon Oct 2 13:48:36 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 02 Oct 2006 13:48:36 -0700 Subject: [openib-general] [PATCH 1/10] Driver Main files - netdev functions and corresponding state maintenance In-Reply-To: <4521BDD9.27185.4E400870@rkuchimanchi.silverstorm.com> References: <4521BDD9.27185.4E400870@rkuchimanchi.silverstorm.com> Message-ID: <45217B24.5020903@pathscale.com> Ramachandra K wrote: > +#include Not needed. > +#include Not needed. > +#ifdef CONFIG_INFINIBAND_VNIC_STATS > + if (vnic->statistics.conn_time == 0) { > + vnic->statistics.conn_time = > + get_cycles() - vnic->statistics.start_time; > + } > + if (vnic->statistics.disconn_ref != 0) { > + vnic->statistics.disconn_time += > + get_cycles() - vnic->statistics.disconn_ref; > + vnic->statistics.disconn_num++; > + vnic->statistics.disconn_ref = 0; > + } > +#endif /* CONFIG_INFINIBAND_VNIC_STATS */ Why does none of your stats code use locks? > +static int vnic_open(struct net_device *device) > +{ > + struct vnic *vnic; > + int ret = 0; > + struct netpath *np; > + > + VNIC_FUNCTION("vnic_open()\n"); > + vnic = (struct vnic *)device->priv; > + np = vnic->current_path; > + > + if (vnic->state != VNIC_REGISTERED) { > + ret = -ENODEV; > + } > + > + vnic->open++; > + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_SETLINK); > + vnic->xmit_started = 1; > + netif_start_queue(&vnic->netdevice); > + > + return ret; > +} If you're returning an error value, you shouldn't be finishing the open call as if nothing happened. 
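[Editorial sketch] The comment above about vnic_open() can be illustrated with a minimal userspace sketch. It assumes hypothetical stand-in types: struct vnic here carries only the fields that the quoted code touches, and vnic_open_sketch is an invented name, not a function from the driver. The idea is simply to return the error before any state is modified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; only the fields used here. */
enum vnic_state { VNIC_UNINITIALIZED, VNIC_REGISTERED };

struct vnic {
	enum vnic_state state;
	int open;		/* open count */
	int xmit_started;	/* set once the transmit queue is started */
};

/*
 * Returns 0 on success or a negative errno. On failure nothing is
 * modified, so the caller never sees a half-opened device.
 */
static int vnic_open_sketch(struct vnic *vnic)
{
	assert(vnic != NULL);

	if (vnic->state != VNIC_REGISTERED)
		return -ENODEV;	/* bail out before touching any state */

	vnic->open++;
	vnic->xmit_started = 1;
	return 0;
}
```

In the real driver the success path would also queue the link event and start the netif queue; those calls are left out because they need kernel context.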
> +static int vnic_hard_start_xmit(struct sk_buff *skb, struct net_device *device) > +{ > > + dev_kfree_skb(skb); > + return 0; /* TBD: what should I return? */ > +} Any non-zero value means "try again". > +static void vnic_tx_timeout(struct net_device *device) > > + return; Not needed. > +static int vnic_do_ioctl(struct net_device *device, struct ifreq *ifr, int cmd) > +{ > + struct vnic *vnic; > + int ret = 0; > + > + VNIC_FUNCTION("vnic_do_ioctl()\n"); > + vnic = (struct vnic *)device->priv; > + > + /* TBD */ > + > + return ret; > +} If you don't do anything, don't implement this. And especially don't return success no matter what you're passed. > +static int vnic_set_config(struct net_device *device, struct ifmap *map) > +{ > + struct vnic *vnic; > + int ret = 0; > + > + VNIC_FUNCTION("vnic_set_config()\n"); > + vnic = (struct vnic *)device->priv; > + > + /* TBD */ > + > + return ret; > +} Likewise. > +static BOOLEAN vnic_npevent_register(struct vnic *vnic, struct netpath *netpath) There's no BOOLEAN type in the kernel; please don't add one. > + if (register_netdev(&vnic->netdevice) != 0) { > + VNIC_ERROR("failed registering netdev\n"); > + return FALSE; > + } Propagate the error value instead. > + vnic->state = VNIC_REGISTERED; > + vnic->carrier = 2; /* special value to force netif_carrier_(on|off) */ > + return TRUE; > +} And return 0 on success. > + BOOLEAN delay = TRUE; No BOOLEANs, please. > + if (!vnic->carrier) { > + switch (netpath->timer_state) { > + case NETPATH_TS_IDLE: > + netpath->timer_state = > + NETPATH_TS_ACTIVE; > + if (vnic->state == VNIC_UNINITIALIZED) > + netpath_timer(netpath, This is a very deep nesting of conditionals. Please restructure into something more comprehensible. A general comment: I don't understand why you've moved a bunch of code with well-defined entry points into this big ugly single-function state machine.
It means you have a whole lot of trivial wrapper code that serves no purpose, and decreases the readability of the driver significantly. References: <4521BE44.28605.4E41A836@rkuchimanchi.silverstorm.com> Message-ID: <45217BC5.4010707@pathscale.com> Ramachandra K wrote: > Adds the driver viport files. These files implement the state machine > for the communication protocol with the VEx. This looks like a cut-and-paste of the main driver file, and has the same big problem of a single huge state machine function and a bunch of tiny trivial stubs that all serve to obfuscate the code. (Bryan O'Sullivan's message of "Mon, 02 Oct 2006 13:51:17 -0700") References: <4521BE44.28605.4E41A836@rkuchimanchi.silverstorm.com> <45217BC5.4010707@pathscale.com> Message-ID: Bryan> This looks like a cut-and-paste of the main driver file, Bryan> and has the same big problem of a single huge state machine Bryan> function and a bunch of tiny trivial stubs that all serve Bryan> to obfuscate the code. Yes, in general it seems like this all could be made quite a bit smaller and easier to understand by removing some of the extraneous layering -- almost all the functions look like trivial pass-throughs to lower layers. - R. From robert.j.woodruff at intel.com Mon Oct 2 13:50:33 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 2 Oct 2006 13:50:33 -0700 Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA Message-ID: Roland wrote, >However tune_pci=1 will make the driver override this setting if you >really know what you're doing. > - R. Peter, can you give this a try ? I think you set this in /etc/modprobe.conf add the line, options mthca tune_pci=1 Also, we need to understand why the BIOS in your platform is setting it to 128 rather than 512. Is this an oversight in the BIOS or are they doing it for a reason. 
woody From bos at pathscale.com Mon Oct 2 13:59:53 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 02 Oct 2006 13:59:53 -0700 Subject: [openib-general] [PATCH 5/10] Implementation of Data path of the communication protocol In-Reply-To: <4521BEB8.15266.4E436D27@rkuchimanchi.silverstorm.com> References: <4521BEB8.15266.4E436D27@rkuchimanchi.silverstorm.com> Message-ID: <45217DC9.1060307@pathscale.com> Ramachandra K wrote: > Adds the files that implement the data transfer part of the > communication protocol with the VEx. The RDMA of ethernet > packets is implemented in here. I see no sparse annotations to indicate endianness or user visibility of any data throughout the driver. The driver should pass make C=1 CF=-D__CHECK_ENDIAN__ cleanly, which it looks like it won't right now. Also, I see a number of non-standard macros like ntoh16 and so on. Please use the normal cpu_to_be16 etc instead. Adding: Options ib_mthca tune_pci=1 Puts MaxReadReq = 4096. I get 1250MB/s bandwidth. -- Peter -----Original Message----- From: Woodruff, Robert J Sent: Monday, October 02, 2006 3:51 PM To: Roland Dreier; Hartman, Peter Cc: Michael S. Tsirkin; openib-general; EWG; Hartman, Peter Subject: RE: Drop in performance on Mellanox MT25204 single port DDR HCA Roland wrote, >However tune_pci=1 will make the driver override this setting if you >really know what you're doing. > - R. Peter, can you give this a try ? I think you set this in /etc/modprobe.conf add the line, options mthca tune_pci=1 Also, we need to understand why the BIOS in your platform is setting it to 128 rather than 512. Is this an oversight in the BIOS or are they doing it for a reason. 
woody From bos at pathscale.com Mon Oct 2 14:10:07 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 02 Oct 2006 14:10:07 -0700 Subject: [openib-general] [PATCH 8/10] sysfs interface implementation In-Reply-To: <4521BF8B.3021.4E46A8AA@rkuchimanchi.silverstorm.com> References: <4521BF8B.3021.4E46A8AA@rkuchimanchi.silverstorm.com> Message-ID: <4521802F.4010405@pathscale.com> Ramachandra K wrote: > > +/* > + * target eiocs are added by writing > + * > + * ioc_guid=,dgid=,pkey=,name= > + * to the create_primary sysfs attribute. > + */ > +enum { > + VNIC_OPT_ERR = 0, > + VNIC_OPT_IOC_GUID = 1 << 0, > + VNIC_OPT_DGID = 1 << 1, > + VNIC_OPT_PKEY = 1 << 2, > + VNIC_OPT_NAME = 1 << 3, > + VNIC_OPT_INSTANCE = 1 << 4, > + VNIC_OPT_RXCSUM = 1 << 5, > + VNIC_OPT_TXCSUM = 1 << 6, > + VNIC_OPT_HEARTBEAT = 1 << 7, > + VNIC_OPT_ALL = (VNIC_OPT_IOC_GUID | > + VNIC_OPT_DGID | VNIC_OPT_NAME | VNIC_OPT_PKEY), > +}; This is not OK. You can't pass in multiple values to a sysfs file. Either set the values separately or (if they have to be set all at once) find some other way to do this work. Also, putting all of this parsing cruft in a driver is a sign you're trying to do something you shouldn't be. > +static int avg_ticks_as_time(cycles_t ticks, u32 count, char *buffer) Leave out all the pretty printing. Just print a number in standard units, and let userspace do the parsing. > +static int setup_vnic_stats_files(struct vnic *vnic) > +{ This code needs to use sysfs_create_group instead. > > +static int create_netpath(struct netpath *n_pdest, struct path_param *p_params) > +{ Why does this not return any error values? > +struct vnic *create_vnic(struct path_param *param) > +{ Ditto with the sysfs_create_group. References: <4521BFD3.5876.4E47C073@rkuchimanchi.silverstorm.com> Message-ID: <452181A0.20600@pathscale.com> Ramachandra K wrote: > > +#define PRINT(level, x, fmt, arg...) 
\ > + printk(level "%s: %s: %s, line %d: " fmt, \ > + MODULE_NAME, x, __FILE__, __LINE__, ##arg) Use dev_info and friends instead of printk. > +#define hton8(x) (x) > +#define hton16(x) __cpu_to_be16(x) > +#define hton32(x) __cpu_to_be32(x) > +#define hton64(x) __cpu_to_be64(x) Drop these macros. > +#define get_sksport(sk) inet_sk(sk)->sport > +#define get_skdport(sk) inet_sk(sk)->dport And these. > +typedef unsigned long uintn; /* __WORDSIZE/pointer sized integer */ And this typedef. > +/* round down value to align, align must be a power of 2 */ > +#ifndef ROUNDDOWNP2 > +#define ROUNDDOWNP2(val, align) \ > + (((uintn)(val)) & (~((uintn)(align)-1))) > +#endif Perhaps introduce a generic ALIGN_DOWN macro here. > +/* round up value to align, align must be a power of 2 */ > +#ifndef ROUNDUPP2 > +#define ROUNDUPP2(val, align) \ > + (((uintn)(val) + (uintn)(align) - 1) & (~((uintn)(align)-1))) > +#endif Use ALIGN instead of this macro. > +#define BOOLEAN u8 > +#define TRUE 1 > +#define FALSE 0 Yeuch. These have to go. > +#define MAXU32 0xffffffff > +#define MAXU64 ((u64)(~0ULL)) Drop these. > +#if BITS_PER_LONG == 64 > +#define PTR64(what) ((u64)(what)) > +#define PTR(what) ((void *)(u64)(what)) > +#elif BITS_PER_LONG == 32 > +#define PTR64(what) ((u64)(u32)(what)) > +#define PTR(what) ((void *)(u32)(what)) And these. > +#if BITS_PER_LONG == 64 > +#ifdef __ia64__ > +#define __PRI64_PREFIX "l" > +#else > +#define __PRI64_PREFIX "ll" > +#endif > +#define PRISZT "lu" > +#elif BITS_PER_LONG == 32 > +#define __PRI64_PREFIX "L" > +#define PRISZT "u" > +#else > +#error "BITS_PER_LONG not 64 nor 32" > +#endif > +#define __PRIN_PREFIX "l" Just cast 64-bit values to unsigned long long, use %lld etc everywhere, and drop all of this. > +/* source time is 100ths of a sec */ > +#define CONV2JIFFIES(time) (((time) * HZ) / 100) > +#define CONV2USEC(time) ((time) * 10000) > + > +#ifndef min > +#define min(a,b) ((a)<(b)?(a):(b)) > +#endif Use the standard macros for these.
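[Editorial sketch] Two of the review points above, power-of-two alignment and printing 64-bit values through a cast to unsigned long long, can be shown in a short userspace sketch. The helper names align_down, align_up and format_u64_hex are hypothetical; in kernel code the existing ALIGN macro and a plain printk format string would be used instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Round val down to a multiple of align; align must be a power of two. */
static unsigned long align_down(unsigned long val, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return val & ~(align - 1);
}

/* Round val up to a multiple of align; align must be a power of two. */
static unsigned long align_up(unsigned long val, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (val + align - 1) & ~(align - 1);
}

/*
 * Format a 64-bit value without per-architecture prefix macros:
 * cast to unsigned long long and use the ll length modifier.
 * Returns the number of characters written, as snprintf does.
 */
static int format_u64_hex(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%llx", (unsigned long long)v);
}
```

The cast is what makes a single format string valid on both 32-bit and 64-bit builds, which is why the PRI*-style macros in the patch are unnecessary.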
(Peter Hartman's message of "Mon, 2 Oct 2006 14:02:20 -0700") References: <4D97B70CF7F72144881F66DFF4BD7A12C5ABB6@fmsmsx413.amr.corp.intel.com> Message-ID: > Adding: > Options ib_mthca tune_pci=1 > > Puts MaxReadReq = 4096. > > I get 1250MB/s bandwidth. Is that good? I lost track from the beginning of the thread. I would suggest working with your platform people to figure out why the BIOS is setting the PCI Express parameters to non-optimal values. - R. From robert.j.woodruff at intel.com Mon Oct 2 14:42:05 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 2 Oct 2006 14:42:05 -0700 Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA Message-ID: Roland wrote, >Is that good? I lost track from the beginning of the thread. >I would suggest working with your platform people to figure out why >the BIOS is setting the PCI Express parameters to non-optimal values. > - R. Yes. 1250Mbytes/sec is what we expect. You say the 128 value comes from the BIOS ? If so, we need to discuss this with our BIOS team to find out why they limit it to 128, perhaps it is a BIOS bug. woody From rdreier at cisco.com Mon Oct 2 14:43:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 14:43:55 -0700 Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA In-Reply-To: (Robert J. Woodruff's message of "Mon, 2 Oct 2006 14:42:05 -0700") References: Message-ID: Robert> Yes. 1250Mbytes/sec is what we expect. You say the 128 Robert> value comes from the BIOS ? If so, we need to discuss this Robert> with our BIOS team to find out why they limit it to 128, Robert> perhaps it is a BIOS bug. Yes, I believe that the BIOS is the only place that would set that value. We know that resetting the device makes it go back to a different default value, and nothing in the kernel that I know of is going to set it down to 128. - R. 
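[Editorial sketch] For readers following the MaxReadReq numbers in this thread (128, 512, 4096): they correspond to the Max_Read_Request_Size field of the PCI Express Device Control register, bits 14:12, where the size in bytes is 128 shifted left by the field value. The sketch below only decodes and encodes that field in userspace. The function and macro names are hypothetical, and how the register value is obtained (for example from lspci output) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Max_Read_Request_Size occupies bits 14:12 of the PCIe Device Control register. */
#define DEVCTL_READRQ_MASK	0x7000u
#define DEVCTL_READRQ_SHIFT	12

/* Returns the maximum read request size in bytes, or 0 for a reserved encoding. */
static unsigned int max_read_req_bytes(uint16_t devctl)
{
	unsigned int field = (devctl & DEVCTL_READRQ_MASK) >> DEVCTL_READRQ_SHIFT;

	return field <= 5 ? 128u << field : 0;
}

/* Returns devctl with the field replaced; field 0..5 selects 128..4096 bytes. */
static uint16_t with_max_read_req(uint16_t devctl, unsigned int field)
{
	assert(field <= 5);
	return (uint16_t)((devctl & ~DEVCTL_READRQ_MASK) |
			  (field << DEVCTL_READRQ_SHIFT));
}
```

Under this encoding the BIOS default reported in the thread (128 bytes) is field value 0, the expected 512 bytes is field value 2, and the 4096 bytes seen with tune_pci=1 is field value 5.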
From trimmer at silverstorm.com Mon Oct 2 14:46:12 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 2 Oct 2006 17:46:12 -0400 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: Message-ID: > From: Scott Weitzenkamp (sweitzen) > Sent: Monday, October 02, 2006 4:22 PM > To: Kuchimanchi, Ramachandra; Roland Dreier (rdreier) > Cc: openib-General > Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm > Virtual Ethernet I/O controller (VEx) > > Is this communication protocols documented anywhere? How does this > feature compare to IPoIB and SDP? > This protocol is distinct from IPoIB and SDP. In brief: IPoIB treats an IB fabric as a LAN. As such it has UD semantics. SDP essentially treats the HCA as a TOE and leverages IB's RC semantics to emulate TCP/IP SOCK_STREAM sockets. This protocol implements the interface to communicate to the SilverStorm VEx Ethernet Virtual IO Controllers. The VEx card presents a true Ethernet NIC to the host and essentially treats IB as an IO bus to allow a host CPU to use the VEx card as its NIC. Todd Rimmer From rdreier at cisco.com Mon Oct 2 14:50:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 14:50:48 -0700 Subject: [openib-general] [PATCH 2.6.19-rc1 2/2] ehca: improved ehca debug format In-Reply-To: <200610022233.50497.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's message of "Mon, 2 Oct 2006 22:33:50 +0200") References: <200610022233.50497.hnguyen@de.ibm.com> Message-ID: Thanks, applied both patches. 
From rdreier at cisco.com Mon Oct 2 14:57:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 14:57:01 -0700 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus We're through the bulk of our 2.6.19 merge, but this will get some fixes for drivers and the RDMA CM: Hoang-Nam Nguyen: IB/ehca: Fix device registration IB/ehca: Tweak trace message format Krishna Kumar: RDMA/cma: Fix leak of cm_ids in case of failures RDMA/cma: Fix device removal race RDMA/cma: Eliminate unnecessary remove_list RDMA/cma: Optimize error handling Ralph Campbell: IB/ipath: Fix RDMA reads Sean Hefty: RDMA/cma: Set status correctly on route resolution error drivers/infiniband/core/cma.c | 47 +++++++++++++++---------- drivers/infiniband/hw/ehca/ehca_main.c | 36 ++++++++++--------- drivers/infiniband/hw/ehca/ehca_tools.h | 2 + drivers/infiniband/hw/ipath/ipath_rc.c | 59 +++++++++++++++++-------------- 4 files changed, 80 insertions(+), 64 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 1178bd4..9ae4f3a 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -874,23 +874,25 @@ static struct rdma_id_private *cma_new_i __u16 port; u8 ip_ver; + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; + id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps); if (IS_ERR(id)) - return NULL; + goto err; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); rt = &id->route; rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 
2 : 1; - rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); + rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, + GFP_KERNEL); if (!rt->path_rec) - goto err; + goto destroy_id; - if (cma_get_net_info(ib_event->private_data, listen_id->ps, - &ip_ver, &port, &src, &dst)) - goto err; - - cma_save_net_info(&id->route.addr, &listen_id->route.addr, - ip_ver, port, src, dst); rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (rt->num_paths == 2) rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; @@ -903,8 +905,10 @@ static struct rdma_id_private *cma_new_i id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; return id_priv; -err: + +destroy_id: rdma_destroy_id(id); +err: return NULL; } @@ -932,6 +936,7 @@ static int cma_req_handler(struct ib_cm_ mutex_unlock(&lock); if (ret) { ret = -ENODEV; + cma_exch(conn_id, CMA_DESTROYING); cma_release_remove(conn_id); rdma_destroy_id(&conn_id->id); goto out; @@ -1307,6 +1312,7 @@ static void cma_query_handler(int status work->old_state = CMA_ROUTE_QUERY; work->new_state = CMA_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_ERROR; + work->event.status = status; } queue_work(cma_wq, &work->work); @@ -1862,6 +1868,11 @@ static int cma_connect_ib(struct rdma_id ret = ib_send_cm_req(id_priv->cm_id.ib, &req); out: + if (ret && !IS_ERR(id_priv->cm_id.ib)) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } + kfree(private_data); return ret; } @@ -1889,10 +1900,8 @@ static int cma_connect_iw(struct rdma_id cm_id->remote_addr = *sin; ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) { - iw_destroy_cm_id(cm_id); - return ret; - } + if (ret) + goto out; iw_param.ord = conn_param->initiator_depth; iw_param.ird = conn_param->responder_resources; @@ -1904,6 +1913,10 @@ static int cma_connect_iw(struct rdma_id iw_param.qpn = conn_param->qp_num; ret = iw_cm_connect(cm_id, &iw_param); out: + if (ret && !IS_ERR(cm_id)) { + 
iw_destroy_cm_id(cm_id); + id_priv->cm_id.iw = NULL; + } return ret; } @@ -2142,12 +2155,9 @@ static int cma_remove_id_dev(struct rdma static void cma_process_remove(struct cma_device *cma_dev) { - struct list_head remove_list; struct rdma_id_private *id_priv; int ret; - INIT_LIST_HEAD(&remove_list); - mutex_lock(&lock); while (!list_empty(&cma_dev->id_list)) { id_priv = list_entry(cma_dev->id_list.next, @@ -2158,8 +2168,7 @@ static void cma_process_remove(struct cm continue; } - list_del(&id_priv->list); - list_add_tail(&id_priv->list, &remove_list); + list_del_init(&id_priv->list); atomic_inc(&id_priv->refcount); mutex_unlock(&lock); diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 2380994..024d511 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -49,7 +49,7 @@ #include "hcp_if.h" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); -MODULE_VERSION("SVNEHCA_0016"); +MODULE_VERSION("SVNEHCA_0017"); int ehca_open_aqp1 = 0; int ehca_debug_level = 0; @@ -239,7 +239,7 @@ init_node_guid1: return ret; } -int ehca_register_device(struct ehca_shca *shca) +int ehca_init_device(struct ehca_shca *shca) { int ret; @@ -317,11 +317,6 @@ int ehca_register_device(struct ehca_shc /* shca->ib_device.process_mad = ehca_process_mad; */ shca->ib_device.mmap = ehca_mmap; - ret = ib_register_device(&shca->ib_device); - if (ret) - ehca_err(&shca->ib_device, - "ib_register_device() failed ret=%x", ret); - return ret; } @@ -561,9 +556,9 @@ static int __devinit ehca_probe(struct i goto probe1; } - ret = ehca_register_device(shca); + ret = ehca_init_device(shca); if (ret) { - ehca_gen_err("Cannot register Infiniband device"); + ehca_gen_err("Cannot init ehca device struct"); goto probe1; } @@ -571,7 +566,7 @@ static int __devinit ehca_probe(struct i ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); if (ret) { 
ehca_err(&shca->ib_device, "Cannot create EQ."); - goto probe2; + goto probe1; } ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513); @@ -600,6 +595,13 @@ static int __devinit ehca_probe(struct i goto probe5; } + ret = ib_register_device(&shca->ib_device); + if (ret) { + ehca_err(&shca->ib_device, + "ib_register_device() failed ret=%x", ret); + goto probe6; + } + /* create AQP1 for port 1 */ if (ehca_open_aqp1 == 1) { shca->sport[0].port_state = IB_PORT_DOWN; @@ -607,7 +609,7 @@ static int __devinit ehca_probe(struct i if (ret) { ehca_err(&shca->ib_device, "Cannot create AQP1 for port 1."); - goto probe6; + goto probe7; } } @@ -618,7 +620,7 @@ static int __devinit ehca_probe(struct i if (ret) { ehca_err(&shca->ib_device, "Cannot create AQP1 for port 2."); - goto probe7; + goto probe8; } } @@ -630,12 +632,15 @@ static int __devinit ehca_probe(struct i return 0; -probe7: +probe8: ret = ehca_destroy_aqp1(&shca->sport[0]); if (ret) ehca_err(&shca->ib_device, "Cannot destroy AQP1 for port 1. ret=%x", ret); +probe7: + ib_unregister_device(&shca->ib_device); + probe6: ret = ehca_dereg_internal_maxmr(shca); if (ret) @@ -660,9 +665,6 @@ probe3: ehca_err(&shca->ib_device, "Cannot destroy EQ. ret=%x", ret); -probe2: - ib_unregister_device(&shca->ib_device); - probe1: ib_dealloc_device(&shca->ib_device); @@ -750,7 +752,7 @@ int __init ehca_module_init(void) int ret; printk(KERN_INFO "eHCA Infiniband Device Driver " - "(Rel.: SVNEHCA_0016)\n"); + "(Rel.: SVNEHCA_0017)\n"); idr_init(&ehca_qp_idr); idr_init(&ehca_cq_idr); spin_lock_init(&ehca_qp_idr_lock); diff --git a/drivers/infiniband/hw/ehca/ehca_tools.h b/drivers/infiniband/hw/ehca/ehca_tools.h index 9f56bb8..809da3e 100644 --- a/drivers/infiniband/hw/ehca/ehca_tools.h +++ b/drivers/infiniband/hw/ehca/ehca_tools.h @@ -117,7 +117,7 @@ #define ehca_dmp(adr, len, format, args. 
unsigned int l = (unsigned int)(len); \ unsigned char *deb = (unsigned char*)(adr); \ for (x = 0; x < l; x += 16) { \ - printk("EHCA_DMP:%s" format \ + printk("EHCA_DMP:%s " format \ " adr=%p ofs=%04x %016lx %016lx\n", \ __FUNCTION__, ##args, deb, x, \ *((u64 *)&deb[0]), *((u64 *)&deb[8])); \ diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index a504cf6..ce60387 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -241,10 +241,7 @@ int ipath_make_rc_req(struct ipath_qp *q * original work request since we may need to resend * it. */ - qp->s_sge.sge = wqe->sg_list[0]; - qp->s_sge.sg_list = wqe->sg_list + 1; - qp->s_sge.num_sge = wqe->wr.num_sge; - qp->s_len = len = wqe->length; + len = wqe->length; ss = &qp->s_sge; bth2 = 0; switch (wqe->wr.opcode) { @@ -368,14 +365,23 @@ int ipath_make_rc_req(struct ipath_qp *q default: goto done; } + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + qp->s_len = wqe->length; if (newreq) { qp->s_tail++; if (qp->s_tail >= qp->s_size) qp->s_tail = 0; } - bth2 |= qp->s_psn++ & IPATH_PSN_MASK; - if ((int)(qp->s_psn - qp->s_next_psn) > 0) - qp->s_next_psn = qp->s_psn; + bth2 |= qp->s_psn & IPATH_PSN_MASK; + if (wqe->wr.opcode == IB_WR_RDMA_READ) + qp->s_psn = wqe->lpsn + 1; + else { + qp->s_psn++; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + } /* * Put the QP on the pending list so lost ACKs will cause * a retry. More than one request can be pending so the @@ -690,13 +696,6 @@ void ipath_restart_rc(struct ipath_qp *q struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); struct ipath_ibdev *dev; - /* - * If there are no requests pending, we are done. 
- */ - if (ipath_cmp24(psn, qp->s_next_psn) >= 0 || - qp->s_last == qp->s_tail) - goto done; - if (qp->s_retry == 0) { wc->wr_id = wqe->wr.wr_id; wc->status = IB_WC_RETRY_EXC_ERR; @@ -731,8 +730,6 @@ void ipath_restart_rc(struct ipath_qp *q dev->n_rc_resends += (int)qp->s_psn - (int)psn; reset_psn(qp, psn); - -done: tasklet_hi_schedule(&qp->s_task); bail: @@ -765,6 +762,7 @@ static int do_rc_ack(struct ipath_qp *qp struct ib_wc wc; struct ipath_swqe *wqe; int ret = 0; + u32 ack_psn; /* * Remove the QP from the timeout queue (or RNR timeout queue). @@ -777,26 +775,26 @@ static int do_rc_ack(struct ipath_qp *qp list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); + /* Nothing is pending to ACK/NAK. */ + if (unlikely(qp->s_last == qp->s_tail)) + goto bail; + /* * Note that NAKs implicitly ACK outstanding SEND and RDMA write * requests and implicitly NAK RDMA read and atomic requests issued * before the NAK'ed request. The MSN won't include the NAK'ed * request but will include an ACK'ed request(s). */ + ack_psn = psn; + if (aeth >> 29) + ack_psn--; wqe = get_swqe_ptr(qp, qp->s_last); - /* Nothing is pending to ACK/NAK. */ - if (qp->s_last == qp->s_tail) - goto bail; - /* * The MSN might be for a later WQE than the PSN indicates so * only complete WQEs that the PSN finishes. */ - while (ipath_cmp24(psn, wqe->lpsn) >= 0) { - /* If we are ACKing a WQE, the MSN should be >= the SSN. */ - if (ipath_cmp24(aeth, wqe->ssn) < 0) - break; + while (ipath_cmp24(ack_psn, wqe->lpsn) >= 0) { /* * If this request is a RDMA read or atomic, and the ACK is * for a later operation, this ACK NAKs the RDMA read or @@ -807,7 +805,8 @@ static int do_rc_ack(struct ipath_qp *qp * is sent but before the response is received. 
*/ if ((wqe->wr.opcode == IB_WR_RDMA_READ && - opcode != OP(RDMA_READ_RESPONSE_LAST)) || + (opcode != OP(RDMA_READ_RESPONSE_LAST) || + ipath_cmp24(ack_psn, wqe->lpsn) != 0)) || ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) && (opcode != OP(ATOMIC_ACKNOWLEDGE) || @@ -825,6 +824,10 @@ static int do_rc_ack(struct ipath_qp *qp */ goto bail; } + if (wqe->wr.opcode == IB_WR_RDMA_READ || + wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || + wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) + tasklet_hi_schedule(&qp->s_task); /* Post a send completion queue entry if requested. */ if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || (wqe->wr.send_flags & IB_SEND_SIGNALED)) { @@ -1055,7 +1058,8 @@ static inline void ipath_rc_rcv_resp(str /* no AETH, no ACK */ if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) { dev->n_rdma_seq++; - ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + if (qp->s_last != qp->s_tail) + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); goto ack_done; } rdma_read: @@ -1091,7 +1095,8 @@ static inline void ipath_rc_rcv_resp(str /* ACKs READ req. */ if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) { dev->n_rdma_seq++; - ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + if (qp->s_last != qp->s_tail) + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); goto ack_done; } /* FALLTHROUGH */ From rdreier at cisco.com Mon Oct 2 15:24:49 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 15:24:49 -0700 Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel In-Reply-To: <20061001111553.GJ1796@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 1 Oct 2006 13:15:53 +0200") References: <20061001111553.GJ1796@mellanox.co.il> Message-ID: Michael> We could just let the user specify the Id Ext when adding Michael> the device. How does this sound? Yes, that makes the most sense -- just add another optional option for use when adding a target. - R. 
From swise at opengridcomputing.com Mon Oct 2 15:43:23 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 02 Oct 2006 17:43:23 -0500 Subject: [openib-general] rdma_cm branch Message-ID: <1159829003.31507.110.camel@stevo-desktop> Hey Roland/Sean, What is the status of the rdma_cm branch in Roland's infiniband.git tree? It doesn't have the iwarp stuff in it. I'm wondering if it can be merged with the 2.6.19 stuff to create a branch that was iwarp + ucma support? Or is that a dumb idea? Steve. From rdreier at cisco.com Mon Oct 2 15:46:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 15:46:09 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <1159829003.31507.110.camel@stevo-desktop> (Steve Wise's message of "Mon, 02 Oct 2006 17:43:23 -0500") References: <1159829003.31507.110.camel@stevo-desktop> Message-ID: Steve> Hey Roland/Sean, What is the status of the rdma_cm branch Steve> in Roland's infiniband.git tree? It doesn't have the iwarp Steve> stuff in it. I'm wondering if it can be merged with the Steve> 2.6.19 stuff to create a branch that was iwarp + ucma Steve> support? Or is that a dumb idea? I'm waiting for a ucma patch from Sean to fix things up. What's there doesn't even build... - R. From hnguyen at de.ibm.com Mon Oct 2 15:50:02 2006 From: hnguyen at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 3 Oct 2006 00:50:02 +0200 Subject: [openib-general] [PATCH ofed-1.1 1/2] ehca: fix ehca device registration Message-ID: <200610030050.03102.hnguyen@de.ibm.com> Hi Michael! Please consider this patch of ehca for ofed-1.1 as it fixes a bug (crash) that occurred when ib_ehca is loaded after ib_ipoib. This patch initializes struct ehca_shca with struct device*, then creates internal resources and finally registers the ehca IB device. And that is the proper sequence we have to implement.
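As a rough illustration of that ordering (fill in the device struct, create internal resources, register last, and unwind in reverse on failure), here is a small user-space sketch. Every name in it is hypothetical and it does not reproduce the ehca probe code. It only shows why registering last means a consumer never sees a device whose resources are still missing.

```c
/* Hypothetical stand-in for the driver's per-device state. */
struct demo_dev {
    int initialized;   /* struct fields filled in */
    int has_eq;        /* internal resource created */
    int registered;    /* visible to consumers */
};

static int demo_create_eq(struct demo_dev *d, int fail)
{
    if (fail)
        return -1;
    d->has_eq = 1;
    return 0;
}

static void demo_destroy_eq(struct demo_dev *d)
{
    d->has_eq = 0;
}

static int demo_register(struct demo_dev *d, int fail)
{
    if (fail)
        return -1;
    d->registered = 1;
    return 0;
}

/* fail_step: 0 = nothing fails, 1 = resource creation fails, 2 = registration fails */
static int demo_probe(struct demo_dev *d, int fail_step)
{
    int ret;

    d->initialized = 1;                       /* step 1: init the struct only */

    ret = demo_create_eq(d, fail_step == 1);  /* step 2: internal resources */
    if (ret)
        goto err_init;

    ret = demo_register(d, fail_step == 2);   /* step 3: become visible last */
    if (ret)
        goto err_eq;

    return 0;

err_eq:
    demo_destroy_eq(d);                       /* undo step 2 */
err_init:
    d->initialized = 0;                       /* undo step 1 */
    return ret;
}
```

The error labels run in the reverse order of the setup steps, which is the same shape as the probeN labels in the patch that follows.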
I wanted to create this patch against the ofed git tree branch ehca_branch, but saw that ehca_main.c has version SVNEHCA_0012, which is much older than the version SVNEHCA_0015 in ofed-1.1-rc6. Tried to do a pull and git said that it's already updated. Thus I don't know what I did wrong. Anyway I created this patch against the dir openib-1.1 extracted from ofed-1.1-rc6/SOURCES/openib-1.1.tgz. Hope that it still works for you. Thanks! Nam Nguyen Signed-off-by: Hoang-Nam Nguyen --- ehca_main.c | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff -Nurp openib-1.1_orig/drivers/infiniband/hw/ehca/ehca_main.c openib-1.1_work/drivers/infiniband/hw/ehca/ehca_main.c --- openib-1.1_orig/drivers/infiniband/hw/ehca/ehca_main.c 2006-09-20 06:28:56.000000000 -0700 +++ openib-1.1_work/drivers/infiniband/hw/ehca/ehca_main.c 2006-10-02 15:24:48.010001888 -0700 @@ -5,6 +5,7 @@ * * Authors: Heiko J Schick * Hoang-Nam Nguyen + * Joachim Fenkes * * Copyright (c) 2005 IBM Corporation * @@ -238,7 +239,7 @@ init_node_guid1: return ret; } -int ehca_register_device(struct ehca_shca *shca) +int ehca_init_device(struct ehca_shca *shca) { int ret; @@ -316,11 +317,6 @@ int ehca_register_device(struct ehca_shc /* shca->ib_device.process_mad = ehca_process_mad; */ shca->ib_device.mmap = ehca_mmap; - ret = ib_register_device(&shca->ib_device); - if (ret) - ehca_err(&shca->ib_device, - "ib_register_device() failed ret=%x", ret); - return ret; } @@ -446,7 +442,7 @@ static ssize_t ehca_show_##name(struct kfree(rblock); \ return 0; \ } \ - \ + \ data = rblock->name; \ kfree(rblock); \ \ @@ -560,9 +556,9 @@ static int __devinit ehca_probe(struct i goto probe1; } - ret = ehca_register_device(shca); + ret = ehca_init_device(shca); if (ret) { - ehca_gen_err("Cannot register Infiniband device"); + ehca_gen_err("Cannot init ehca device struct"); goto probe1; } @@ -570,7 +566,7 @@ static int __devinit ehca_probe(struct i ret = ehca_create_eq(shca, &shca->eq, 
EHCA_EQ, 2048); if (ret) { ehca_err(&shca->ib_device, "Cannot create EQ."); - goto probe2; + goto probe1; } ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513); @@ -599,6 +595,13 @@ static int __devinit ehca_probe(struct i goto probe5; } + ret = ib_register_device(&shca->ib_device); + if (ret) { + ehca_err(&shca->ib_device, + "ib_register_device() failed ret=%x", ret); + goto probe6; + } + /* create AQP1 for port 1 */ if (ehca_open_aqp1 == 1) { shca->sport[0].port_state = IB_PORT_DOWN; @@ -606,7 +609,7 @@ static int __devinit ehca_probe(struct i if (ret) { ehca_err(&shca->ib_device, "Cannot create AQP1 for port 1."); - goto probe6; + goto probe7; } } @@ -617,7 +620,7 @@ static int __devinit ehca_probe(struct i if (ret) { ehca_err(&shca->ib_device, "Cannot create AQP1 for port 2."); - goto probe7; + goto probe8; } } @@ -629,12 +632,15 @@ static int __devinit ehca_probe(struct i return 0; -probe7: +probe8: ret = ehca_destroy_aqp1(&shca->sport[0]); if (ret) ehca_err(&shca->ib_device, "Cannot destroy AQP1 for port 1. ret=%x", ret); +probe7: + ib_unregister_device(&shca->ib_device); + probe6: ret = ehca_dereg_internal_maxmr(shca); if (ret) @@ -659,9 +665,6 @@ probe3: ehca_err(&shca->ib_device, "Cannot destroy EQ. 
ret=%x", ret); -probe2: - ib_unregister_device(&shca->ib_device); - probe1: ib_dealloc_device(&shca->ib_device);
From hnguyen at de.ibm.com Mon Oct 2 15:50:05 2006 From: hnguyen at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 3 Oct 2006 00:50:05 +0200 Subject: [openib-general] [PATCH ofed-1.1 2/2] ehca: improved ehca debug format Message-ID: <200610030050.05816.hnguyen@de.ibm.com> Michael, here is the 2nd patch of ehca with a small format improvement in ehca debug function. It would be great if we could include it for ofed-1.1. Note that I created this patch against the dir openib-1.1 extracted from ofed-1.1-rc6/SOURCES/openib-1.1.tgz. Thanks!
Nam Nguyen Signed-off-by: Hoang-Nam Nguyen --- ehca_tools.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -Nurp openib-1.1_orig/drivers/infiniband/hw/ehca/ehca_tools.h openib-1.1_work/drivers/infiniband/hw/ehca/ehca_tools.h --- openib-1.1_orig/drivers/infiniband/hw/ehca/ehca_tools.h 2006-09-20 06:28:56.000000000 -0700 +++ openib-1.1_work/drivers/infiniband/hw/ehca/ehca_tools.h 2006-10-02 09:29:53.000000000 -0700 @@ -117,7 +117,7 @@ extern int ehca_debug_level; unsigned int l = (unsigned int)(len); \ unsigned char *deb = (unsigned char*)(adr); \ for (x = 0; x < l; x += 16) { \ - printk("EHCA_DMP:%s" format \ + printk("EHCA_DMP:%s " format \ " adr=%p ofs=%04x %016lx %016lx\n", \ __FUNCTION__, ##args, deb, x, \ *((u64 *)&deb[0]), *((u64 *)&deb[8])); \
From ardavis at ichips.intel.com Mon Oct 2 16:16:38 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 02 Oct 2006 16:16:38 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: References: Message-ID: <45219DD6.8030602@ichips.intel.com> Woodruff, Robert J wrote: >Aviram wrote, > > >>Pending that IPoIB HA is solved would like to issue RC7 that suppose to >> >> >>be final. Is everyone OK with this approach? >> >> >>Aviram >> >> > >Sounds good, > >What is the target date for RC7 ?
> > Do we have a new target date? From mshefty at ichips.intel.com Mon Oct 2 16:17:16 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 02 Oct 2006 16:17:16 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <1159829003.31507.110.camel@stevo-desktop> References: <1159829003.31507.110.camel@stevo-desktop> Message-ID: <45219DFC.1010306@ichips.intel.com> Steve Wise wrote: > What is the status of the rdma_cm branch in Roland's infiniband.git > tree? It doesn't have the iwarp stuff in it. I'm wondering if it can > be merged with the 2.6.19 stuff to create a branch that was iwarp + ucma > support? Or is that a dumb idea? I'm currently working on moving the rdma_cm code that's in svn forward to what's upstream. (I was just typing a message on this...) My plan is to ask Roland to host one, maybe two, branches in the infiniband.git tree. Here are the main pieces missing from the kernel: 1. We need to add rdma_establish() and expose the rdma_conn_param values as part of the connection event. I'm working on a patch for the latter. 2. We need a ucma branch. To merge upstream, it makes sense to include item 1 first, but this leads to a conflict with the OFED releases. OFED ABI version 1 includes RC QP support, but without item 1 changes, and SVN ABI version 2 includes multicast support. 3. There's been requests for an rdma_cm branch that includes UD QP / multicast. The cleanest solution from an ABI perspective is to merge multicast support before the ucma; however, I'm not sure that makes the most sense for merging upstream. Thoughts? 
- Sean From halr at voltaire.com Mon Oct 2 16:31:45 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Oct 2006 19:31:45 -0400 Subject: [openib-general] Kernel Oops in user-mad, mad In-Reply-To: <200610011153.16702.jackm@dev.mellanox.co.il> References: <200610011153.16702.jackm@dev.mellanox.co.il> Message-ID: <1159831904.18903.4855.camel@hal.voltaire.com> On Sun, 2006-10-01 at 05:53, Jack Morgenstein wrote: > We received the following kernel Oops while running regression > (see console picture attached). > > This looks like a possible race condition between handling umad send completions > and ib_unregister_mad_agent. > > The Oops is at the list_del line of dequeue_send (user_mad.c: 186) > Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler. > > Is there a possibility that there is a double deletion from a list somewhere? Perhaps but I don't see it. Sean ? Roland ? -- Hal > Jack From mshefty at ichips.intel.com Mon Oct 2 16:34:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 02 Oct 2006 16:34:33 -0700 Subject: [openib-general] Kernel Oops in user-mad, mad In-Reply-To: <1159831904.18903.4855.camel@hal.voltaire.com> References: <200610011153.16702.jackm@dev.mellanox.co.il> <1159831904.18903.4855.camel@hal.voltaire.com> Message-ID: <4521A209.1000105@ichips.intel.com> Hal Rosenstock wrote: >> Is there a possibility that there is a double deletion from a list >> somewhere? > > > Perhaps but I don't see it. Sean ? Roland ? I looked at this and couldn't find anything obviously wrong. I was waiting to hear back to Michael's question about module unload being involved. 
- Sean From rjwalsh at pathscale.com Mon Oct 2 17:10:51 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Mon, 02 Oct 2006 17:10:51 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: <45218737.7080901@pathscale.com> References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> Message-ID: <4521AA8B.80702@pathscale.com> The attached patch fixes this problem by deferring creation of our diagpkt device until at least one piece of hardware has been found. Michael: this will fix the OFED testing problem you were seeing. Roland: please queue for 2.6.19. Regards, Robert. From rjwalsh at pathscale.com Mon Oct 2 17:17:53 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Mon, 02 Oct 2006 17:17:53 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: <4521AA8B.80702@pathscale.com> References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> Message-ID: <4521AC31.3020200@pathscale.com> Robert Walsh wrote: > The attached patch fixes this problem by deferring creation of our > diagpkt device until at least one piece of hardware has been found. Of course, if I'd actually attached the patch, it might have been a bit more useful :-) -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: diagpkt-init-fixup.patch URL: From ebiederm at xmission.com Mon Oct 2 18:22:31 2006 From: ebiederm at xmission.com (ebiederm at xmission.com) Date: Mon, 02 Oct 2006 19:22:31 -0600 Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19 In-Reply-To: <452156F4.4050004@pathscale.com> (Bryan O'Sullivan's message of "Mon, 02 Oct 2006 11:14:12 -0700") References: <452156F4.4050004@pathscale.com> Message-ID: "Bryan O'Sullivan" writes: > Eric W. 
Biederman wrote: > >> Have you tested your driver against the -mm tree? > > No. > >> To the best of my knowledge the irq handling of your hypertransport card >> is a complete and total hack that works only by chance. > > And a happy Monday morning to you, too :-) :) >> In the -mm tree I have added a first pass at proper support for the >> hypertransport interrupt capability. As this code is slated to go into >> 2.6.19 could you please test against that? > > I'm on vacation for a few weeks. We'll find someone to do it. Sure. I talked to Dave Olson about this a while ago, and I couldn't get anything happening. Eric From sweitzen at cisco.com Mon Oct 2 19:14:08 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 2 Oct 2006 19:14:08 -0700 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 Message-ID: Aviram, can I try Mellanox binary RPMs? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Scott Weitzenkamp (sweitzen) > Sent: Sunday, October 01, 2006 9:31 PM > To: 'Aviram Gutman'; Scott Weitzenkamp (sweitzen) > Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il > Subject: RE: [openfabrics-ewg] problems running MVAPICH on > OFED 1.1 rc6 with SLES10 x86_64 > > $ uname -a > Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 > 18:25:39 UTC 2006 x86_64 > x86_64 x86_64 GNU/Linux > $ > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh > -np 2 192.168.2.46 192.168.2.49 hostname > svbu-qa1850-4 > svbu-qa1850-3 > $ > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh > -np 2 192.168.2.46 192.168.2.49 > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency > > The last command just hangs. Can I try your binary RPMs?
> > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] > > Sent: Sunday, October 01, 2006 2:29 AM > > To: Scott Weitzenkamp (sweitzen) > > Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il > > Subject: Re: [openfabrics-ewg] problems running MVAPICH on > > OFED 1.1 rc6 with SLES10 x86_64 > > > > Can you please elaborate on MVAPICH issues, can you send > > command line? > > We ran it here on 32 Opteron nodes each quad core and also rigorous > > tests on the many other nodes? > > > > > > > > Scott Weitzenkamp (sweitzen) wrote: > > > We are just getting started with OFED testing on SLES10, first > > > platform is x86_64. > > > > > > IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are > > working so far. > > > MVAPICH with OSU benchmarks just hang. This same > hardware works > > > fine with OFED and RHEL4 U3. > > > > > > Has anyone else seen this? > > > > > > Scott Weitzenkamp > > > SQA and Release Manager > > > Server Virtualization Business Unit > > > Cisco Systems > > > > > > > > -------------------------------------------------------------- > > ---------- > > > > > > _______________________________________________ > > > openfabrics-ewg mailing list > > > openfabrics-ewg at openib.org > > > http://openib.org/mailman/listinfo/openfabrics-ewg > > > > > > From mlakshmanan at silverstorm.com Mon Oct 2 19:43:18 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Mon, 2 Oct 2006 22:43:18 -0400 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: Message-ID: Ramachandra> In that case, can you please consider this for the Ramachandra> for-2.6.20 branch ? > I'm happy to keep this in a vex branch or something like that, but as > the emails I just sent show, this is not ready for merging yet (which > is to be expected -- it's never been reviewed). 
> Thanks. That's pretty much what we are expecting at this early stage. I fully agree that it is not ready for merging yet. We'll work on the items pointed out by the various IB reviewers and then take it from there. It is premature at this point to discuss when, where and how to merge. Madhu From rdreier at cisco.com Mon Oct 2 22:21:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 02 Oct 2006 22:21:01 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: <4521AC31.3020200@pathscale.com> (Robert Walsh's message of "Mon, 02 Oct 2006 17:17:53 -0700") References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> Message-ID: > modprobe would go into the D state and stay there. Why? What was the process stuck sleeping on? > From: Robert Walsh I assume this is supposed to be Signed-off-by: ? > +void ipath_diagpkt_add(void) > +{ > + if (diagpkt_count == 0) > + ipath_cdev_init(IPATH_DIAGPKT_MINOR, > + "ipath_diagpkt", &diagpkt_file_ops, > + &diagpkt_cdev, &diagpkt_class_dev); > + > + diagpkt_count++; > +} This seems dangerous, especially now that we have PCI_MULTITHREAD_PROBE: nothing prevents ipath_cdev_init() from being called twice. Better to use something like test_and_set_bit() to make sure this is done exactly once. - R. 
From ogerlitz at voltaire.com Mon Oct 2 22:58:50 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 03 Oct 2006 07:58:50 +0200 Subject: [openib-general] RHEL 4 U3 - lost completions In-Reply-To: <45215BC4.C8BB5E22@austin.rr.com> References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> Message-ID: <4521FC1A.6000603@voltaire.com> Bill Hartner wrote: > > Roland Dreier wrote: >> Bill> At 1st, I thought that was the case, a fork, however, I do >> Bill> not think get_user_pages(), and the increment of the ref >> Bill> count, will guarantee the page struct does not change for >> Bill> RHEL 4 U3, I need to verify that though. >> >> Are you doing a fork()? If so then, yes, you will not be able to make >> your app work on a RHEL4 kernel. After get_user_pages(), if you do a >> fork() then a copy-on-write will still happen, which will cause the >> physical page to move as you have discovered. > > There is no fork that I am aware of in the code. The pthread that > created the EVD and any other thread in the process that executes the > debug code sees the changed page struct. I will try to recreate this in > a test app. Bill - Is there a chance your code uses daemonize()? Roland - If indeed, does it make sense that the problem does not reproduce with single threaded runs? Or. From sweitzen at cisco.com Mon Oct 2 23:24:13 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 2 Oct 2006 23:24:13 -0700 Subject: [openib-general] Problems with OFED IPoIB HA on SLES10 Message-ID: Vlad, I filed a bug for these issues. 1) If I start IPoIB HA with ib0 IB port shut down (from IB switch) and ib1 IB port enabled, then IPoIB does not work because "ip monitor link all" does not report NO-CARRIER at startup like ipoib_ha.pl is looking for. This is a major hole. 
2) /etc/init.d/openibd runs ipoib_ha.pl with its stdout and stderr redirected to /dev/null, should we run with -v for verbose instead and redirect log file to /var/log? # fgrep ipoib_ha.pl /etc/init.d/openibd ipoib_ha.pl -p ${PRIMARY_IPOIB_DEV} -s ${SECONDARY_IPOIB_DEV} -- with-arping --with-multicast > /dev/null 2>&1 & 3) I got IPoIB HA working on SLES 10, but the documentation is a little lacking. Looks like I have to put the same IP address in ifcfg-ib0 and ifcfg-ib1, is this correct? # pwd /etc/sysconfig/network # cat ifcfg-ib0 DEVICE=ib0 BOOTPROTO=static IPADDR=192.168.2.46 NETMASK=255.255.255.0 ONBOOT=yes # cat ifcfg-ib1 DEVICE=ib1 BOOTPROTO=static IPADDR=192.168.2.46 NETMASK=255.255.255.0 ONBOOT=yes 4) If I shutdown ib0 IB port, I see this from "/usr/local/ofed/bin/ipoib_ha.pl -v --with-arping --with-multicast" Use of uninitialized value in concatenation (.) or string at /usr/local/ofed/bin/ipoib_ha.pl line 287. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From ogerlitz at voltaire.com Mon Oct 2 23:27:34 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 03 Oct 2006 08:27:34 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45219DFC.1010306@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> Message-ID: <452202D6.6040605@voltaire.com> Sean Hefty wrote: > Steve Wise wrote: >> What is the status of the rdma_cm branch in Roland's infiniband.git >> tree? It doesn't have the iwarp stuff in it. I'm wondering if it can >> be merged with the 2.6.19 stuff to create a branch that was iwarp + ucma >> support? Or is that a dumb idea? > > I'm currently working on moving the rdma_cm code that's in svn forward to what's > upstream. (I was just typing a message on this...) My plan is to ask Roland to > host one, maybe two, branches in the infiniband.git tree. 
Here are the main > pieces missing from the kernel: > > 1. We need to add rdma_establish() and expose the rdma_conn_param values as > part of the connection event. I'm working on a patch for the latter. > > 2. We need a ucma branch. To merge upstream, it makes sense to include item 1 > first, but this leads to a conflict with the OFED releases. OFED ABI version 1 > includes RC QP support, but without item 1 changes, and SVN ABI version 2 > includes multicast support. > > 3. There's been requests for an rdma_cm branch that includes UD QP / multicast. > > The cleanest solution from an ABI perspective is to merge multicast support > before the ucma; however, I'm not sure that makes the most sense for merging > upstream. Thoughts? Since the ucma will not make it for the 2.6.19 feature merge window, why not target both the ucma and the cma ud/ud-mulitast support for 2.6.20? This way you would be doing one big ABI change and would not carry this HUGE svn/git diff. As I have mentioned in the other thread, once it would make sense from your schedule to do the patch preparation work, it would be good to push it into the for-2.6.20 branch of Roland's tree from where it can go to the -mm tree so people can start testing it. Once the code is in the for-2.6.20 branch, it would be also possible to include it in OFED 1.2 release which is expected on December this year. Or. From mst at mellanox.co.il Mon Oct 2 23:31:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 08:31:58 +0200 Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA In-Reply-To: References: Message-ID: <20061003063158.GB15885@mellanox.co.il> Quoting r. Woodruff, Robert J : > Subject: Drop in performance on Mellanox MT25204 single port DDR HCA > > > Hi Roland/Michael, > > One of my coworkers in Champaign is seeing a performance > issue with the latest SVN driver and the OFED 1.1 Mellanox > driver on certain platforms. 
> > On the older SVN somewhere around 7500 the Mellanox driver > did not save and restore certain PCI registers before a reset. > Somewhere around SVN 8000 a patch was added to save and > restore these registers. However on our Alcolu platform > this patch causes the MaxReadReq to be set to 128 bytes (rather than > 512) > which limits bandwith to 650MBytes/sec. If I remove the > save/restore of these registers (attached patch), the > bandwidth is back to where we would expect it 1250 Mbytes/sec. > > Is there some problem with this patch or do you think it is > some BIOS issue in the platform ? This is a BIOS issue - it should set the MaxReadReq register for maximum performance and stability. As a work-around, you can use the setpci utility to modify MaxReadReq before loading the driver. Unfortunately, mthca has no way to know which values are legal and which will give the best performance, and previous behaviour was out of spec, reportedly causing stability issues on compliant platforms. I will look into adding this info in release notes. -- MST From mst at mellanox.co.il Mon Oct 2 23:37:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 08:37:52 +0200 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: <4521AA8B.80702@pathscale.com> References: <4521AA8B.80702@pathscale.com> Message-ID: <20061003063752.GC15885@mellanox.co.il> Quoting r. Robert Walsh : > Subject: Re: [openfabrics-ewg] OFED Status > > The attached patch fixes this problem by deferring creation of our > diagpkt device until at least one piece of hardware has been found. > > Michael: this will fix the OFED testing problem you were seeing. > > Roland: please queue for 2.6.19. Just saw this, thanks, I'll try. Do you want to update the patch following Roland's comments? 
-- MST From chevchenkovic at gmail.com Mon Oct 2 23:43:27 2006 From: chevchenkovic at gmail.com (Chevchenkovic Chevchenkovic) Date: Tue, 3 Oct 2006 12:13:27 +0530 Subject: [openib-general] IB multicast Message-ID: <1c16cdf90610022343s22a67f6j276ada731b6f0e7e@mail.gmail.com> Hi, I have a configuration consisting of 6 nodes connected through a single IB switch. I am sending data from a single node to all the remaining 5 nodes using IB multicast. I get a bandwidth of not more than 1.5 - 2 Gbps. I was expecting it to be around 10 Gbps(i.e same as point to point b/w). Bandwidth here is defined as (total sent data from the source)/(time for getting completion acks from all the 5 nodes on receiving source data). 1. What could be the reason? 2. What is the expected bandwidth? regards, -Chev -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at openib.org Mon Oct 2 23:58:54 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 2 Oct 2006 23:58:54 -0700 (PDT) Subject: [openib-general] [Bug 261] New: can't configure IPoIB pkey interfaces at boot time Message-ID: <20061003065854.8FE972283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=261 Summary: can't configure IPoIB pkey interfaces at boot time Product: OpenFabrics Linux Version: 1.1rc6 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P3 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: sweitzen at cisco.com It would be nice if /etc/init.d/openibd could configure pkey interfaces like ib0.8001, perhaps using config files like ifcfg-ib0.8001. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From glebn at voltaire.com Mon Oct 2 23:56:57 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 3 Oct 2006 08:56:57 +0200 Subject: [openib-general] RHEL 4 U3 - lost completions In-Reply-To: <4521FC1A.6000603@voltaire.com> References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com> Message-ID: <20061003065657.GG19813@minantech.com> On Tue, Oct 03, 2006 at 07:58:50AM +0200, Or Gerlitz wrote: > Bill Hartner wrote: > > > > Roland Dreier wrote: > >> Bill> At 1st, I thought that was the case, a fork, however, I do > >> Bill> not think get_user_pages(), and the increment of the ref > >> Bill> count, will guarantee the page struct does not change for > >> Bill> RHEL 4 U3, I need to verify that though. > >> > >> Are you doing a fork()? If so then, yes, you will not be able to make > >> your app work on a RHEL4 kernel. After get_user_pages(), if you do a > >> fork() then a copy-on-write will still happen, which will cause the > >> physical page to move as you have discovered. > > > > There is no fork that I am aware of in the code. The pthread that > > created the EVD and any other thread in the process that executes the > > debug code sees the changed page struct. I will try to recreate this in > > a test app. > AFAIR there is a bug in kernel 2.6.9 that makes it possible for page to be changed in process's VM even though it is locked by get_user_pages(). That is why Mellanox driver used mlock() in addition to get_user_pages(). I think this bug was fixed somewhere around 2.6.11. -- Gleb. From mst at mellanox.co.il Tue Oct 3 00:12:37 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 3 Oct 2006 09:12:37 +0200 Subject: [openib-general] RHEL 4 U3 - lost completions In-Reply-To: <20061003065657.GG19813@minantech.com> References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com> <20061003065657.GG19813@minantech.com> Message-ID: <20061003071237.GC1288@mellanox.co.il> Quoting r. glebn at voltaire.com : > AFAIR there is a bug in kernel 2.6.9 that makes it possible for page to > be changed in process's VM even though it is locked by get_user_pages(). > That is why Mellanox driver used mlock() in addition to > get_user_pages(). I think this bug was fixed somewhere around 2.6.11. I think it got fixed around 2.6.7. RHEL4 U3 has this fix, and AFAIK last SLES9 update has backported that to 2.6.7 too. -- MST From mst at mellanox.co.il Tue Oct 3 00:16:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 09:16:32 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45219DFC.1010306@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> Message-ID: <20061003071632.GE1288@mellanox.co.il> Quoting r. Sean Hefty : > 1. We need to add rdma_establish() and expose the rdma_conn_param values as > part of the connection event. I'm working on a patch for the latter. I have both patches as part of OFED. Should I post them for review? -- MST From mst at mellanox.co.il Tue Oct 3 00:15:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 09:15:25 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45219DFC.1010306@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> Message-ID: <20061003071525.GD1288@mellanox.co.il> Quoting r. 
Sean Hefty : > The cleanest solution from an ABI perspective is to merge multicast support > before the ucma; however, I'm not sure that makes the most sense for merging > upstream. Thoughts? OTOH ucma is already used so it makes sense to merge that first. Can not this be solved by adding some reserved fields where multicast data is supposed to go? Worst case, we'll use for something else. -- MST From ogerlitz at voltaire.com Tue Oct 3 00:24:13 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 03 Oct 2006 09:24:13 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45219DFC.1010306@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> Message-ID: <4522101D.4040102@voltaire.com> Sean Hefty wrote: > I'm currently working on moving the rdma_cm code that's in svn forward to what's > upstream. (I was just typing a message on this...) My plan is to ask Roland to > host one, maybe two, branches in the infiniband.git tree. Here are the main > pieces missing from the kernel: > > 1. We need to add rdma_establish() and expose the rdma_conn_param values as > part of the connection event. I'm working on a patch for the latter. > > 2. We need a ucma branch. To merge upstream, it makes sense to include item 1 > first, but this leads to a conflict with the OFED releases. OFED ABI version 1 > includes RC QP support, but without item 1 changes, and SVN ABI version 2 > includes multicast support. Can you clarify what do you mean "(ABI) conflict with OFED releases"? Is an issue with someone wishing to work with OFED user space and IB code from upstream kernel? The approach i suggest is: it makes sense to take some care not to create too much non working scenarios... however the upstream push process must **not** be restricted by the existence of OFED. 
My understanding is that libibverbs and ib_uverbs driver development (eg the exclusion of libibverbs 1.1 from OFED 1.1) follow this approach, and it would be good to apply it also on librdmacm. Specifically, can you push the rdma_establish() ***kernel*** API support which was integrated into OFED 1.1 as a bug fix for 2.6.19 ? Note that doing so is a must to have IB ULPs which are not upstream nor part of OFED - RDS, Lustre's o2ibnld, NFSoRDMA, iSER being able to compile against both OFED and IB kernel code. Else you create a conflict which places the kernel IB code in a second place relative to OFED. Or. From moshek at voltaire.com Tue Oct 3 00:27:59 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Tue, 3 Oct 2006 09:27:59 +0200 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD Message-ID: Michael Wrote : > OK, so for OFED just mmap from /proc/bus/pci/ should be sufficient > work-around - it will make things work when driver is loaded. Correct? No ! A work-around that enable the use of mstflint only when the driver is loaded is not sufficient What you plan to do when you have system error -> - boot fail when IB started, - driver loading fail as result of driver error / miss match FWR, etc. When driver is not loaded/operating o.k. We must be able to check the HCA FWR version, and reload FWR if needed. Having mstflint working only when driver is loaded o.k. will not permit us to access the HCA in this case !! Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: Sunday, October 01, 2006 9:51 AM To: Tseng-Hui (Frank) Lin Cc: Moshe Kazir; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD Quoting r. 
Tseng-Hui (Frank) Lin : > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is > notloaded on AMD > > The ppc64 problem is actually in pci_64.c. Here is the patch: > > ============ cut here ============= > diff --git a/arch/powerpc/kernel/pci_64.c > b/arch/powerpc/kernel/pci_64.c index 4c4449b..490403c 100644 > --- a/arch/powerpc/kernel/pci_64.c > +++ b/arch/powerpc/kernel/pci_64.c > @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_ > if (hose == 0) > return NULL; /* should never happen */ > > - /* If memory, add on the PCI bridge address offset */ > if (mmap_state == pci_mmap_mem) { > - *offset += hose->pci_mem_offset; > res_bit = IORESOURCE_MEM; > } else { > io_offset = (unsigned long)hose->io_base_virt - pci_io_base; > ============= end cut ============= > > The mmap() system call on resource0 does not work on ppc64 without > this patch. PowerMAC G5 got away with this because its > hose->pci_mem_offset was set to 0. > > The fix is made on 8/21. It may be able to make it into 2.6.19. But it > certainly won't get into SLES10, SLES9-SP3, or REHL4-U4 which have > already been released. > > To cover both cases with and without the fix, my patch try to mmap > /sys/bus/pci/..../resource0 first. It it failed it tries mmap > /proc/bus/pci/.... If it failed again, we have no choice but fall back > to use PCI config space. OK, so for OFED just mmap from /proc/bus/pci/ should be sufficient work-around - it will make things work when driver is loaded. Correct? -- MST From mst at mellanox.co.il Tue Oct 3 00:28:34 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 3 Oct 2006 09:28:34 +0200 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD In-Reply-To: <1159817663.21249.103.camel@flin.austin.ibm.com> References: <1159472595.21249.79.camel@flin.austin.ibm.com> <20061001075048.GC888@mellanox.co.il> <1159817663.21249.103.camel@flin.austin.ibm.com> Message-ID: <20061003072833.GG1288@mellanox.co.il> Quoting r. Tseng-Hui (Frank) Lin : > Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD > > On Sun, 2006-10-01 at 09:50 +0200, Michael S. Tsirkin wrote: > > Quoting r. Tseng-Hui (Frank) Lin : > > > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD > > > > > > The ppc64 problem is actually in pci_64.c. Here is the patch: > > > > > > ============ cut here ============= > > > diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c > > > index 4c4449b..490403c 100644 > > > --- a/arch/powerpc/kernel/pci_64.c > > > +++ b/arch/powerpc/kernel/pci_64.c > > > @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_ > > > if (hose == 0) > > > return NULL; /* should never happen */ > > > > > > - /* If memory, add on the PCI bridge address offset */ > > > if (mmap_state == pci_mmap_mem) { > > > - *offset += hose->pci_mem_offset; > > > res_bit = IORESOURCE_MEM; > > > } else { > > > io_offset = (unsigned long)hose->io_base_virt - pci_io_base; > > > ============= end cut ============= > > > > > > The mmap() system call on resource0 does not work on ppc64 without this > > > patch. PowerMAC G5 got away with this because its hose->pci_mem_offset > > > was set to 0. > > > > > > The fix is made on 8/21. It may be able to make it into 2.6.19. But it > > > certainly won't get into SLES10, SLES9-SP3, or REHL4-U4 which have > > > already been released. > > > > > > To cover both cases with and without the fix, my patch try to > > > mmap /sys/bus/pci/..../resource0 first. 
It it failed it tries > > > mmap /proc/bus/pci/.... If it failed again, we have no choice but fall > > > back to use PCI config space. Yack. OK, I'll put this in the OFED documentation. Try to make sure it works on 2.6.19 at least. Was it posted on lkml already? Wrt read/write access - we have a problem to resolve before we can enable it silently: currently two concurrent runs of e.g. mstflint on the same /proc device will conflict corrupting each other's output. -- MST From mst at mellanox.co.il Tue Oct 3 00:32:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 09:32:16 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45219DFC.1010306@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> Message-ID: <20061003073216.GI1288@mellanox.co.il> Quoting r. Sean Hefty : > 1. We need to add rdma_establish() and expose the rdma_conn_param values as > part of the connection event. I'm working on a patch for the latter. > > 2. We need a ucma branch. To merge upstream, it makes sense to include item 1 > first, but this leads to a conflict with the OFED releases. OFED ABI version 1 > includes RC QP support, but without item 1 changes, and SVN ABI version 2 > includes multicast support. Hmm, OFED actually does include rdma_establish. Isn't that item 1? -- MST From mst at mellanox.co.il Tue Oct 3 00:45:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 09:45:06 +0200 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD In-Reply-To: References: Message-ID: <20061003074505.GJ1288@mellanox.co.il> Quoting r. 
Moshe Kazir : > A work-around that enable the use of mstflint only when the driver is > loaded is not sufficient I think I somewhat understand the mmap related kernel bug thing, (although I'd like to see this discussed on lkml) but I still don't understand where the "driver is loaded" thing comes from, and I'd like to. > What you plan to do when you have system error -> > - boot fail when IB started, > - driver loading fail as result of driver error / miss match FWR, etc. Not sure what's miss match FWR, but if you can't boot the need to specify /proc/bus/pci/ is going to be the least of your problems. Please understand, I'd like people to start using mthca0 for device name rather than defaulting to lspci as the first resort. -- MST From jackm at dev.mellanox.co.il Tue Oct 3 00:46:36 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 3 Oct 2006 09:46:36 +0200 Subject: [openib-general] Kernel Oops in user-mad, mad In-Reply-To: <20061001111413.GI1796@mellanox.co.il> References: <200610011153.16702.jackm@dev.mellanox.co.il> <20061001111413.GI1796@mellanox.co.il> Message-ID: <200610030946.36819.jackm@dev.mellanox.co.il> On Sunday 01 October 2006 13:14, Michael S. Tsirkin wrote: > Quoting r. Jack Morgenstein : > > Subject: Kernel Oops in user-mad, mad > > > > We received the following kernel Oops while running regression > > (see console picture attached). > > > > This looks like a possible race condition between handling umad send completions > > and ib_unregister_mad_agent. > > > > The Oops is at the list_del line of dequeue_send (user_mad.c: 186) > > Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler. > > > > Is there a possibility that there is a double deletion from a list somewhere? > > > > Jack > > > > > > > > Was this during module unload? No. 
From glebn at voltaire.com Tue Oct 3 01:06:40 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 3 Oct 2006 10:06:40 +0200 Subject: [openib-general] RHEL 4 U3 - lost completions In-Reply-To: <20061003071237.GC1288@mellanox.co.il> References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com> <20061003065657.GG19813@minantech.com> <20061003071237.GC1288@mellanox.co.il> Message-ID: <20061003080640.GH19813@minantech.com> On Tue, Oct 03, 2006 at 09:12:37AM +0200, Michael S. Tsirkin wrote: > Quoting r. glebn at voltaire.com : > > AFAIR there is a bug in kernel 2.6.9 that makes it possible for page to > > be changed in process's VM even though it is locked by get_user_pages(). > > That is why Mellanox driver used mlock() in addition to > > get_user_pages(). I think this bug was fixed somewhere around 2.6.11. > > I think it got fixed around 2.6.7. RHEL4 U3 has this fix, > and AFAIK last SLES9 update has backported that to 2.6.7 too. > Yes, you are right. It was fixed in 2.6.7 and RHEL has this fix. -- Gleb. From erezz at voltaire.com Tue Oct 3 01:36:29 2006 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 03 Oct 2006 10:36:29 +0200 Subject: [openib-general] [PATCH 0/3] IB/iser: bug fixes for 2.6.19 rc1 In-Reply-To: References: Message-ID: <4522210D.3040007@voltaire.com> Roland Dreier wrote: > Thanks, applied > > although I had to fix up patch 3/3 by hand, since it did not apply to my tree > > I merge > 100 patches every kernel release. If I have to spend an > extra 5 minutes for each one fixing a patch or pulling it out of svn, > then I end up burning an extra 9 hours of stupid work. If 20+ people > who contribute patches sent me clean patches, then everyone will be > happier because I'll be able to merge things quicker and focus on > productive work. > > Sorry, I guess that this was caused because I'm using the open-iscsi git tree for submission of patches. 
I understood from Or that I need to 'git pull' from your tree in order to sync with openib on my local tree. I'll do that next time. -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From erezz at voltaire.com Tue Oct 3 02:40:38 2006 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 03 Oct 2006 11:40:38 +0200 Subject: [openib-general] Coverity found iSER bug? In-Reply-To: References: Message-ID: <45223016.2010402@voltaire.com> Roland Dreier wrote: > > (This is from the Coverity scanner, CID 1396) > > In iser_initiator.c there is suspicious code in iser_rcv_completion(). > We start with > > char *rx_data = NULL; > int rx_data_len = 0; > > and then do > > if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */ > rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN; > rx_data = dto->regd[1]->virt_addr; > rx_data += dto->offset[1]; > } > > I see no assignment to rx_data if dto_xfer_len <= ISER_TOTAL_HEADERS_LEN. > Then after a bunch of other stuff, we do > > iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len); > > Coverity eventually follows this path to iscsi_scsi_cmd_rsp(), which > might dereference rx_data directly. > > Is this a "can't happen" false positive or is there really a problem here? > > - R. > Roland, This cannot happen. If there's no data (dto_xfer_len <= ISER_TOTAL_HEADERS_LEN), iSER & open-iscsi code will not try to look into the NULL buffer. Just to be sure, I checked the possible paths from iscsi_iser_recv and it seems ok. 
Thanks -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From pasha at mellanox.co.il Tue Oct 3 03:37:26 2006 From: pasha at mellanox.co.il (Pavel Shamis (Pasha)) Date: Tue, 03 Oct 2006 12:37:26 +0200 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 In-Reply-To: References: Message-ID: <45223D66.40106@mellanox.co.il> Hi Scott, Unfortunately was not able to reproduce the failure on our platforms. Do you see the problem with all tests or with the specific only ? Is it consistent problem ? Regards, Pasha Scott Weitzenkamp (sweitzen) wrote: > $ uname -a > Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 > x86_64 > x86_64 x86_64 GNU/Linux > $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > 192.168.2.46 192.168.2.49 hostname > svbu-qa1850-4 > svbu-qa1850-3 > $ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > 192.168.2.46 192.168.2.49 > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/ > osu_latency > > The last command just hangs. Can I try your binary RPMs? > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > >> -----Original Message----- >> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] >> Sent: Sunday, October 01, 2006 2:29 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >> OFED 1.1 rc6 with SLES10 x86_64 >> >> Can you please elaborate on MVAPICH issues, can you send >> command line? >> We ran it here on 32 Opteron nodes each quad core and also rigorous >> tests on the many other nodes? >> >> >> >> Scott Weitzenkamp (sweitzen) wrote: >>> We are just getting started with OFED testing on SLES10, first >>> platform is x86_64. 
>>> >>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are >> working so far. >>> MVAPICH with OSU benchmarks just hang. This same hardware works >>> fine with OFED and RHEL4 U3. >>> >>> Has anyone else seen this? >>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Server Virtualization Business Unit >>> Cisco Systems >>> >>> >> -------------------------------------------------------------- >> ---------- >>> _______________________________________________ >>> openfabrics-ewg mailing list >>> openfabrics-ewg at openib.org >>> http://openib.org/mailman/listinfo/openfabrics-ewg >>> > -- Pavel Shamis (Pasha) Software Engineer Mellanox Technologies LTD. pasha at mellanox.co.il From halr at voltaire.com Tue Oct 3 03:46:01 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 06:46:01 -0400 Subject: [openib-general] Kernel Oops in user-mad, mad In-Reply-To: <200610030946.36819.jackm@dev.mellanox.co.il> References: <200610011153.16702.jackm@dev.mellanox.co.il> <20061001111413.GI1796@mellanox.co.il> <200610030946.36819.jackm@dev.mellanox.co.il> Message-ID: <1159872361.18903.29595.camel@hal.voltaire.com> On Tue, 2006-10-03 at 03:46, Jack Morgenstein wrote: > On Sunday 01 October 2006 13:14, Michael S. Tsirkin wrote: > > Quoting r. Jack Morgenstein : > > > Subject: Kernel Oops in user-mad, mad > > > > > > We received the following kernel Oops while running regression > > > (see console picture attached). > > > > > > This looks like a possible race condition between handling umad send completions > > > and ib_unregister_mad_agent. > > > > > > The Oops is at the list_del line of dequeue_send (user_mad.c: 186) > > > Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler. > > > > > > Is there a possibility that there is a double deletion from a list somewhere? > > > > > > Jack > > > > > > > > > > > > > Was this during module unload? > No. What caused the ib_unregister_mad_agent routine to be invoked ? 
Was OpenSM shutting down when this occurred ? Can you provide any more details on the scenario which caused this ? -- Hal From vlad at mellanox.co.il Tue Oct 3 03:52:58 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 03 Oct 2006 12:52:58 +0200 Subject: [openib-general] [openfabrics-ewg] Problems with OFED IPoIB HA on SLES10 In-Reply-To: References: Message-ID: <1159872778.8333.59.camel@mtlsws13.yok.mtl.com> Hi Scott, Please see my comments below: On Mon, 2006-10-02 at 23:24 -0700, Scott Weitzenkamp (sweitzen) wrote: > Vlad, > > I filed a bug for these issues. > > 1) If I start IPoIB HA with ib0 IB port shut down (from IB switch) and > ib1 IB port enabled, then IPoIB does not work because "ip monitor link > all" does not report NO-CARRIER at startup like ipoib_ha.pl is looking > for. This is a major hole. Fixed, will be updated in OFED-1.1-rc7. > > > 2) /etc/init.d/openibd runs ipoib_ha.pl with its stdout and stderr > redirected to /dev/null, should we run with -v for verbose instead and > redirect log file to /var/log? > > # fgrep ipoib_ha.pl /etc/init.d/openibd > ipoib_ha.pl -p ${PRIMARY_IPOIB_DEV} -s > ${SECONDARY_IPOIB_DEV} -- > with-arping --with-multicast > /dev/null 2>&1 & > Added /var/log/ipoib_ha.log and the verbose output of ipoib_ha.pl redirected into this log file. > 3) I got IPoIB HA working on SLES 10, but the documentation is a > little lacking. Looks like I have to put the same IP address in > ifcfg-ib0 and ifcfg-ib1, is this correct? > Yes, IP address should be the same. Actually the configuration of the secondary interface does not matter. The High Availability daemon reads the configuration of the primary interface and migrates it between the interfaces in case of failure. 
> # pwd > /etc/sysconfig/network > # cat ifcfg-ib0 > DEVICE=ib0 > BOOTPROTO=static > IPADDR=192.168.2.46 > NETMASK=255.255.255.0 > ONBOOT=yes > # cat ifcfg-ib1 > DEVICE=ib1 > BOOTPROTO=static > IPADDR=192.168.2.46 > NETMASK=255.255.255.0 > ONBOOT=yes > > 4) If I shutdown ib0 IB port, I see this from > "/usr/local/ofed/bin/ipoib_ha.pl -v --with-arping --with-multicast" > > Use of uninitialized value in concatenation (.) or string > at /usr/local/ofed/bin/ipoib_ha.pl line 287. > Fixed. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg Best Regards, Vladimir Sokolovsky Software Integration Engineer Mellanox Technologies Ltd. Tell: +972 (4) 909-7200 ext. 338 -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshek at voltaire.com Tue Oct 3 05:30:57 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Tue, 3 Oct 2006 14:30:57 +0200 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD Message-ID: Michael, > Not sure what's miss match FWR, but if you can't boot > the need to specify /proc/bus/pci/ is going to be the least of your problems. Are you sure that we will never face a situation were mthca driver is not loaded and we need to burn a new HCA FWR ? What we expect the user to do in this case ? Send the HCA to Mellanox ? > > Please understand, I'd like people to start using mthca0 for device name rather than defaulting to lspci as the first resort. I think that having " mstflint -d mthca0 ... " is really good and user friendly. BUT please notice that , Plenty of the mstflint uses are done when a customer buy / install new IB lab equipment. New users that does not know IB yet knows lspci . Lspci is easy to find info and very convenient for scripts writing. 
So , Can you explain what's wrong with " mstflint -d .." and why you don't want user to use it ? Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: Tuesday, October 03, 2006 9:45 AM To: Moshe Kazir Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD Quoting r. Moshe Kazir : > A work-around that enable the use of mstflint only when the driver is > loaded is not sufficient I think I somewhat understand the mmap related kernel bug thing, (although I'd like to see this discussed on lkml) but I still don't understand where the "driver is loaded" thing comes from, and I'd like to. > What you plan to do when you have system error -> > - boot fail when IB started, > - driver loading fail as result of driver error / miss match FWR, etc. Not sure what's miss match FWR, but if you can't boot the need to specify /proc/bus/pci/ is going to be the least of your problems. Please understand, I'd like people to start using mthca0 for device name rather than defaulting to lspci as the first resort. -- MST From halr at voltaire.com Tue Oct 3 05:58:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 08:58:49 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response In-Reply-To: <4517CD19.20700@dev.mellanox.co.il> References: <450F7D7E.8070408@mellanox.co.il> <4517CD19.20700@dev.mellanox.co.il> Message-ID: <1159880328.18903.34469.camel@hal.voltaire.com> Hi Yevgeny, On Mon, 2006-09-25 at 08:35, Yevgeny Kliteynik wrote: > Hi Hal. > > The patch looks ok. Thanks. Did you just look at things or did you also run any tests ? 
> A few remarks thought: > > It appears that the multicast group mtu/rate selectors > are actually not referenced by anyone - the SM/SA code > implicitly assumes that they should be 'exact', and acts > accordingly. If this is the case, then it seems to me that it is a different bug which also needs fixing but I'm not sure why you say this as I see code in osm_sa_mcmember_record.c which obtains the various selectors (MTU and rate) and handles them. PacketLifeTime is the only one not currently handled. > Same goes for the response - the selectors > that are filled in are hard-coded to 'exact'. That's what the spec requires on response regardless of what selectors were supplied on the request. > This is the reason why the bug that this patch fixes has > never appeared, and why fixing it will not change the SM > behavior. It does change the SM (SA) behavior (at least in terms of the response). It does not affect the SA client (end stack) behavior. > But of course, it is better to have this fix anyway. Yes, I think it is required for spec compliance. -- Hal From mst at mellanox.co.il Tue Oct 3 12:59:07 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 15:59:07 -0400 Subject: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is notloaded on AMD In-Reply-To: References: Message-ID: <20061003195907.GA4569@mellanox.co.il> > So , Can you explain what's wrong with " mstflint -d .." > and why you don't want user to use it ? What bothers me most is the pread/pwrite config space access. Unfortunately, the way it is implemented currently, using this interface, it is too easy to hang the system, or worse, just by running several mstflint instances. 
So, it was intended just as a work-around for cases where firmware is in
a bad way, and I am reluctant to enable that transparently for all users
(even on some architectures), until I am sure I understand the whole
problem and there's no better work-around, or until I figure out a good
fix for this issue.

--
MST

From johnt1johnt2 at gmail.com Tue Oct 3 06:11:52 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Tue, 3 Oct 2006 18:41:52 +0530
Subject: [openib-general] Multi-port HCA
Message-ID:

Hi,

I have two HCA cards, each having two ports and each connected to a
separate PCI-E x8 slot.

Using one HCA port I get end-to-end BW of 11.6 Gb/sec (unidirectional RDMA).
If I use two ports of the same HCA or of different HCAs, I get between 5 and
6.5 Gb/sec point-to-point BW on each port. BW on each port drops further
if I use more ports. I am not able to understand this behaviour. Is there
any limit on the maximum BW that a system can provide? Does the available
BW get divided among multiple HCA ports (which would mean that having
multiple ports will not increase the BW)?

Regards,
John T

From mst at mellanox.co.il Tue Oct 3 06:16:15 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 3 Oct 2006 15:16:15 +0200
Subject: [openib-general] [PATCH ofed-1.1 2/2] ehca: improved ehca debug format
In-Reply-To: <200610030050.05816.hnguyen@de.ibm.com>
References: <200610030050.05816.hnguyen@de.ibm.com>
Message-ID: <20061003131615.GA3721@mellanox.co.il>

Quoting r. Hoang-Nam Nguyen :
> Subject: [PATCH ofed-1.1 2/2] ehca: improved ehca debug format
>
> Michael,
> here is the 2nd patch of ehca with a small format improvement in ehca
> debug function. It would be great if we could include it for ofed-1.1.
> Note that I created this patch against the dir openib-1.1 extracted
> from ofed-1.1-rc6/SOURCES/openib-1.1.tgz.
> Thanks!
> Nam Nguyen

Both patches applied without problems, we'll add them for rc7.
--
MST

From mst at mellanox.co.il Tue Oct 3 06:29:34 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 3 Oct 2006 15:29:34 +0200
Subject: [openib-general] [openfabrics-ewg] OFED Status
In-Reply-To: <4521AA8B.80702@pathscale.com>
References: <4521AA8B.80702@pathscale.com>
Message-ID: <20061003132934.GB3721@mellanox.co.il>

Quoting r. Robert Walsh :
> Subject: Re: [openfabrics-ewg] OFED Status
>
> The attached patch fixes this problem by deferring creation of our
> diagpkt device until at least one piece of hardware has been found.
>
> Michael: this will fix the OFED testing problem you were seeing.

The patch didn't apply - looks like it's not against the ofed tree - it
was conflicting in the second chunk in ipath_driver. Since the conflict
was trivial, Vlad fixed that up manually - here's the patch that we put
in RC7. Rob, please try to make sure patches apply in OFED tree next
time before you post.

Patch for review, below

---

IB/ipath - initialize diagpkt file on device init only

Don't attempt to set up the diagpkt device in the module init code.
Instead, wait until a piece of hardware is initted. Fixes a problem when
loading the ib_ipath module when no InfiniPath hardware is present:
modprobe would go into the D state and stay there.

From: Robert Walsh

diff -r 2ed7140d5700 drivers/infiniband/hw/ipath/ipath_diag.c
--- a/drivers/infiniband/hw/ipath/ipath_diag.c	Mon Oct 02 16:56:55 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c	Mon Oct 02 16:58:29 2006 -0700
@@ -286,17 +286,23 @@ static struct file_operations diagpkt_fi
 
 static struct cdev *diagpkt_cdev;
 static struct class_device *diagpkt_class_dev;
-
-int __init ipath_diagpkt_add(void)
-{
-	return ipath_cdev_init(IPATH_DIAGPKT_MINOR,
-			       "ipath_diagpkt", &diagpkt_file_ops,
-			       &diagpkt_cdev, &diagpkt_class_dev);
-}
-
-void __exit ipath_diagpkt_remove(void)
-{
-	ipath_cdev_cleanup(&diagpkt_cdev, &diagpkt_class_dev);
+static int diagpkt_count;
+
+void ipath_diagpkt_add(void)
+{
+	if (diagpkt_count == 0)
+		ipath_cdev_init(IPATH_DIAGPKT_MINOR,
+				"ipath_diagpkt", &diagpkt_file_ops,
+				&diagpkt_cdev, &diagpkt_class_dev);
+
+	diagpkt_count++;
+}
+
+void ipath_diagpkt_remove(void)
+{
+	diagpkt_count--;
+	if (diagpkt_count == 0)
+		ipath_cdev_cleanup(&diagpkt_cdev, &diagpkt_class_dev);
 }
 
 /**
diff -r 2ed7140d5700 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Mon Oct 02 16:56:55 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Mon Oct 02 17:00:39 2006 -0700
@@ -559,6 +559,7 @@ static int __devinit ipath_init_one(stru
 	ipathfs_add_device(dd);
 	ipath_user_add(dd);
 	ipath_diag_add(dd);
+	ipath_diagpkt_add();
 	ipath_register_ib_device(dd);
 
 	/* Check that we have a LID in LID_TIMEOUT seconds. */
@@ -700,6 +701,7 @@ static void __devexit ipath_remove_one(s
 
 	ipath_unregister_ib_device(dd->verbs_dev);
 
+	ipath_diagpkt_remove();
 	ipath_diag_remove(dd);
 	ipath_user_remove(dd);
 	ipathfs_remove_device(dd);
@@ -2183,17 +2185,7 @@ static int __init infinipath_init(void)
 		goto bail_group;
 	}
 
-	ret = ipath_diagpkt_add();
-	if (ret < 0) {
-		printk(KERN_ERR IPATH_DRV_NAME ": Unable to create "
-		       "diag data device: error %d\n", -ret);
-		goto bail_ipathfs;
-	}
-
 	goto bail;
-
-bail_ipathfs:
-	ipath_exit_ipathfs();
 
 bail_group:
 	ipath_driver_remove_group(&ipath_driver.driver);
diff -r 2ed7140d5700 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Mon Oct 02 16:56:55 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Mon Oct 02 16:58:29 2006 -0700
@@ -889,7 +889,7 @@ void ipath_device_remove_group(struct de
 void ipath_device_remove_group(struct device *, struct ipath_devdata *);
 int ipath_expose_reset(struct device *);
 
-int ipath_diagpkt_add(void);
+void ipath_diagpkt_add(void);
 void ipath_diagpkt_remove(void);
 
 int ipath_init_ipathfs(void);

--
MST

From rkuchimanchi at silverstorm.com Tue Oct 3 06:39:27 2006
From: rkuchimanchi at silverstorm.com (Ramachandra K)
Date: Tue, 03 Oct 2006 19:09:27 +0530
Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx)
In-Reply-To:
References: <4521BCB1.524.4E3B826B@rkuchimanchi.silverstorm.com> <4521762C.6000007@silverstorm.com>
Message-ID: <4522680F.6060105@silverstorm.com>

Roland, Bryan

Thanks for all the review comments. I will start working on the
changes you suggested.

Regards,
Ram

Roland Dreier wrote:
> Ramachandra> In that case, can you please consider this for the
> Ramachandra> for-2.6.20 branch ?
>
> I'm happy to keep this in a vex branch or something like that, but as
> the emails I just sent show, this is not ready for merging yet (which
> is to be expected -- it's never been reviewed).
> > I think Scott's question about protocol documentation is a good one. > And also as I said this needs to be sent to lkml and netdev for full > review by everyone. > > - R. From jackm at dev.mellanox.co.il Tue Oct 3 06:46:03 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 3 Oct 2006 15:46:03 +0200 Subject: [openib-general] Kernel Oops in user-mad, mad In-Reply-To: <1159872361.18903.29595.camel@hal.voltaire.com> References: <200610011153.16702.jackm@dev.mellanox.co.il> <200610030946.36819.jackm@dev.mellanox.co.il> <1159872361.18903.29595.camel@hal.voltaire.com> Message-ID: <200610031546.04151.jackm@dev.mellanox.co.il> On Tuesday 03 October 2006 12:46, Hal Rosenstock wrote: > On Tue, 2006-10-03 at 03:46, Jack Morgenstein wrote: > > On Sunday 01 October 2006 13:14, Michael S. Tsirkin wrote: > > > Quoting r. Jack Morgenstein : > > > > Subject: Kernel Oops in user-mad, mad > > > > > > > > We received the following kernel Oops while running regression > > > > (see console picture attached). > > > > > > > > This looks like a possible race condition between handling umad send completions > > > > and ib_unregister_mad_agent. > > > > > > > > The Oops is at the list_del line of dequeue_send (user_mad.c: 186) > > > > Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler. > > > > > > > > Is there a possibility that there is a double deletion from a list somewhere? > > > > > > > > Jack > > > > > > > > > > > > > > > > > > Was this during module unload? > > No. > > What caused the ib_unregister_mad_agent routine to be invoked ? Was > OpenSM shutting down when this occurred ? Can you provide any more > details on the scenario which caused this ? > > -- Hal This was during the testing of MPI. Opensm is invoked once (also shut down) before running an MPI test; Evidently, this occurred between MPI tests. We don't have any info beyond this. 
- Jack From rkuchimanchi at silverstorm.com Tue Oct 3 06:59:58 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 03 Oct 2006 19:29:58 +0530 Subject: [openib-general] [PATCH 8/10] sysfs interface implementation In-Reply-To: <4521802F.4010405@pathscale.com> References: <4521BF8B.3021.4E46A8AA@rkuchimanchi.silverstorm.com> <4521802F.4010405@pathscale.com> Message-ID: <45226CDE.1060205@silverstorm.com> Bryan O'Sullivan wrote: > Ramachandra K wrote: >> >> +/* >> + * target eiocs are added by writing >> + * >> + * ioc_guid=,dgid=> GID>,pkey=,name= >> + * to the create_primary sysfs attribute. >> + */ >> +enum { >> + VNIC_OPT_ERR = 0, >> + VNIC_OPT_IOC_GUID = 1 << 0, >> + VNIC_OPT_DGID = 1 << 1, >> + VNIC_OPT_PKEY = 1 << 2, >> + VNIC_OPT_NAME = 1 << 3, >> + VNIC_OPT_INSTANCE = 1 << 4, >> + VNIC_OPT_RXCSUM = 1 << 5, >> + VNIC_OPT_TXCSUM = 1 << 6, >> + VNIC_OPT_HEARTBEAT = 1 << 7, >> + VNIC_OPT_ALL = (VNIC_OPT_IOC_GUID | >> + VNIC_OPT_DGID | VNIC_OPT_NAME | VNIC_OPT_PKEY), >> +}; > > This is not OK. You can't pass in multiple values to a sysfs file. > Either set the values separately or (if they have to be set all at once) > find some other way to do this work. Also, putting all of this parsing > cruft in a driver is a sign you're trying to do something you shouldn't be. > This is similar to what is done in the SRP driver. In fact I had chosen this approach of adding targets looking at the SRP driver as the input parameters that are required here are almost same as that of the SRP driver. 
Regards,
Ram

From swise at opengridcomputing.com Tue Oct 3 07:00:49 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 03 Oct 2006 09:00:49 -0500
Subject: [openib-general] rdma_cm branch
In-Reply-To: <452202D6.6040605@voltaire.com>
References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <452202D6.6040605@voltaire.com>
Message-ID: <1159884049.24791.15.camel@stevo-desktop>

> Since the ucma will not make it for the 2.6.19 feature merge window, why
> not target both the ucma and the cma ud/ud-multicast support for 2.6.20?
>
> This way you would be doing one big ABI change and would not carry this
> HUGE svn/git diff.
>
> As I have mentioned in the other thread, once it would make sense from
> your schedule to do the patch preparation work, it would be good to push
> it into the for-2.6.20 branch of Roland's tree from where it can go to
> the -mm tree so people can start testing it.
>

This sounds good to me too.

From rdreier at cisco.com Tue Oct 3 07:08:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 03 Oct 2006 07:08:32 -0700
Subject: [openib-general] RHEL 4 U3 - lost completions
References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com>
Message-ID:

    Or> Roland - If indeed, does it make sense that the problem does
    Or> not reproduce with single threaded runs?

Sorry, I can't parse the question. However, the problem here seems to
be that the CQ buffer pages end up being marked for copy-on-write, and
I don't know of any reason why that would happen other than a fork()
happening somewhere (possibly behind the scenes in a system() call or
something like that).

 - R.
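
[Editorial sketch, not part of the thread.] The copy-on-write concern in
the message above can be illustrated with a small user-space example. This
is a minimal sketch under stated assumptions, not the fix used by any
library discussed here: it assumes a Linux kernel that provides
MADV_DONTFORK (mainline 2.6.16 and later, which may not cover the
distribution kernel in this thread), and the helper names
alloc_dontfork_buffer() and free_dontfork_buffer() are hypothetical. A
buffer that hardware writes into is excluded from fork(), so the kernel has
no reason to mark the parent's pages copy-on-write on the child's behalf.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages (madvise works on pages). */
size_t round_up_to_page(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(page > 0);	/* the page size query is not expected to fail */
	return (len + (size_t)page - 1) / (size_t)page * (size_t)page;
}

/*
 * Hypothetical helper: map an anonymous, zero-filled buffer and ask the
 * kernel not to make it available to children created by fork().
 * Returns NULL on failure.
 */
void *alloc_dontfork_buffer(size_t len)
{
	size_t rounded = round_up_to_page(len);
	void *buf;

	if (len == 0 || rounded < len)	/* reject empty or overflowing sizes */
		return NULL;

	buf = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	/* Assumption: the running kernel supports MADV_DONTFORK. */
	if (madvise(buf, rounded, MADV_DONTFORK) != 0) {
		munmap(buf, rounded);
		return NULL;
	}
	return buf;
}

/* Hypothetical helper: unmap a buffer obtained from alloc_dontfork_buffer(). */
int free_dontfork_buffer(void *buf, size_t len)
{
	return munmap(buf, round_up_to_page(len));
}
```

A child created by fork() has no mapping for such a range at all, so it
must not touch the buffer; that is the trade-off for keeping the parent's
pages in place.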
From bugzilla-daemon at openib.org Tue Oct 3 07:14:39 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 3 Oct 2006 07:14:39 -0700 (PDT)
Subject: [openib-general] [Bug 261] can't configure IPoIB pkey interfaces at boot time
Message-ID: <20061003141439.845192283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=261

------- Comment #1 from chas at cmf.nrl.navy.mil 2006-10-03 07:14 -------
Created an attachment (id=49)
 --> (http://openib.org/bugzilla/attachment.cgi?id=49&action=view)
diff to support vlans on a parent interface

we use the following locally. it could be better. the 0x prefix could
be handled by awk instead of eval.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From swise at opengridcomputing.com Tue Oct 3 07:25:43 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 03 Oct 2006 09:25:43 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <451C3F02.3000907@ichips.intel.com>
References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com>
Message-ID: <1159885543.24791.21.camel@stevo-desktop>

> Someday soon I hear, OFA will be able to host git repositories, so my preference
> is to delay any svn to git transition until then. (I cannot host git from
> inside Intel's firewall, nor can I access a git repository which isn't hosted at
> kernel.org.) How would you handle merging in changes from the main branch to
> side branches?
>

Can OFA give us a date on when this will happen?

Steve.

From wombat2 at us.ibm.com Tue Oct 3 07:35:47 2006
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Tue, 3 Oct 2006 10:35:47 -0400
Subject: [openib-general] Multi-port HCA
In-Reply-To:
Message-ID:

John,

Whose adapter (manufacturer) are you using? It is usually an adapter
implementation or driver issue that occurs when you cannot scale across
multiple links.
The fact that you don't scale up from one link, and that the links instead
appear to share a fixed bandwidth across N links, means that there is a
driver or stack issue. At one time I think that IPoIB and maybe other IB
drivers used only one event queue across multiple links, which would be a
bottleneck. We added code in the IBM EHCA driver to get round this
bottleneck.

Are your measurements using MPI or IP? Are you using separate
tasks/sockets per link, and different subnets if using IP?

Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
wombat2 at us.ibm.com (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world
we leave when we die. So we have to accept what has gone before us and work
to change the only thing we can, -- The Future." William Shatner

"john t" wrote on 10/03/2006 09:42:24 AM:

>
> Hi,
>
> I have two HCA cards, each having two ports and each connected to a
> separate PCI-E x8 slot.
>
> Using one HCA port I get end to end BW of 11.6 Gb/sec (uni-direction RDMA).
> If I use two ports of the same HCA or different HCA, I get between 5
> to 6.5 Gb/sec point-to-point BW on each port. BW on each port
> further reduces if I use more ports. I am not able to understand
> this behaviour. Is there any limitation on max. BW that a system can
> provide? Does the available BW get divided among multiple HCA ports
> (which means having multiple ports will not increase the BW)?
>
>
> Regards,
> John T

From halr at voltaire.com Tue Oct 3 07:38:48 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Oct 2006 10:38:48 -0400
Subject: [openib-general] diags/ibportstate: Support explicit port reset
Message-ID: <1159886328.18903.38061.camel@hal.voltaire.com>

diags/ibportstate: Support explicit port reset in addition to
disable and enable

Signed-off-by: Hal Rosenstock

Index: src/ibportstate.c
===================================================================
--- src/ibportstate.c	(revision 9670)
+++ src/ibportstate.c	(working copy)
@@ -252,6 +252,8 @@ main(int argc, char **argv)
 			port_op = 1;
 		else if (!strcmp(argv[2], "disable"))
 			port_op = 2;
+		else if (!strcmp(argv[2], "reset"))
+			port_op = 3;
 	}
 
 	if (port_op)
@@ -266,13 +268,18 @@ main(int argc, char **argv)
 	if (port_op) {
 		if (port_op == 1)		/* Enable port */
 			mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 2);	/* Polling */
-		else if (port_op == 2) {	/* Disable port */
+		else if ((port_op == 2) || (port_op == 3)) { /* Disable port */
 			mad_set_field(data, 0, IB_PORT_STATE_F, 1);		/* Down */
 			mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 3);	/* Disabled */
 		}
 
 		if ((err = set_port_info(&portid, data, argv+1, argc-1)))
 			IBERROR("smpset portinfo: %s", err);
+		if (port_op == 3) {	/* Reset port - so also enable */
+			mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 2);	/* Polling */
+			if ((err = set_port_info(&portid, data, argv+1, argc-1)))
+				IBERROR("smpset portinfo: %s", err);
+		}
 	}
 
 	exit(0);
Index: man/ibportstate.8
===================================================================
--- man/ibportstate.8	(revision 9670)
+++ man/ibportstate.8	(working copy)
@@ -1,4 +1,4 @@
-.TH IBPORTSTATE 8 "July 25, 2006" "OpenIB" "OpenIB Diagnostics"
+.TH IBPORTSTATE 8 "October 3, 2006" "OpenIB" "OpenIB Diagnostics"
 
 .SH NAME
 ibportstate \- handle port state and port physical state of an InfiniBand port
@@ -10,7 +10,7 @@ ibportstate \- handle port state and por
 .SH DESCRIPTION
 .PP
 ibportstate allows the port state and port physical state of an IB port
-to be queried or a switch port to be disabled or enabled.
+to be queried or a switch port to be disabled, enabled, or reset.
 
 .SH OPTIONS
 
@@ -18,7 +18,7 @@ to be queried or a switch port to be dis
 .TP
 op
 Port operations allowed
- supported ops: enable, disable, query
+ supported ops: enable, disable, reset, query
 Default is query
 
 .SH COMMON OPTIONS

From tom at opengridcomputing.com Tue Oct 3 07:46:41 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 03 Oct 2006 09:46:41 -0500
Subject: [openib-general] [PATCH] Add spinlocks to serialize ib_post_send/ib_post_recv
Message-ID: <20061003144641.8901.28566.stgit@dell3.ogc.int>

From: Tom Tucker

The AMSO driver was not thread-safe in the post WR code and had
code that would sleep if the WR post FIFO was full. Since these
functions can be called on interrupt level I changed the sleep to a
udelay.

Signed-off-by: Tom Tucker
---

 drivers/infiniband/hw/amso1100/c2_qp.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c
index 1226113..681c130 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -35,6 +35,7 @@
  *
  */
 
+#include
 #include "c2.h"
 #include "c2_vq.h"
 #include "c2_status.h"
@@ -705,10 +706,8 @@ static inline void c2_activity(struct c2
 	 * cannot get on the bus and the card and system hang in a
 	 * deadlock -- thus the need for this code. [TOT]
 	 */
-	while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x80000000) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(0);
-	}
+	while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x80000000)
+		udelay(10);
 
 	__raw_writel(C2_HINT_MAKE(mq_index, shared),
 		     c2dev->regs + PCI_BAR0_ADAPTER_HINT);
@@ -766,6 +765,7 @@ int c2_post_send(struct ib_qp *ibqp, str
 	struct c2_dev *c2dev = to_c2dev(ibqp->device);
 	struct c2_qp *qp = to_c2qp(ibqp);
 	union c2wr wr;
+	unsigned long lock_flags;
 	int err = 0;
 
 	u32 flags;
@@ -881,8 +881,10 @@ int c2_post_send(struct ib_qp *ibqp, str
 		/*
 		 * Post the puppy!
 		 */
+		spin_lock_irqsave(&qp->lock, lock_flags);
 		err = qp_wr_post(&qp->sq_mq, &wr, qp, msg_size);
 		if (err) {
+			spin_unlock_irqrestore(&qp->lock, lock_flags);
 			break;
 		}
 
@@ -890,6 +892,7 @@ int c2_post_send(struct ib_qp *ibqp, str
 		 * Enqueue mq index to activity FIFO.
 		 */
 		c2_activity(c2dev, qp->sq_mq.index, qp->sq_mq.hint_count);
+		spin_unlock_irqrestore(&qp->lock, lock_flags);
 
 		ib_wr = ib_wr->next;
 	}
@@ -905,6 +908,7 @@ int c2_post_receive(struct ib_qp *ibqp,
 	struct c2_dev *c2dev = to_c2dev(ibqp->device);
 	struct c2_qp *qp = to_c2qp(ibqp);
 	union c2wr wr;
+	unsigned long lock_flags;
 	int err = 0;
 
 	if (qp->state > IB_QPS_RTS)
@@ -945,8 +949,10 @@ int c2_post_receive(struct ib_qp *ibqp,
 			break;
 		}
 
+		spin_lock_irqsave(&qp->lock, lock_flags);
 		err = qp_wr_post(&qp->rq_mq, &wr, qp, qp->rq_mq.msg_size);
 		if (err) {
+			spin_unlock_irqrestore(&qp->lock, lock_flags);
 			break;
 		}
 
@@ -954,6 +960,7 @@ int c2_post_receive(struct ib_qp *ibqp,
 		 * Enqueue mq index to activity FIFO
 		 */
 		c2_activity(c2dev, qp->rq_mq.index, qp->rq_mq.hint_count);
+		spin_unlock_irqrestore(&qp->lock, lock_flags);
 
 		ib_wr = ib_wr->next;
 	}

From aviram at dev.mellanox.co.il Tue Oct 3 07:48:43 2006
From: aviram at dev.mellanox.co.il (Aviram Gutman)
Date: Tue, 03 Oct 2006 16:48:43 +0200
Subject: [openib-general] [openfabrics-ewg] OFED Status
In-Reply-To: <45219DD6.8030602@ichips.intel.com>
References: <45219DD6.8030602@ichips.intel.com>
Message-ID: <4522784B.40601@dev.mellanox.co.il> Arlin Davis wrote: > Woodruff, Robert J wrote: > >> Aviram wrote, >> >> >>> Pending that IPoIB HA is solved would like to issue RC7 that suppose to >>> >>> be final. Is everyone OK with this approach? >>> >>> Aviram >>> >> >> Sounds good, >> >> What is the target date for RC7 ? >> > > Do we have a new target date? I hope this week. From bos at pathscale.com Tue Oct 3 08:25:31 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 03 Oct 2006 08:25:31 -0700 Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19 In-Reply-To: References: <452156F4.4050004@pathscale.com> Message-ID: <452280EB.4000300@pathscale.com> Eric W. Biederman wrote: > Sure. I talked to Dave Olson about this a while ago, and I couldn't > get anything happening. Driver authors tend to find imminent breakage quite stimulating, in my experience :-) References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <20061003073216.GI1288@mellanox.co.il> Message-ID: <45228CB9.9060206@ichips.intel.com> Michael S. Tsirkin wrote: >>1. We need to add rdma_establish() and expose the rdma_conn_param values as >>part of the connection event. I'm working on a patch for the latter. >> >>2. We need a ucma branch. To merge upstream, it makes sense to include item 1 >>first, but this leads to a conflict with the OFED releases. OFED ABI version 1 >>includes RC QP support, but without item 1 changes, and SVN ABI version 2 >>includes multicast support. > > > Hmm, OFED actually does include rdma_establish. Isn't that item 1? > But it doesn't export that to userspace, does it? 
- Sean From mshefty at ichips.intel.com Tue Oct 3 09:14:50 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 03 Oct 2006 09:14:50 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <20061003071632.GE1288@mellanox.co.il> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <20061003071632.GE1288@mellanox.co.il> Message-ID: <45228C7A.5070402@ichips.intel.com> Michael S. Tsirkin wrote: > Quoting r. Sean Hefty : > >>1. We need to add rdma_establish() and expose the rdma_conn_param values as >>part of the connection event. I'm working on a patch for the latter. > > > I have both patches as part of OFED. > Should I post them for review? > I have a patch for rdma_establish(), but please post both. I can compare the two rdma_establish() patches, and pull in the rdma_conn_param patch. Did you carry the rdma_conn_param patch up to userspace as well? - Sean From mshefty at ichips.intel.com Tue Oct 3 09:54:14 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 03 Oct 2006 09:54:14 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <4522101D.4040102@voltaire.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <4522101D.4040102@voltaire.com> Message-ID: <452295B6.20201@ichips.intel.com> Or Gerlitz wrote: > Can you clarify what do you mean "(ABI) conflict with OFED releases"? > > Is an issue with someone wishing to work with OFED user space and IB > code from upstream kernel? Yes - there could be issues there. > The approach i suggest is: it makes sense to take some care not to > create too much non working scenarios... however the upstream push > process must **not** be restricted by the existence of OFED. I agree with this. > Specifically, can you push rhe rdma_establish() ***kernel*** API support > which was integrated into OFED 1.1 as a bug fix for 2.6.19 ? Yes, but I'd like a user of it to go in at the same time. 
- Sean From AHKumar at odu.edu Tue Oct 3 10:34:05 2006 From: AHKumar at odu.edu (Amit H Kumar) Date: Tue, 3 Oct 2006 13:34:05 -0400 Subject: [openib-general] Setting HCA LinkWidth Message-ID: Hi OpenIB, I am not sure if this question belongs to this list or not. But if anyone can give me some lead it will be very helpful. One of the HCA card in our cluster is running at LinkWidth 1x: as opposed to 4x on all other nodes. I have been trying to set it to 4x using the iba_portconfig tool but nothing seems to work. Can anyone tell me what is the general approach in doing so. Thank you for any feedback, -AK From mshefty at ichips.intel.com Tue Oct 3 10:36:26 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 03 Oct 2006 10:36:26 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <452295B6.20201@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <4522101D.4040102@voltaire.com> <452295B6.20201@ichips.intel.com> Message-ID: <45229F9A.5030207@ichips.intel.com> >>Can you clarify what do you mean "(ABI) conflict with OFED releases"? >> >>Is an issue with someone wishing to work with OFED user space and IB >>code from upstream kernel? > > Yes - there could be issues there. To clarify the major issue: currently when a connection request is received, the connection data specified by the active side through the rdma_conn_param is NOT given to the user. This includes the responder_resources and initiator_depth. There's no easy way to obtain this information. The ideal fix for this is to include rdma_conn_param as part of the rdma_cm_event. However, this breaks every userspace app that's been coded to OFED / SVN. An alternative is to add another call to retrieve the data, but that's not a very clean alternative for new kernel submission. 
- Sean From halr at voltaire.com Tue Oct 3 10:38:23 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 13:38:23 -0400 Subject: [openib-general] Setting HCA LinkWidth In-Reply-To: References: Message-ID: <1159897102.4502.1179.camel@hal.voltaire.com> On Tue, 2006-10-03 at 13:34, Amit H Kumar wrote: > Hi OpenIB, > > I am not sure if this question belongs to this list or not. But if anyone > can give me some lead it will be very helpful. > > One of the HCA card in our cluster is running at LinkWidth 1x: as opposed > to 4x on all other nodes You might have a bad cable on that port. What does the switch peer port to that HCA port say for link width ? . > I have been trying to set it to 4x using the iba_portconfig tool but > nothing seems to work. What is the iba_portconfig tool ? What stack/tools are you running ? -- Hal > Can anyone tell me what is the general approach in doing so. > > Thank you for any feedback, > -AK > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ftillier at silverstorm.com Tue Oct 3 10:39:41 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 3 Oct 2006 10:39:41 -0700 Subject: [openib-general] Setting HCA LinkWidth In-Reply-To: References: Message-ID: <79ae2f320610031039w4a4ee8cexc5ec366d07f0beb5@mail.gmail.com> Hi Amit, On 10/3/06, Amit H Kumar wrote: > > Hi OpenIB, > > I am not sure if this question belongs to this list or not. But if anyone > can give me some lead it will be very helpful. > > One of the HCA card in our cluster is running at LinkWidth 1x: as opposed > to 4x on all other nodes. > I have been trying to set it to 4x using the iba_portconfig tool but > nothing seems to work. > > Can anyone tell me what is the general approach in doing so. 
If the port is coming up as 1x, it likely indicates a bad cable. You can check the port error counters to validate, or you can exchange two cables (one of which you know was used in a 4x link) and see if the problem follows the cable. - Fab From AHKumar at odu.edu Tue Oct 3 10:57:00 2006 From: AHKumar at odu.edu (Amit H Kumar) Date: Tue, 3 Oct 2006 13:57:00 -0400 Subject: [openib-general] Setting HCA LinkWidth In-Reply-To: <1159897102.4502.1179.camel@hal.voltaire.com> Message-ID: Hal Rosenstock wrote on 10/03/2006 01:38:23 PM: > On Tue, 2006-10-03 at 13:34, Amit H Kumar wrote: > > Hi OpenIB, > > > > I am not sure if this question belongs to this list or not. But if anyone > > can give me some lead it will be very helpful. > > > > One of the HCA card in our cluster is running at LinkWidth 1x: as opposed > > to 4x on all other nodes > > You might have a bad cable on that port. What does the switch peer port > to that HCA port say for link width ? I checked the port info on the switch ..it also says it 1x. > . > > I have been trying to set it to 4x using the iba_portconfig tool but > > nothing seems to work. > > What is the iba_portconfig tool ? What stack/tools are you running ? InfiniCon InfinIO 3000 VAPI libraries. Thank you for your feedback, -AK > > -- Hal > > > Can anyone tell me what is the general approach in doing so. > > > > Thank you for any feedback, > > -AK > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib. 
> org/mailman/listinfo/openib-general > > > From AHKumar at odu.edu Tue Oct 3 10:57:49 2006 From: AHKumar at odu.edu (Amit H Kumar) Date: Tue, 3 Oct 2006 13:57:49 -0400 Subject: [openib-general] Setting HCA LinkWidth In-Reply-To: <79ae2f320610031039w4a4ee8cexc5ec366d07f0beb5@mail.gmail.com> Message-ID: openib-general-bounces at openib.org wrote on 10/03/2006 01:39:41 PM: > Hi Amit, > > On 10/3/06, Amit H Kumar wrote: > > > > Hi OpenIB, > > > > I am not sure if this question belongs to this list or not. But if anyone > > can give me some lead it will be very helpful. > > > > One of the HCA card in our cluster is running at LinkWidth 1x: as opposed > > to 4x on all other nodes. > > I have been trying to set it to 4x using the iba_portconfig tool but > > nothing seems to work. > > > > Can anyone tell me what is the general approach in doing so. > > If the port is coming up as 1x, it likely indicates a bad cable. You > can check the port error counters to validate, or you can exchange two > cables (one of which you know was used in a 4x link) and see if the > problem follows the cable. I will give that a try. Thank you, -AK > > - Fab > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rjwalsh at pathscale.com Tue Oct 3 11:07:05 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 03 Oct 2006 11:07:05 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: <20061003063752.GC15885@mellanox.co.il> References: <4521AA8B.80702@pathscale.com> <20061003063752.GC15885@mellanox.co.il> Message-ID: <4522A6C9.7070104@pathscale.com> Michael S. Tsirkin wrote: > Quoting r. 
Robert Walsh : >> Subject: Re: [openfabrics-ewg] OFED Status >> >> The attached patch fixes this problem by deferring creation of our >> diagpkt device until at least one piece of hardware has been found. >> >> Michael: this will fix the OFED testing problem you were seeing. >> >> Roland: please queue for 2.6.19. > > Just saw this, thanks, I'll try. Do you want to update the patch following > Roland's comments? Yes - I'll get to that today. From SMarsh at analogic.com Tue Oct 3 11:12:20 2006 From: SMarsh at analogic.com (Marsh, Scott) Date: Tue, 3 Oct 2006 14:12:20 -0400 Subject: [openib-general] Infiniband Fedora Core5 Message-ID: Good day, My name is Scott Marsh. I am an Engineer for Analogic Corporation and I have a few questions regarding OFED. Is there any current development towards OFED for use with Fedora Core 5? If so, is there a timeline for working towards Fedora Core 5? Thank you. Regards, Scott Marsh
From rdreier at cisco.com Tue Oct 3 11:18:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 03 Oct 2006 11:18:08 -0700 Subject: [openib-general] [PATCH] Add spinlocks to serialize ib_post_send/ib_post_recv In-Reply-To: <20061003144641.8901.28566.stgit@dell3.ogc.int> (Tom Tucker's message of "Tue, 03 Oct 2006 09:46:41 -0500") References: <20061003144641.8901.28566.stgit@dell3.ogc.int> Message-ID: Thanks, queued for 2.6.19 From rdreier at cisco.com Tue Oct 3 11:19:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 03 Oct 2006 11:19:42 -0700 Subject: [openib-general] [PATCH 8/10] sysfs interface implementation In-Reply-To: <45226CDE.1060205@silverstorm.com> (Ramachandra K.'s message of "Tue, 03 Oct 2006 19:29:58 +0530") References: <4521BF8B.3021.4E46A8AA@rkuchimanchi.silverstorm.com> <4521802F.4010405@pathscale.com> <45226CDE.1060205@silverstorm.com> Message-ID: Ramachandra> This is similar to what is done in the SRP driver. In Ramachandra> fact I had chosen this approach of adding targets Ramachandra> looking at the SRP driver as the input parameters Ramachandra> that are required here are almost same as that of the Ramachandra> SRP driver. Yes, I think this is probably fine. The really hard rule for sysfs is that only one value should be shown per file -- I don't think we're as strict with what gets written into the files. With that said, perhaps configfs would make more sense here (and I've been meaning to see if it makes sense to move SRP to configfs too). - R.
From yaronh at voltaire.com Tue Oct 3 12:05:15 2006 From: yaronh at voltaire.com (Yaron Haviv) Date: Tue, 3 Oct 2006 21:05:15 +0200 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) Message-ID: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Rimmer, Todd > Sent: Monday, October 02, 2006 5:46 PM > To: Scott Weitzenkamp (sweitzen); Kuchimanchi, Ramachandra; Roland Dreier > (rdreier) > Cc: openib-General > Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm > Virtual Ethernet I/O controller (VEx) > > > From: Scott Weitzenkamp (sweitzen) > > Sent: Monday, October 02, 2006 4:22 PM > > To: Kuchimanchi, Ramachandra; Roland Dreier (rdreier) > > Cc: openib-General > > Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for > SilverStorm > > Virtual Ethernet I/O controller (VEx) > > > > Is this communication protocols documented anywhere? How does this > > feature compare to IPoIB and SDP? > > > This protocol is distinct from IPoIB and SDP. > > In brief: > > IPoIB treats an IB fabric as a LAN. As such it has UD semantics. > > SDP essentially treats the HCA as a TOE and leverages IB's RC semantics > to emulate TCP/IP SOCK_STREAM sockets. > > This protocol implements the interface to communicate to the SilverStorm > VEx Ethernet Virtual IO Controllers. The VEx card presents a true > Ethernet NIC to the host and essentially treats IB as an IO bus to allow > a host CPU to use the VEx card as its NIC. > > Todd Rimmer > Todd, I'm trying to figure out why this protocol makes sense As far as I understand, IPoIB can provide a Virtual NIC functionality just as well (maybe even better), with two restrictions: 1. Lack of support for Jumbo Frames 2. Doesn't support protocols other than IP (e.g. IPX, ..) 
Restriction 1 can easily be addressed using IPoIB RC, and the question is whether 2 is really a problem (how many people use IPX or AppleTalk these days?). And if 2 is a problem, why isn't it addressed in the greater scope of supporting Ethernet emulation between any IB nodes, and not just from a host to a gateway device? If this is a real requirement, why hasn't SilverStorm worked with the industry and standardization bodies such as IBTA or IETF to come up with a standard and interoperable way to address it, rather than just trying to push a proprietary driver and a point solution into the kernel? I believe we should first see if such a driver is needed and whether IPoIB UD/RC cannot be leveraged for that instead; maybe the Ethernet emulation can just be an extension to IPoIB RC, hitting 3 birds with one stone (same infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all nodes, not just gateways). Yaron > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From eitan at mellanox.co.il Tue Oct 3 12:05:40 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 03 Oct 2006 21:05:40 +0200 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: <86wt82lc1l.fsf@mtl066.yok.mtl.com> References: <86wt82lc1l.fsf@mtl066.yok.mtl.com> Message-ID: <4522B484.6040000@mellanox.co.il> Hi Hal, I see this was not committed. Do you see a reason it should not be? Michael had a generic comment about why we do not range-check parsed values, which also relates to "safe casting". But to completely fix these we will probably need a separate patch. Eitan Eitan Zahavi wrote: >Hi Hal > >Explicit cast required for the win compiler to handle this...
> >Thanks > >Eitan > >Signed-off-by: Eitan Zahavi > >Index: osmtest/osmtest.c >=================================================================== >--- osmtest/osmtest.c (revision 9502) >+++ osmtest/osmtest.c (working copy) >@@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t > else > { > /* Also, this doesn't detect fewer than the correct number of paths being returned */ >- if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) ) >+ if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) ) > { > osm_log( &p_osmt->log, OSM_LOG_ERROR, > "osmtest_validate_path_data: ERR 0052: " > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Tue Oct 3 12:14:59 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 03 Oct 2006 21:14:59 +0200 Subject: [openib-general] [PATCH 12/13] osm: port to WinIB stack : opensm/osm_qos.c In-Reply-To: <86lkoilbz2.fsf@mtl066.yok.mtl.com> References: <86lkoilbz2.fsf@mtl066.yok.mtl.com> Message-ID: <4522B6B3.6040303@mellanox.co.il> Hi Hal, I did not see any response to this one. Just a reminder. Eitan Eitan Zahavi wrote: >Hi Hal > >Port num is uint8_t (avoid casting by using correct size field).
>Added some explicit casts > >Thanks > >Eitan > >Signed-off-by: Eitan Zahavi > >Index: opensm/osm_qos.c >=================================================================== >--- opensm/osm_qos.c (revision 9502) >+++ opensm/osm_qos.c (working copy) >@@ -70,7 +70,7 @@ static void qos_build_config(struct qos_ > */ > static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req, > osm_physp_t * p, >- unsigned port_num, >+ uint8_t port_num, > const ib_vl_arb_table_t *table_block, > unsigned block_length, > unsigned block_num) >@@ -80,7 +80,7 @@ static ib_api_status_t vlarb_update_tabl > uint32_t attr_mod; > ib_port_info_t *p_pi; > unsigned vl_mask; >- int i; >+ unsigned int i; > > if (!(p_pi = osm_physp_get_port_info_ptr(p))) > return IB_ERROR; >@@ -110,7 +110,7 @@ static ib_api_status_t vlarb_update_tabl > } > > static ib_api_status_t vlarb_update(osm_req_t * p_req, >- osm_physp_t * p, unsigned port_num, >+ osm_physp_t * p, uint8_t port_num, > const struct qos_config *qcfg) > { > ib_api_status_t status = IB_SUCCESS; >@@ -198,11 +198,11 @@ static ib_api_status_t sl2vl_update_tabl > } > > static ib_api_status_t sl2vl_update(osm_req_t * p_req, osm_port_t * p_port, >- osm_physp_t * p, unsigned port_num, >+ osm_physp_t * p, uint8_t port_num, > const struct qos_config *qcfg) > { > ib_api_status_t status; >- unsigned i, num_ports; >+ uint8_t i, num_ports; > ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p); > osm_physp_t *p_physp; > >@@ -273,7 +273,7 @@ static ib_api_status_t vl_high_limit_upd > > static ib_api_status_t qos_physp_setup(osm_log_t * p_log, osm_req_t * p_req, > osm_port_t * p_port, osm_physp_t * p, >- unsigned port_num, >+ uint8_t port_num, > const struct qos_config *qcfg) > { > ib_api_status_t status; >@@ -329,7 +329,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t > osm_physp_t *p_physp; > uint8_t node_type; > ib_api_status_t status; >- uint32_t i; >+ uint8_t i; > > if (p_osm->subn.opt.no_qos) > return OSM_SIGNAL_DONE; >@@ -411,7 +411,7 @@ static int 
parse_vlarb_entry(char *str, > p += parse_one_unsigned(p, ':', &val); > e->vl = val % 15; > p += parse_one_unsigned(p, ',', &val); >- e->weight = val; >+ e->weight = (uint8_t)val; > return p - str; > } > >@@ -434,7 +434,7 @@ static void qos_build_config(struct qos_ > memset(cfg, 0, sizeof(*cfg)); > > cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; >- cfg->vl_high_limit = opt->high_limit; >+ cfg->vl_high_limit = (uint8_t)opt->high_limit; > > p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; > for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Tue Oct 3 12:16:50 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 03 Oct 2006 21:16:50 +0200 Subject: [openib-general] [PATCH 13/13] osm: port to WinIB stack : opensm/osm_pkey_mgr.c In-Reply-To: <86k642lbym.fsf@mtl066.yok.mtl.com> References: <86k642lbym.fsf@mtl066.yok.mtl.com> Message-ID: <4522B722.6060009@mellanox.co.il> Hi Hal, I did not see any response on this one. Just a reminder Thanks Eitan Eitan Zahavi wrote: >Hi Hal > >Avoid using array initialization statements which do not compile on win. > >Thanks > >Eitan > >Signed-off-by: Eitan Zahavi > >Index: opensm/osm_pkey_mgr.c >=================================================================== >--- opensm/osm_pkey_mgr.c (revision 9502) >+++ opensm/osm_pkey_mgr.c (working copy) >@@ -67,7 +67,7 @@ > a different place for switch external ports (SwitchInfo) and the > rest of the ports (NodeInfo). 
> */ >-static int >+static uint16_t > pkey_mgr_get_physp_max_blocks( > IN const osm_subn_t *p_subn, > IN const osm_physp_t *p_physp ) >@@ -132,8 +132,8 @@ pkey_mgr_process_physical_port( > CL_ASSERT( ib_pkey_get_base( *p_orig_pkey ) == ib_pkey_get_base( pkey ) ); > p_pending->is_new = FALSE; > if (osm_pkey_tbl_get_block_and_idx( >- p_pkey_tbl, p_orig_pkey, >- &p_pending->block, &p_pending->index ) != IB_SUCCESS) >+ p_pkey_tbl, p_orig_pkey, >+ &p_pending->block, &p_pending->index ) != IB_SUCCESS) > { > osm_log( p_log, OSM_LOG_ERROR, > "pkey_mgr_process_physical_port: ERR 0503: " >@@ -276,7 +276,8 @@ static boolean_t pkey_mgr_update_port( > boolean_t ret_val = FALSE; > osm_pending_pkey_t *p_pending; > boolean_t found; >- ib_pkey_table_t empty_block = {.pkey_entry = {0}, }; >+ ib_pkey_table_t empty_block; >+ memset(&empty_block, 0, sizeof(ib_pkey_table_t)); > > p_physp = osm_port_get_default_phys_ptr( p_port ); > if ( !osm_physp_is_valid( p_physp ) ) >@@ -403,7 +404,8 @@ pkey_mgr_update_peer_port( > uint16_t peer_max_blocks; > ib_api_status_t status = IB_SUCCESS; > boolean_t ret_val = FALSE; >- ib_pkey_table_t empty_block = {.pkey_entry = {0}, }; >+ ib_pkey_table_t empty_block; >+ memset(&empty_block, 0, sizeof(ib_pkey_table_t)); > > p_physp = osm_port_get_default_phys_ptr( p_port ); > if (!osm_physp_is_valid( p_physp )) > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Tue Oct 3 12:25:24 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 03 Oct 2006 21:25:24 +0200 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <20060917173518.GC32526@mellanox.co.il> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> <20060917173518.GC32526@mellanox.co.il> Message-ID: 
<4522B924.8030504@mellanox.co.il> Hi Hal, This is another case where Michael complains about the patch not providing range checking. However, range checking is not implemented for the rest of this parser code. So I think the range check should be a separate patch. Please let me know if this works for you Thanks Michael S. Tsirkin wrote: >Quoting r. Eitan Zahavi : > > >> p++; >>- port_num = strtoul(p, &q, 10); >>+ port_num = (uint8_t)strtoul(p, &q, 10); >> if (q && !isspace(*q)) { >> >> > >Would it make sense to range-check the value before casting it away? > > From ftillier at silverstorm.com Tue Oct 3 12:31:31 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 3 Oct 2006 12:31:31 -0700 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> Message-ID: <79ae2f320610031231t74f2c69due3c48508c152b409@mail.gmail.com> Hi Yaron, On 10/3/06, Yaron Haviv wrote: > > I'm trying to figure out why this protocol makes sense > As far as I understand, IPoIB can provide a Virtual NIC functionality > just as well (maybe even better), with two restrictions: > 1. Lack of support for Jumbo Frames > 2. Doesn't support protocols other than IP (e.g. IPX, ..) Whether to use a router or virtual NIC approach for connectivity to Ethernet subnets is a design decision. We could argue until we are blue in the face about which architecture is "better", but that's really not relevant. > I believe we should first see if such a driver is needed and if IPoIB > UD/RC cannot be leveraged for that, maybe the Ethernet emulation can > just be an extension to IPoIB RC, hitting 3 birds in one stone (same > infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all > nodes not just Gateways) You're joking right? 
Are you really arguing that SilverStorm should not develop a driver to support its existing devices? This really isn't complicated: 1). SilverStorm has a virtual NIC hardware device. 2). SilverStorm is committed to support OpenFabrics. The above two statements lead to the following conclusion: SilverStorm needs a driver for its devices that works with the OpenFabrics stack. This is totally orthogonal to and independent of working on IPoIB RC or any IETF efforts to define something new. - Fab From halr at voltaire.com Tue Oct 3 12:41:28 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 15:41:28 -0400 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: <4522B484.6040000@mellanox.co.il> References: <86wt82lc1l.fsf@mtl066.yok.mtl.com> <4522B484.6040000@mellanox.co.il> Message-ID: <1159904488.4502.5642.camel@hal.voltaire.com> Hi Eitan, On Tue, 2006-10-03 at 15:05, Eitan Zahavi wrote: > Hi Hal, > > I see this was not committed. > Do you see a reason it should not? I had gotten part way through the Windows patches before the start of last week and have not got back to them. This and some others in the series are still pending. > Michael had a generic comment of why we do not check range of parsed > values which also relates to "safe casting". > But to completely fix these we will probably need a separate patch. Is this going to be done ? Can it be done the "first" time rather than committing this and fixing later ? -- Hal > Eitan > > Eitan Zahavi wrote: > > >Hi Hal > > > >Explicit cast required for the win compiler to handle this... 
> > > >Thanks > > > >Eitan > > > >Signed-off-by: Eitan Zahavi > > > >Index: osmtest/osmtest.c > >=================================================================== > >--- osmtest/osmtest.c (revision 9502) > >+++ osmtest/osmtest.c (working copy) > >@@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t > > else > > { > > /* Also, this doesn't detect fewer than the correct number of paths being returned */ > >- if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) ) > >+ if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) ) > > { > > osm_log( &p_osmt->log, OSM_LOG_ERROR, > > "osmtest_validate_path_data: ERR 0052: " > > > > > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From halr at voltaire.com Tue Oct 3 12:51:41 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 15:51:41 -0400 Subject: [openib-general] [PATCH] diags/ibportstate: Support changing LinkSpeedEnabled on any IB port Message-ID: <1159905100.4502.5986.camel@hal.voltaire.com> diags/ibportstate: Support changing LinkSpeedEnabled on any IB port Signed-off-by: Hal Rosenstock Index: src/ibportstate.c =================================================================== --- src/ibportstate.c (revision 9687) +++ src/ibportstate.c (working copy) @@ -99,10 +99,11 @@ get_node_info(ib_portid_t *dest, char *d } static char * -get_port_info(ib_portid_t *dest, char *data, char **argv, int argc) +get_port_info(ib_portid_t *dest, char *data, char **argv, int argc, int port_op) { char buf[2048]; int portnum = 0; + char val[64]; if (argc > 0) portnum = strtol(argv[0], 0, 0); @@ -110,17 +111,24 @@ get_port_info(ib_portid_t *dest, char *d if (!smp_query(data, dest, IB_ATTR_PORT_INFO, portnum, 0)) return "smp query portinfo failed"; - mad_dump_portstates(buf, sizeof buf, data, sizeof data); + if 
(port_op != 4) + mad_dump_portstates(buf, sizeof buf, data, sizeof data); + else { + mad_decode_field(data, IB_PORT_LINK_SPEED_ENABLED_F, val); + mad_dump_field(IB_PORT_LINK_SPEED_ENABLED_F, buf, sizeof buf, val); + sprintf(buf+strlen(buf), "%s", "\n"); + } printf("# Port info: %s port %d\n%s", portid2str(dest), portnum, buf); return 0; } static char * -set_port_info(ib_portid_t *dest, char *data, char **argv, int argc) +set_port_info(ib_portid_t *dest, char *data, char **argv, int argc, int port_op) { char buf[2048]; int portnum = 0; + char val[64]; if (argc > 0) portnum = strtol(argv[0], 0, 0); @@ -128,9 +136,15 @@ set_port_info(ib_portid_t *dest, char *d if (!smp_set(data, dest, IB_ATTR_PORT_INFO, portnum, 0)) return "smp set failed"; - mad_dump_portstates(buf, sizeof buf, data, sizeof data); + if (port_op != 4) + mad_dump_portstates(buf, sizeof buf, data, sizeof data); + else { + mad_decode_field(data, IB_PORT_LINK_SPEED_ENABLED_F, val); + mad_dump_field(IB_PORT_LINK_SPEED_ENABLED_F, buf, sizeof buf, val); + sprintf(buf+strlen(buf), "%s", "\n"); + } - printf("\nPort states after set:\n"); + printf("\nAfter PortInfo set:\n"); printf("# Port info: %s port %d\n%s", portid2str(dest), portnum, buf); return 0; } @@ -148,11 +162,13 @@ usage(void) fprintf(stderr, "Usage: %s [-d(ebug) -e(rr_show) -v(erbose) -D(irect) -G(uid) -s smlid -V(ersion) -C ca_name -P ca_port " "-t(imeout) timeout_ms] []\n", basename); - fprintf(stderr, "\tsupported ops: enable, disable, reset, query\n"); + fprintf(stderr, "\tsupported ops: enable, disable, reset, speed, query\n"); fprintf(stderr, "\n\texamples:\n"); fprintf(stderr, "\t\t%s 3 1 disable\t\t\t# by lid\n", basename); fprintf(stderr, "\t\t%s -G 0x2C9000100D051 1 enable\t# by guid\n", basename); - fprintf(stderr, "\t\t%s -D 0 1\t\t\t# by direct route\n", basename); + fprintf(stderr, "\t\t%s -D 0 1\t\t\t# (query) by direct route\n", basename); + fprintf(stderr, "\t\t%s 3 1 reset\t\t\t# by lid\n", basename); + fprintf(stderr, "\t\t%s 3 
1 speed 1\t\t\t# by lid\n", basename); exit(-1); } @@ -167,6 +183,7 @@ main(int argc, char **argv) char *ca = 0; int ca_port = 0; int port_op = 0; /* default to query */ + int speed = 15; char *err; char data[IB_SMP_DATA_SIZE]; @@ -254,14 +271,23 @@ main(int argc, char **argv) port_op = 2; else if (!strcmp(argv[2], "reset")) port_op = 3; + else if (!strcmp(argv[2], "speed")) { + if (argc < 4) + IBERROR("speed requires an additional parameter"); + port_op = 4; + /* Parse speed value */ + speed = strtoul(argv[3], 0, 0); + if (speed > 15) + IBERROR("invalid speed value %d", speed); + } } - if (port_op) + if (port_op && (port_op != 4)) if ((err = get_node_info(&portid, data, argv+1, argc-1))) IBERROR("smpquery nodeinfo: %s", err); - printf("Initial port states:\n"); - if ((err = get_port_info(&portid, data, argv+1, argc-1))) + printf("Initial PortInfo:\n"); + if ((err = get_port_info(&portid, data, argv+1, argc-1, port_op))) IBERROR("smpquery portinfo: %s", err); /* Only if one of the "set" options is chosen */ @@ -271,13 +297,17 @@ main(int argc, char **argv) else if ((port_op == 2) || (port_op == 3)) { /* Disable port */ mad_set_field(data, 0, IB_PORT_STATE_F, 1); /* Down */ mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 3); /* Disabled */ + } else if (port_op == 4) { /* Set speed */ + mad_set_field(data, 0, IB_PORT_LINK_SPEED_ENABLED_F, speed); + mad_set_field(data, 0, IB_PORT_STATE_F, 0); + mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 0); } - if ((err = set_port_info(&portid, data, argv+1, argc-1))) + if ((err = set_port_info(&portid, data, argv+1, argc-1, port_op))) IBERROR("smpset portinfo: %s", err); if (port_op == 3) { /* Reset port - so also enable */ mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 2); /* Polling */ - if ((err = set_port_info(&portid, data, argv+1, argc-1))) + if ((err = set_port_info(&portid, data, argv+1, argc-1, port_op))) IBERROR("smpset portinfo: %s", err); } } Index: man/ibportstate.8 
=================================================================== --- man/ibportstate.8 (revision 9686) +++ man/ibportstate.8 (working copy) @@ -1,7 +1,7 @@ .TH IBPORTSTATE 8 "October 3, 2006" "OpenIB" "OpenIB Diagnostics" .SH NAME -ibportstate \- handle port state and port physical state of an InfiniBand port +ibportstate \- handle port (physical) state and link speed of an InfiniBand port .SH SYNOPSIS .B ibportstate @@ -10,7 +10,8 @@ ibportstate \- handle port state and por .SH DESCRIPTION .PP ibportstate allows the port state and port physical state of an IB port -to be queried or a switch port to be disabled, enabled, or reset. +to be queried or a switch port to be disabled, enabled, or reset. It +also allows the link speed enabled on any IB port to be adjusted. .SH OPTIONS @@ -18,10 +19,17 @@ to be queried or a switch port to be dis .TP op Port operations allowed - supported ops: enable, disable, reset, query + supported ops: enable, disable, reset, speed, query Default is query +.PP ops enable, disable, and reset are only allowed on switch ports - (An error is returned if attempted on CA or router ports) + (An error is indicated if attempted on CA or router ports) + speed op is allowed on any port + speed values are legal values for PortInfo:LinkSpeedEnabled + (An error is indicated if PortInfo:LinkSpeedSupported does not support + this setting) + (NOTE: Speed changes are not effected until the port goes through + link renegotiation) .SH COMMON OPTIONS @@ -84,7 +92,11 @@ ibportstate 3 1 disable .PP ibportstate -G 0x2C9000100D051 1 enable # by guid .PP -ibportstate -D 0 1 # by direct route +ibportstate -D 0 1 # (query) by direct route +.PP +ibportstate 3 1 reset # by lid +.PP +ibportstate 3 1 speed 1 # by lid .SH AUTHOR .TP From halr at voltaire.com Tue Oct 3 13:19:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 16:19:43 -0400 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: 
<4522B484.6040000@mellanox.co.il> References: <86wt82lc1l.fsf@mtl066.yok.mtl.com> <4522B484.6040000@mellanox.co.il> Message-ID: <1159906782.4502.6967.camel@hal.voltaire.com> Hi Eitan, On Tue, 2006-10-03 at 15:05, Eitan Zahavi wrote: > Hi Hal, > > I see this was not committed. > Do you see a reason it should not? > > Michael had a generic comment of why we do not check range of parsed > values which also relates to "safe casting". > But to completely fix these we will probably need a separate patch. > > Eitan > > Eitan Zahavi wrote: > > >Hi Hal > > > >Explicit cast required for the win compiler to handle this... > > > >Thanks > > > >Eitan > > > >Signed-off-by: Eitan Zahavi > > > >Index: osmtest/osmtest.c > >=================================================================== > >--- osmtest/osmtest.c (revision 9502) > >+++ osmtest/osmtest.c (working copy) > >@@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t > > else > > { > > /* Also, this doesn't detect fewer than the correct number of paths being returned */ > >- if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) ) > >+ if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) ) What about Michael's comment: Does: if ( p_path->count >= 1u << (2*lmc)) work for Windows ? If so, that would be better. 
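For illustration only, a minimal sketch of the unsigned form of that check, assuming LMC is limited to its 3-bit range of 0..7; sketch_max_paths is a hypothetical helper and not part of osmtest:

```c
#include <stdint.h>

/* Hypothetical helper: number of LID pairs between two ports that both
 * use the same LMC, i.e. (2^lmc) * (2^lmc) = 2^(2*lmc).
 * Returns 0 for an out-of-range LMC so callers can reject it. */
static uint32_t sketch_max_paths(uint8_t lmc)
{
    if (lmc > 7)
        return 0; /* LMC is a 3-bit field; anything larger is invalid */

    /* The 1u literal makes the whole shift unsigned from the start,
     * so no cast of a signed intermediate result is needed. */
    return 1u << (2u * lmc);
}
```

For every valid LMC this yields the same value as the original ( 1 << lmc ) * ( 1 << lmc ) expression, and the comparison against an unsigned count no longer mixes signed and unsigned operands.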
-- Hal

> >       {
> >         osm_log( &p_osmt->log, OSM_LOG_ERROR,
> >                  "osmtest_validate_path_data: ERR 0052: "
> >
> >
> >
> >_______________________________________________
> >openib-general mailing list
> >openib-general at openib.org
> >http://openib.org/mailman/listinfo/openib-general
> >
> >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
> >
>

From krause at cup.hp.com Tue Oct 3 13:26:08 2006
From: krause at cup.hp.com (Michael Krause)
Date: Tue, 03 Oct 2006 13:26:08 -0700
Subject: [openib-general] Drop in performance on Mellanox MT25204 single port DDR HCA
In-Reply-To:
References:
Message-ID: <6.2.0.14.2.20061003132422.03621eb8@esmail.cup.hp.com>

At 02:43 PM 10/2/2006, Roland Dreier wrote:
>     Robert> Yes. 1250Mbytes/sec is what we expect. You say the 128
>     Robert> value comes from the BIOS ? If so, we need to discuss this
>     Robert> with our BIOS team to find out why they limit it to 128,
>     Robert> perhaps it is a BIOS bug.
>
>Yes, I believe that the BIOS is the only place that would set that
>value. We know that resetting the device makes it go back to a
>different default value, and nothing in the kernel that I know of is
>going to set it down to 128.

128B is the default minimum from PCIe, so likely some BIOS engineer took a
conservative view and chose the defaults (go figure). Setting Max Read
Request Size = 4096 is preferred on any implementation as it is basically
free from a chipset perspective. The chipset will likely return in cache
line quantities, but there are some obvious optimizations to be achieved by
issuing a single DMA Read Request.
Mike From krause at cup.hp.com Tue Oct 3 13:33:11 2006 From: krause at cup.hp.com (Michael Krause) Date: Tue, 03 Oct 2006 13:33:11 -0700 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <79ae2f320610031231t74f2c69due3c48508c152b409@mail.gmail.co m> References: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> <79ae2f320610031231t74f2c69due3c48508c152b409@mail.gmail.com> Message-ID: <6.2.0.14.2.20061003132954.036086a0@esmail.cup.hp.com> Silverstorm is executing a usage model that the IBTA used to develop the IB protocols. What is the problem with that? If it works and integrates into the stack, then this seems like an appropriate bit of functionality to support. The fact that one can use a standard ULP to communicate to a TCA as an alternative which is supported by the existing stack is a customer product decision at the end of the day. If Silverstorm or any IHV can show value and that it works in the stack, then it seems appropriate to support. Isn't that a fundamental principle of being an open source effort? Mike At 12:31 PM 10/3/2006, Fabian Tillier wrote: >Hi Yaron, > >On 10/3/06, Yaron Haviv wrote: > > > > I'm trying to figure out why this protocol makes sense > > As far as I understand, IPoIB can provide a Virtual NIC functionality > > just as well (maybe even better), with two restrictions: > > 1. Lack of support for Jumbo Frames > > 2. Doesn't support protocols other than IP (e.g. IPX, ..) > >Whether to use a router or virtual NIC approach for connectivity to >Ethernet subnets is a design decision. We could argue until we are >blue in the face about which architecture is "better", but that's >really not relevant. 
> > > I believe we should first see if such a driver is needed and if IPoIB > > UD/RC cannot be leveraged for that, maybe the Ethernet emulation can > > just be an extension to IPoIB RC, hitting 3 birds in one stone (same > > infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all > > nodes not just Gateways) > >You're joking right? Are you really arguing that SilverStorm should >not develop a driver to support its existing devices? This really >isn't complicated: > >1). SilverStorm has a virtual NIC hardware device. >2). SilverStorm is committed to support OpenFabrics. > >The above two statements lead to the following conclusion: SilverStorm >needs a driver for its devices that works with the OpenFabrics stack. >This is totally orthogonal to and independent of working on IPoIB RC >or any IETF efforts to define something new. > >- Fab > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Tue Oct 3 13:37:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 03 Oct 2006 13:37:19 -0700 Subject: [openib-general] [PATCH][RFC] Add node_type / transport_type to struct ibv_device Message-ID: So I finally got around to working on this... Anyway, here's a patch that adds node_type and transport_type members to struct ibv_device. I just set them up once when the device is initialized, rather than adding all sorts of query functions etc. How does this strike you? - R. 
From rdreier at cisco.com Tue Oct 3 13:48:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 03 Oct 2006 13:48:28 -0700 Subject: [openib-general] [PATCH][RFC] Add node_type / transport_type to struct ibv_device In-Reply-To: (Roland Dreier's message of "Tue, 03 Oct 2006 13:37:19 -0700") References: Message-ID: err, here's the patch: Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 9680) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -66,9 +66,17 @@ union ibv_gid { }; enum ibv_node_type { - IBV_NODE_CA = 1, + IBV_NODE_UNKNOWN = -1, + IBV_NODE_CA = 1, IBV_NODE_SWITCH, - IBV_NODE_ROUTER + IBV_NODE_ROUTER, + IBV_NODE_RNIC +}; + +enum ibv_transport_type { + IBV_TRANSPORT_UNKNOWN = -1, + IBV_TRANSPORT_IB = 0, + IBV_TRANSPORT_IWARP }; enum ibv_device_cap_flags { @@ -577,6 +585,8 @@ enum { struct ibv_device { struct ibv_driver *driver; struct ibv_device_ops ops; + enum ibv_node_type node_type; + enum ibv_transport_type transport_type; /* Name of underlying kernel IB device, eg "mthca0" */ char name[IBV_SYSFS_NAME_MAX]; /* Name of uverbs device, eg "uverbs0" */ Index: libibverbs/ChangeLog =================================================================== --- libibverbs/ChangeLog (revision 9680) +++ libibverbs/ChangeLog (working copy) @@ -1,3 +1,12 @@ +2006-10-03 Roland Dreier + + * src/init.c (init_drivers): Set node_type and transport_type + values of device being created. + + * include/infiniband/verbs.h: Add ibv_node_type enum value + IBV_NODE_RNIC, and add enum ibv_transport_type. Add node_type and + transport_type fields to struct ibv_device. 
+ 2006-09-12 Roland Dreier * include/infiniband/verbs.h: Swap wr_id and next members of Index: libibverbs/src/init.c =================================================================== --- libibverbs/src/init.c (revision 9680) +++ libibverbs/src/init.c (working copy) @@ -130,7 +130,9 @@ static struct ibv_device *init_drivers(c int abi_ver = 0; char sys_path[IBV_SYSFS_PATH_MAX]; char ibdev_name[IBV_SYSFS_NAME_MAX]; + char ibdev_path[IBV_SYSFS_PATH_MAX]; char value[8]; + enum ibv_node_type node_type; snprintf(sys_path, sizeof sys_path, "%s/%s", class_path, dev_name); @@ -144,17 +146,44 @@ static struct ibv_device *init_drivers(c return NULL; } + snprintf(ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", + ibv_get_sysfs_path(), ibdev_name); + + if (ibv_read_sysfs_file(ibdev_path, "node_type", value, sizeof value) < 0) { + fprintf(stderr, PFX "Warning: no node_type attr for %s\n", + ibdev_path); + return NULL; + } + node_type = strtol(value, NULL, 10); + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + node_type = IBV_NODE_UNKNOWN; + for (driver = driver_list; driver; driver = driver->next) { dev = driver->init_func(sys_path, abi_ver); if (!dev) continue; dev->driver = driver; + dev->node_type = node_type; + + switch (node_type) { + case IBV_NODE_CA: + case IBV_NODE_SWITCH: + case IBV_NODE_ROUTER: + dev->transport_type = IBV_TRANSPORT_IB; + break; + case IBV_NODE_RNIC: + dev->transport_type = IBV_TRANSPORT_IWARP; + break; + default: + dev->transport_type = IBV_TRANSPORT_UNKNOWN; + break; + } + strcpy(dev->dev_path, sys_path); - snprintf(dev->ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", - ibv_get_sysfs_path(), ibdev_name); strcpy(dev->dev_name, dev_name); strcpy(dev->name, ibdev_name); + strcpy(dev->ibdev_path, ibdev_path); return dev; } From halr at voltaire.com Tue Oct 3 13:52:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 16:52:07 -0400 Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack : 
opensm/osm_prtn_config.c In-Reply-To: <86u036lc11.fsf@mtl066.yok.mtl.com> References: <86u036lc11.fsf@mtl066.yok.mtl.com> Message-ID: <1159908726.4502.8124.camel@hal.voltaire.com> Hi Eitan, On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote: > Hi Hal > > 1. Avoid varargs macros not supported by win What about using __VA_ARGS__? It is C99, and MS claims that they support this (http://msdn2.microsoft.com/en-us/library/ms177415.aspx), in this way: #define MACRO(s, ...) printf(s, __VA_ARGS__) This should work with GNU as well. > 2. Some explicit casting required I applied this part of the patch. -- Hal > Thanks > > Eitan From richard.frank at oracle.com Tue Oct 3 13:53:59 2006 From: richard.frank at oracle.com (rick) Date: Tue, 03 Oct 2006 16:53:59 -0400 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <6.2.0.14.2.20061003132954.036086a0@esmail.cup.hp.com> References: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> <79ae2f320610031231t74f2c69due3c48508c152b409@mail.gmail.com> <6.2.0.14.2.20061003132954.036086a0@esmail.cup.hp.com> Message-ID: <4522CDE7.6070206@oracle.com> For what it's worth: As a customer who is using the SS stack - we were more than pleased that we could achieve IPOIB (and RDS) failover without using the bonding driver. I believe this is direct result of the Virtual NIC approach SS is using. Michael Krause wrote: >Silverstorm is executing a usage model that the IBTA used to develop the IB >protocols. What is the problem with that? If it works and integrates >into the stack, then this seems like an appropriate bit of functionality to >support. The fact that one can use a standard ULP to communicate to a TCA >as an alternative which is supported by the existing stack is a customer >product decision at the end of the day. If Silverstorm or any IHV can >show value and that it works in the stack, then it seems appropriate to >support. 
Isn't that a fundamental principle of being an open source effort? > > >Mike > > >At 12:31 PM 10/3/2006, Fabian Tillier wrote: > > >>Hi Yaron, >> >>On 10/3/06, Yaron Haviv wrote: >> >> >>>I'm trying to figure out why this protocol makes sense >>>As far as I understand, IPoIB can provide a Virtual NIC functionality >>>just as well (maybe even better), with two restrictions: >>>1. Lack of support for Jumbo Frames >>>2. Doesn't support protocols other than IP (e.g. IPX, ..) >>> >>> >>Whether to use a router or virtual NIC approach for connectivity to >>Ethernet subnets is a design decision. We could argue until we are >>blue in the face about which architecture is "better", but that's >>really not relevant. >> >> >> >>>I believe we should first see if such a driver is needed and if IPoIB >>>UD/RC cannot be leveraged for that, maybe the Ethernet emulation can >>>just be an extension to IPoIB RC, hitting 3 birds in one stone (same >>>infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all >>>nodes not just Gateways) >>> >>> >>You're joking right? Are you really arguing that SilverStorm should >>not develop a driver to support its existing devices? This really >>isn't complicated: >> >>1). SilverStorm has a virtual NIC hardware device. >>2). SilverStorm is committed to support OpenFabrics. >> >>The above two statements lead to the following conclusion: SilverStorm >>needs a driver for its devices that works with the OpenFabrics stack. >>This is totally orthogonal to and independent of working on IPoIB RC >>or any IETF efforts to define something new. 
>> >>- Fab >> >>_______________________________________________ >>openib-general mailing list >>openib-general at openib.org >>http://openib.org/mailman/listinfo/openib-general >> >>To unsubscribe, please visit >>http://openib.org/mailman/listinfo/openib-general >> >> > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From swise at opengridcomputing.com Tue Oct 3 14:01:53 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 03 Oct 2006 16:01:53 -0500 Subject: [openib-general] [PATCH][RFC] Add node_type / transport_type to struct ibv_device In-Reply-To: References: Message-ID: <1159909313.24791.27.camel@stevo-desktop> looks good to me. Steve. On Tue, 2006-10-03 at 13:48 -0700, Roland Dreier wrote: > err, here's the patch: > > Index: libibverbs/include/infiniband/verbs.h > =================================================================== > --- libibverbs/include/infiniband/verbs.h (revision 9680) > +++ libibverbs/include/infiniband/verbs.h (working copy) > @@ -66,9 +66,17 @@ union ibv_gid { > }; > > enum ibv_node_type { > - IBV_NODE_CA = 1, > + IBV_NODE_UNKNOWN = -1, > + IBV_NODE_CA = 1, > IBV_NODE_SWITCH, > - IBV_NODE_ROUTER > + IBV_NODE_ROUTER, > + IBV_NODE_RNIC > +}; > + > +enum ibv_transport_type { > + IBV_TRANSPORT_UNKNOWN = -1, > + IBV_TRANSPORT_IB = 0, > + IBV_TRANSPORT_IWARP > }; > > enum ibv_device_cap_flags { > @@ -577,6 +585,8 @@ enum { > struct ibv_device { > struct ibv_driver *driver; > struct ibv_device_ops ops; > + enum ibv_node_type node_type; > + enum ibv_transport_type transport_type; > /* Name of underlying kernel IB device, eg "mthca0" */ > char name[IBV_SYSFS_NAME_MAX]; > /* Name of uverbs device, eg "uverbs0" */ > Index: libibverbs/ChangeLog > =================================================================== > --- 
libibverbs/ChangeLog (revision 9680) > +++ libibverbs/ChangeLog (working copy) > @@ -1,3 +1,12 @@ > +2006-10-03 Roland Dreier > + > + * src/init.c (init_drivers): Set node_type and transport_type > + values of device being created. > + > + * include/infiniband/verbs.h: Add ibv_node_type enum value > + IBV_NODE_RNIC, and add enum ibv_transport_type. Add node_type and > + transport_type fields to struct ibv_device. > + > 2006-09-12 Roland Dreier > > * include/infiniband/verbs.h: Swap wr_id and next members of > Index: libibverbs/src/init.c > =================================================================== > --- libibverbs/src/init.c (revision 9680) > +++ libibverbs/src/init.c (working copy) > @@ -130,7 +130,9 @@ static struct ibv_device *init_drivers(c > int abi_ver = 0; > char sys_path[IBV_SYSFS_PATH_MAX]; > char ibdev_name[IBV_SYSFS_NAME_MAX]; > + char ibdev_path[IBV_SYSFS_PATH_MAX]; > char value[8]; > + enum ibv_node_type node_type; > > snprintf(sys_path, sizeof sys_path, "%s/%s", > class_path, dev_name); > @@ -144,17 +146,44 @@ static struct ibv_device *init_drivers(c > return NULL; > } > > + snprintf(ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", > + ibv_get_sysfs_path(), ibdev_name); > + > + if (ibv_read_sysfs_file(ibdev_path, "node_type", value, sizeof value) < 0) { > + fprintf(stderr, PFX "Warning: no node_type attr for %s\n", > + ibdev_path); > + return NULL; > + } > + node_type = strtol(value, NULL, 10); > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > + node_type = IBV_NODE_UNKNOWN; > + > for (driver = driver_list; driver; driver = driver->next) { > dev = driver->init_func(sys_path, abi_ver); > if (!dev) > continue; > > dev->driver = driver; > + dev->node_type = node_type; > + > + switch (node_type) { > + case IBV_NODE_CA: > + case IBV_NODE_SWITCH: > + case IBV_NODE_ROUTER: > + dev->transport_type = IBV_TRANSPORT_IB; > + break; > + case IBV_NODE_RNIC: > + dev->transport_type = IBV_TRANSPORT_IWARP; > + break; > + default: > + 
dev->transport_type = IBV_TRANSPORT_UNKNOWN; > + break; > + } > + > strcpy(dev->dev_path, sys_path); > - snprintf(dev->ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", > - ibv_get_sysfs_path(), ibdev_name); > strcpy(dev->dev_name, dev_name); > strcpy(dev->name, ibdev_name); > + strcpy(dev->ibdev_path, ibdev_path); > > return dev; > } From mst at mellanox.co.il Tue Oct 3 14:11:39 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 3 Oct 2006 23:11:39 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45228C7A.5070402@ichips.intel.com> References: <45228C7A.5070402@ichips.intel.com> Message-ID: <20061003211139.GB7787@mellanox.co.il> Quoting r. Sean Hefty : > Did you > carry the rdma_conn_param patch up to userspace as well? Yes. -- MST From ishai at mellanox.co.il Tue Oct 3 14:14:59 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 3 Oct 2006 23:14:59 +0200 Subject: [openib-general] [PATCH] IB/SRP: Remove redundant memset of the target Message-ID: <20061003211459.GA16503@mellanox.co.il> scsi_host_alloc already sets the entire scsi_host (including the privsize to zero) This patch removes the redundant memset. 
Signed-off-by: Ishai Rabinovitz

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-10-03 15:41:49.000000000 +0200
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-10-03 15:55:19.000000000 +0200
@@ -1731,7 +1731,6 @@ static ssize_t srp_create_target(struct
 	target_host->max_lun = SRP_MAX_LUN;
 
 	target = host_to_target(target_host);
-	memset(target, 0, sizeof *target);
 
 	target->io_class = SRP_REV16A_IB_IO_CLASS;
 	target->scsi_host = target_host;

--
Ishai Rabinovitz

From halr at voltaire.com Tue Oct 3 14:18:48 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Oct 2006 17:18:48 -0400
Subject: [openib-general] [PATCH 13/13] osm: port to WinIB stack : opensm/osm_pkey_mgr.c
In-Reply-To: <86k642lbym.fsf@mtl066.yok.mtl.com>
References: <86k642lbym.fsf@mtl066.yok.mtl.com>
Message-ID: <1159910327.4502.9074.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote:
> Hi Hal
>
> Avoid using array initialization statements which do not compile on win.
>
> Thanks

Thanks. Applied.

-- Hal

> Eitan

From ishai at mellanox.co.il Tue Oct 3 14:19:21 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Tue, 3 Oct 2006 23:19:21 +0200
Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
Message-ID: <20061003211921.GB16503@mellanox.co.il>

There is a bug in the SRP Engenio target: it sends a large value as the
service timeout. (The value received is 30, which means a timeout of
2^(30-8) ms, about 4194 sec.)

Such a long timeout is not reasonable; it may leave the kernel module
waiting in wait_for_completion and may block a lot of processes.

The following patch allows the ib_cm module to be loaded with a limit on
the timeout.
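The arithmetic behind the description above can be checked with a small user-space sketch. It assumes the CM's approximate conversion (4.096 us * 2^n is treated as 2^(n-8) ms) and mirrors the clamp the patch describes; the function names convert_to_ms and clamp_timeout_ms are hypothetical and are not the kernel's.

```c
/* Approximate conversion from an IB timeout exponent to milliseconds:
 * 4.096 us * 2^iba_time is treated as 2^(iba_time - 8) ms.
 * Hypothetical helper name, for illustration only. */
int convert_to_ms(int iba_time)
{
    int shift = iba_time > 8 ? iba_time - 8 : 0;
    return 1 << shift;
}

/* Apply an optional upper bound to a timeout in ms.
 * limit_ms == 0 means "no limit", matching the module parameter's
 * described behaviour. Hypothetical helper name. */
int clamp_timeout_ms(int timeout_ms, int limit_ms)
{
    if (limit_ms && timeout_ms > limit_ms)
        return limit_ms;
    return timeout_ms;
}
```

With a service timeout of 30, convert_to_ms() gives 4194304 ms; a limit of, say, 60000 brings that down to one minute, while a limit of 0 leaves it unchanged.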
Signed-off-by: Ishai Rabinovitz --- Index: last_stable/drivers/infiniband/core/cm.c =================================================================== --- last_stable.orig/drivers/infiniband/core/cm.c 2006-10-03 15:30:38.000000000 +0200 +++ last_stable/drivers/infiniband/core/cm.c 2006-10-03 15:39:53.000000000 +0200 @@ -54,6 +54,13 @@ MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("InfiniBand CM"); MODULE_LICENSE("Dual BSD/GPL"); +static int mra_timeout_limit = 0; + +module_param(mra_timeout_limit, int, 0444); +MODULE_PARM_DESC(mra_timeout_limit, + "Limit the MRA timeout according to this value if != 0"); + + static void cm_add_one(struct ib_device *device); static void cm_remove_one(struct ib_device *device); @@ -2297,6 +2304,9 @@ static int cm_mra_handler(struct cm_work timeout = cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) + cm_convert_to_ms(cm_id_priv->av.packet_life_time); + if (mra_timeout_limit && timeout > mra_timeout_limit) + timeout = mra_timeout_limit; + spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { case IB_CM_REQ_SENT: -- Ishai Rabinovitz From yaronh at voltaire.com Tue Oct 3 14:23:22 2006 From: yaronh at voltaire.com (Yaron Haviv) Date: Tue, 3 Oct 2006 23:23:22 +0200 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) Message-ID: <35EA21F54A45CB47B879F21A91F4862F010516F2@taurus.voltaire.com> > -----Original Message----- > From: rick [mailto:richard.frank at oracle.com] > Sent: Tuesday, October 03, 2006 4:54 PM > To: Michael Krause > Cc: Fabian Tillier; Yaron Haviv; Roland Dreier (rdreier); Kuchimanchi, > Ramachandra; openib-General > Subject: Re: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm > Virtual Ethernet I/O controller (VEx) > > For what it's worth: As a customer who is using the SS stack - we were > more than pleased that we could achieve IPOIB (and RDS) failover without > using the bonding driver. 
I believe this is direct result of the Virtual > NIC approach SS is using. Rick, if such functionality (w/o the bonding driver) is needed It can also be implemented into IPoIB (we had it in our old stack) It has no direct relation to the Virtual NIC. It may even be preferred if it's IPoIB and not a proprietary gateway driver, so also IB nodes in the same fabric can use that functionality. The only point I'm making is that any one can add an overlay driver for his proprietary HW as he likes, and put it in OFED distribution, but if this is becoming an internal portion of the open fabric kernel than: 1. Let's look at how we solve the problems in a more general perspective 2. Let's not duplicate code where we can avoid it 3. Let's make sure it's documented and reviewed (code and architectural wise) We have kept those standards for all other solutions; I think it's just as fair to demand it in that case as well Yaron > > Michael Krause wrote: > > >Silverstorm is executing a usage model that the IBTA used to develop the > IB > >protocols. What is the problem with that? If it works and integrates > >into the stack, then this seems like an appropriate bit of functionality > to > >support. The fact that one can use a standard ULP to communicate to a > TCA > >as an alternative which is supported by the existing stack is a customer > >product decision at the end of the day. If Silverstorm or any IHV can > >show value and that it works in the stack, then it seems appropriate to > >support. Isn't that a fundamental principle of being an open source > effort? > > > > > >Mike > > > > > >At 12:31 PM 10/3/2006, Fabian Tillier wrote: > > > > > >>Hi Yaron, > >> > >>On 10/3/06, Yaron Haviv wrote: > >> > >> > >>>I'm trying to figure out why this protocol makes sense > >>>As far as I understand, IPoIB can provide a Virtual NIC functionality > >>>just as well (maybe even better), with two restrictions: > >>>1. Lack of support for Jumbo Frames > >>>2. 
Doesn't support protocols other than IP (e.g. IPX, ..) > >>> > >>> > >>Whether to use a router or virtual NIC approach for connectivity to > >>Ethernet subnets is a design decision. We could argue until we are > >>blue in the face about which architecture is "better", but that's > >>really not relevant. > >> > >> > >> > >>>I believe we should first see if such a driver is needed and if IPoIB > >>>UD/RC cannot be leveraged for that, maybe the Ethernet emulation can > >>>just be an extension to IPoIB RC, hitting 3 birds in one stone (same > >>>infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all > >>>nodes not just Gateways) > >>> > >>> > >>You're joking right? Are you really arguing that SilverStorm should > >>not develop a driver to support its existing devices? This really > >>isn't complicated: > >> > >>1). SilverStorm has a virtual NIC hardware device. > >>2). SilverStorm is committed to support OpenFabrics. > >> > >>The above two statements lead to the following conclusion: SilverStorm > >>needs a driver for its devices that works with the OpenFabrics stack. > >>This is totally orthogonal to and independent of working on IPoIB RC > >>or any IETF efforts to define something new. 
> >>
> >>- Fab
> >>
> >>_______________________________________________
> >>openib-general mailing list
> >>openib-general at openib.org
> >>http://openib.org/mailman/listinfo/openib-general
> >>
> >>To unsubscribe, please visit
> >>http://openib.org/mailman/listinfo/openib-general
> >>
> >>
> >
> >
> >
> >_______________________________________________
> >openib-general mailing list
> >openib-general at openib.org
> >http://openib.org/mailman/listinfo/openib-general
> >
> >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-
> general
> >
> >
> >

From swise at opengridcomputing.com Tue Oct 3 14:23:40 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 03 Oct 2006 16:23:40 -0500
Subject: [openib-general] [PATCH] - iWARP Core Support for librdmacm
Message-ID: <1159910620.24791.33.camel@stevo-desktop>

Here is an updated patch for the iWARP core support in librdmacm. This
is updated to use Roland's transport type changes.

Sean, can you add this to librdmacm?
Signed-off-by: Steve Wise

Index: librdmacm/src/cma.c
===================================================================
--- librdmacm/src/cma.c (revision 9694)
+++ librdmacm/src/cma.c (working copy)
@@ -643,6 +643,17 @@
 	return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask);
 }
 
+static int ucma_modify_qp_sqd(struct rdma_cm_id *id)
+{
+	struct ibv_qp_attr qp_attr;
+
+	if (!id->qp)
+		return 0;
+
+	qp_attr.qp_state = IBV_QPS_SQD;
+	return ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE);
+}
+
 static int ucma_modify_qp_err(struct rdma_cm_id *id)
 {
 	struct ibv_qp_attr qp_attr;
@@ -890,7 +901,16 @@
 	void *msg;
 	int ret, size;
 
-	ret = ucma_modify_qp_err(id);
+	switch (id->verbs->device->transport_type) {
+	case IBV_TRANSPORT_IB:
+		ret = ucma_modify_qp_err(id);
+		break;
+	case IBV_TRANSPORT_IWARP:
+		ret = ucma_modify_qp_sqd(id);
+		break;
+	default:
+		ret = -EINVAL;
+	}
 	if (ret)
 		return ret;
 

From rjwalsh at pathscale.com Tue Oct 3 14:36:09 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 03 Oct 2006 14:36:09 -0700
Subject: [openib-general] [openfabrics-ewg] OFED Status
In-Reply-To:
References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com>
Message-ID: <4522D7C9.7040300@pathscale.com>

> This seems dangerous, especially now that we have PCI_MULTITHREAD_PROBE:
> nothing prevents ipath_cdev_init() from being called twice. Better to
> use something like test_and_set_bit() to make sure this is done
> exactly once.

Well, it needs to be refcounted, as we need to create it on first
add_one and remove it on the last remove_one. If I put a spinlock
around it, would that suffice?

Regards,
Robert.
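The "create on first add, remove on last remove" idea in the message above can be sketched in user space. This is only an illustration under stated assumptions: it uses a C11 atomic_flag as a stand-in for a kernel spinlock, a plain integer as a stand-in for the shared device file, and every name (shared_get, shared_put, shared_users, shared_created) is hypothetical rather than taken from the ipath driver.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-ins, not driver code. */
atomic_flag shared_lock = ATOMIC_FLAG_INIT; /* plays the role of the spinlock */
int shared_users;                           /* devices currently registered */
int shared_created;                         /* 1 while the shared resource exists */

static void lock_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&shared_lock, memory_order_acquire))
        ;
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&shared_lock, memory_order_release);
}

/* Called once per device added; creates the shared resource on the
 * first call only. Returns the user count after the call. */
int shared_get(void)
{
    int users;

    lock_acquire();
    if (shared_users == 0)
        shared_created = 1;     /* "create" step */
    users = ++shared_users;
    lock_release();
    return users;
}

/* Called once per device removed; destroys the shared resource when the
 * last user goes away. Returns the user count after the call. */
int shared_put(void)
{
    int users;

    lock_acquire();
    users = --shared_users;
    if (users == 0)
        shared_created = 0;     /* "destroy" step */
    lock_release();
    return users;
}
```

Because the count and the create/destroy decision are made under the same lock, two concurrent callers cannot both see a count of zero and create the resource twice. In this sketch the create and destroy steps are trivial assignments; real creation work that can block would need different handling than doing it while spinning.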
From mshefty at ichips.intel.com Tue Oct 3 14:34:12 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 03 Oct 2006 14:34:12 -0700
Subject: [openib-general] [PATCH] - iWARP Core Support for librdmacm
In-Reply-To: <1159910620.24791.33.camel@stevo-desktop>
References: <1159910620.24791.33.camel@stevo-desktop>
Message-ID: <4522D754.8080308@ichips.intel.com>

Steve Wise wrote:
> Here is an updated patch for the iWARP core support in librdmacm. This
> is updated to use Roland's transport type changes.
>
> Sean, can you add this to librdmacm?

Added to svn 9696.

- Sean

From halr at voltaire.com Tue Oct 3 14:38:22 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Oct 2006 17:38:22 -0400
Subject: [openib-general] [PATCH 12/13] osm: port to WinIB stack : opensm/osm_qos.c
In-Reply-To: <86lkoilbz2.fsf@mtl066.yok.mtl.com>
References: <86lkoilbz2.fsf@mtl066.yok.mtl.com>
Message-ID: <1159911502.4502.9763.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote:
> Hi Hal
>
> Port num is uint8_t (avoid casting by using correct size field).
> Added some explicit casts

Thanks. Applied.

-- Hal

> Thanks
>
> Eitan

From rdreier at cisco.com Tue Oct 3 14:42:23 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 03 Oct 2006 14:42:23 -0700
Subject: [openib-general] [openfabrics-ewg] OFED Status
In-Reply-To: <4522D7C9.7040300@pathscale.com> (Robert Walsh's message of "Tue, 03 Oct 2006 14:36:09 -0700")
References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <4522D7C9.7040300@pathscale.com>
Message-ID:

    Robert> Well, it needs to be refcounted, as we need to create it
    Robert> on first add_one and remove it on the last remove_one. If
    Robert> I put a spinlock around it, would that suffice?
Yes, that should be fine, although I assume you'll need to set a flag and then create the file outside the lock. (I guess you could also get fancy and use atomic_inc_return() and atomic_dec_and_test() too if you wanted). I'm still curious -- what is the problem with creating the file before any devices are found? Why does modprobe get stuck? - R. From halr at voltaire.com Tue Oct 3 14:45:08 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 17:45:08 -0400 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <86sliqlc0r.fsf@mtl066.yok.mtl.com> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> Message-ID: <1159911907.4502.10032.camel@hal.voltaire.com> Hi Eitan, On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote: > Hi Hal > > 1. Avoid varargs macros not supported by win What about using __VA_ARGS__? It is C99, and MS claims that they support this (http://msdn2.microsoft.com/en-us/library/ms177415.aspx), in this way: #define MACRO(s, ...) printf(s, __VA_ARGS__) This should work with GNU as well. > 2. Some explicit casting required > 3. Use stroull and not stroll Items 2 and 3 were applied. -- Hal > > Thanks > > Eitan From rjwalsh at pathscale.com Tue Oct 3 14:45:02 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 03 Oct 2006 14:45:02 -0700 Subject: [openib-general] [openfabrics-ewg] OFED Status In-Reply-To: References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <4522D7C9.7040300@pathscale.com> Message-ID: <4522D9DE.7060709@pathscale.com> Roland Dreier wrote: > Robert> Well, it needs to be refcounted, as we need to create it > Robert> on first add_one and remove it on the last remove_one. If > Robert> I put a spinlock around it, would that suffice? 
> > Yes, that should be fine, although I assume you'll need to set a flag > and then create the file outside the lock. (I guess you could also > get fancy and use atomic_inc_return() and atomic_dec_and_test() too if > you wanted). > > I'm still curious -- what is the problem with creating the file before > any devices are found? Why does modprobe get stuck? I don't really know yet. I'm trying to fix something else right now, but I'll come back and take a look at this shortly. OFED looks like it's going to have my current fix, which is probably OK for the moment. I'll try get a proper fix ready for 2.6.19 in the next day or so. Regards, Robert. From halr at voltaire.com Tue Oct 3 14:49:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Oct 2006 17:49:10 -0400 Subject: [openib-general] [PATCH] osm: port to WinIB stack Message-ID: <1159912150.4502.10146.camel@hal.voltaire.com> Hi Eitan, Aside from the varargs handling (relative to 2 patches) and the osmtest.c question, also pending is a patch to remove the WIN defines just added in multiple places and move them to config.h for the Windows build ? Can you/when can you prepare a patch for this ? Thanks! -- Hal From ftillier at silverstorm.com Tue Oct 3 14:48:28 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 3 Oct 2006 14:48:28 -0700 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F010516F2@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F010516F2@taurus.voltaire.com> Message-ID: <79ae2f320610031448p46602648r464991f77084fb53@mail.gmail.com> Yaron, On 10/3/06, Yaron Haviv wrote: > > The only point I'm making is that any one can add an overlay driver for > his proprietary HW as he likes, and put it in OFED distribution, but if > this is becoming an internal portion of the open fabric kernel than: > 1. Let's look at how we solve the problems in a more general perspective > 2. 
Let's not duplicate code

How is this duplicating code? It's just a NIC driver for a proprietary
NIC that uses IB as its I/O bus. What code is this duplicating?

> 3. Let's make sure it's documented and reviewed (code and architectural
> wise)

There are many proprietary HW architecture drivers in the Linux kernel.
Heck, even in OpenFabrics -- the HCA drivers implement proprietary
protocols over the PCI bus. Why should this be treated differently?

The code should definitely be reviewed to make sure that it follows
the right network driver architecture, interfaces to the IB stack
properly, and follows the kernel coding guidelines. In fact, the
review is what prompted this email thread.

> We have kept those standards for all other solutions; I think it's just
> as fair to demand it in that case as well

OpenFabrics has followed standards where they exist. However, there
is nothing that restricts development in OpenFabrics to things that
have industry standards. There is also nothing that restricts
proprietary drivers from having driver support in OpenFabrics or the
Linux kernel. MTHCA is an example of a driver that does exactly this -
it is a driver to enable Mellanox's proprietary HCA architecture for
the OpenFabrics software stack with the goal of being merged into the
Linux kernel.

- Fab

From sweitzen at cisco.com Tue Oct 3 14:53:11 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 3 Oct 2006 14:53:11 -0700
Subject: [openib-general] [openfabrics-ewg] Problems with OFED IPoIB HA on SLES10
Message-ID:

Vlad, thanks for the fast response. I have some followup questions about
configuring IPoIB HA, see below.

3) I got IPoIB HA working on SLES 10, but the documentation is a little
lacking. Looks like I have to put the same IP address in ifcfg-ib0 and
ifcfg-ib1, is this correct?

Yes, IP address should be the same. Actually the configuration of the
secondary interface does not matter.
The High Availability daemon reads the configuration of the primary
interface and migrates it between the interfaces in case of failure.

If I don't have an ifcfg-ib1 file, then ipoib_ha.pl won't start. I would
prefer not to configure ifcfg-ib1 since I don't plan to use it.

# ipoib_ha.pl --with-arping --with-multicast -v
Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
...

If I put different IP addresses in ifcfg-ib0 and ifcfg-ib1, then the
ifcfg-ib1 IP address is used for both ib0 and ib1!

# pwd
/etc/sysconfig/network
# cat ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
IPADDR=192.168.2.46
NETMASK=255.255.255.0
ONBOOT=yes
# cat ifcfg-ib1
DEVICE=ib1
BOOTPROTO=static
IPADDR=192.168.6.46
NETMASK=255.255.255.0
ONBOOT=yes
# /etc/init.d/openibd start
Loading HCA driver and Access Layer: [ OK ]
Setting up InfiniBand network interfaces:
ib0 device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)
ib0 configuration: ib1
Bringing up interface ib0: [ OK ]
ib1 device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)
Bringing up interface ib1: [ OK ]
Setting up service network . . .
[ done ]
# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.6.46  Bcast:192.168.6.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:21:700d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:224 (224.0 b)

# ifconfig ib1
ib1       Link encap:UNSPEC  HWaddr 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.6.46  Bcast:192.168.6.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:21:700e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:0 (0.0 b)  TX bytes:304 (304.0 b)

Notice how both ib0 and ib1 have the IP address from ifcfg-ib1. This
contradicts this info from ipoib_release_notes.txt:

b. The ib1 interface uses the configuration script of ib0.

Scott

From sashak at voltaire.com Tue Oct 3 15:01:55 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 4 Oct 2006 00:01:55 +0200
Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c
In-Reply-To: <4522B924.8030504@mellanox.co.il>
References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> <20060917173518.GC32526@mellanox.co.il> <4522B924.8030504@mellanox.co.il>
Message-ID: <20061003220155.GY10617@sashak.voltaire.com>

Hi Eitan,

On 21:25 Tue 03 Oct , Eitan Zahavi wrote:
> Hi Hal,
>
> This is another case where Michael complains about the patch not
> providing range checking.
> However, range checking is not implemented for the rest of this parser
> code.

There are only two occurrences of strtoul() in this parser and both are
touched by the patch.

BTW what is the goal of int/int casting in this and other WinIB patches?
Preventing VC warnings? It does not help to make the code more readable
and could potentially hide problems.

Sasha

> So I think
> the range check should be a separate patch.
>
> Please let me know if this works for you
>
> Thanks
>
>
> Michael S. Tsirkin wrote:
>
> >Quoting r. Eitan Zahavi :
> >
> >
> >> p++;
> >>- port_num = strtoul(p, &q, 10);
> >>+ port_num = (uint8_t)strtoul(p, &q, 10);
> >> if (q && !isspace(*q)) {
> >>
> >>
> >
> >Would it make sense to range-check the value before casting it away?
> >

From mshefty at ichips.intel.com Tue Oct 3 15:07:55 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 03 Oct 2006 15:07:55 -0700
Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
In-Reply-To: <20061003211921.GB16503@mellanox.co.il>
References: <20061003211921.GB16503@mellanox.co.il>
Message-ID: <4522DF3B.3020205@ichips.intel.com>

Ishai Rabinovitz wrote:
> There is a bug in the SRP Engenio target that sends a large value as the
> service timeout. (It gets 30, which means a timeout of 2^(30-8) ms, about
> 4194 sec.) Such a long timeout is not reasonable, and it may leave the
> kernel module waiting on wait_for_completion and leave a lot of processes
> stuck.
>
> The following patch allows loading the ib_cm module with a limit on the
> timeout.

There are several timeout values transferred and used by the cm, most
notably the remote cm response timeout and packet life time. Does it make
more sense to have a single, generic timeout maximum instead? Would it
make more sense to enable the maximum(s) by default, since we're dependent
upon values received over the network?
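
For reference, the arithmetic behind the timeout figure quoted above can
be sketched in a few lines of userspace C. This is only an illustration,
not the patch under discussion: the helper names timeout_exp_to_ms() and
clamp_timeout_exp() and the MAX_TIMEOUT_EXP value are hypothetical. It
assumes the usual IB encoding of a timeout as 4.096 us * 2^n, which is
roughly 2^(n-8) ms (the exact value for n = 30 would be closer to 4398
sec; the power-of-two shortcut gives about 4194 sec).

```c
#include <stdint.h>

/* Hypothetical cap on the timeout exponent; NOT taken from the patch. */
#define MAX_TIMEOUT_EXP 21u

/*
 * Approximate milliseconds for an IB timeout exponent.
 * 4.096 us * 2^exp is close to 2^(exp - 8) ms; exponents of 8 or less
 * are rounded up to 1 ms.
 */
static uint64_t timeout_exp_to_ms(unsigned int exp)
{
    return (uint64_t)1 << (exp > 8 ? exp - 8 : 0);
}

/* Limit an exponent received from the remote side to a local maximum. */
static unsigned int clamp_timeout_exp(unsigned int exp, unsigned int max_exp)
{
    return exp > max_exp ? max_exp : exp;
}
```

With these helpers, an exponent of 30 maps to 4194304 ms, while the same
value clamped to the hypothetical maximum of 21 maps to 8192 ms. Clamping
the exponent rather than the millisecond value keeps the check a single
comparison on the field as received.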
- Sean

From sashak at voltaire.com Tue Oct 3 15:16:38 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 4 Oct 2006 00:16:38 +0200
Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c
In-Reply-To: <86sliqlc0r.fsf@mtl066.yok.mtl.com>
References: <86sliqlc0r.fsf@mtl066.yok.mtl.com>
Message-ID: <20061003221638.GZ10617@sashak.voltaire.com>

Hi Eitan,

Some more comments...

On 18:59 Sun 17 Sep , Eitan Zahavi wrote:
> Hi Hal
>
> 1. Avoid varargs macros not supported by win
> 2. Some explicit casting required
> 3. Use strtoull and not strtoll
>
> Thanks
>
> Eitan
>
> Signed-off-by: Eitan Zahavi
>
> Index: opensm/osm_ucast_file.c
> ===================================================================
> --- opensm/osm_ucast_file.c (revision 9502)
> +++ opensm/osm_ucast_file.c (working copy)
> @@ -52,18 +52,11 @@
>
> #include
> #include
> +#include

Why this?

> #include
> #include
> #include
>
> -#define PARSEERR(log, file_name, lineno, fmt, arg...) \
> - osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \
> - file_name, lineno, ##arg )
> -
> -#define PARSEWARN(log, file_name, lineno, fmt, arg...) \
> - osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \
> - file_name, lineno, ##arg )
> -

Is it possible to use C99 style var args macros (with __VA_ARGS__)? MS
claims it is supported by VC. And it is supported by gcc too.

> static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid)
> {
> osm_port_t *p_port;
> @@ -72,10 +65,11 @@ static uint16_t remap_lid(osm_opensm_t *
>
> p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid);
> if (!p_port ||
> - p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) {
> + p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl))
> + {

Please don't break existing code formatting.
> osm_log(&p_osm->log, OSM_LOG_VERBOSE,
> - "remap_lid: cannot find port guid 0x%016" PRIx64
> - " , will use the same lid\n", cl_ntoh64(guid));
> + "remap_lid: cannot find port guid 0x%016" PRIx64
> + " , will use the same lid\n", cl_ntoh64(guid));
> return lid;
> }
>
> @@ -182,19 +176,21 @@ static int do_ucast_file_load(void *cont
> "skipping parsing. Using default routing algorithm\n");
>
> }
> +

Ditto.

> else if (!strncmp(p, "Unicast lids", 12)) {
> q = strstr(p, " guid 0x");
> if (!q) {
> - PARSEERR(&p_osm->log, file_name, lineno,
> - "cannot parse switch definition\n");
> + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:"
> + " cannot parse switch definition\n",
> + file_name, lineno);
> return -1;
> }
> p = q + 6;
> - sw_guid = strtoll(p, &q, 16);
> + sw_guid = strtoull(p, &q, 16);

Good.

> if (q && !isspace(*q)) {
> - PARSEERR(&p_osm->log, file_name, lineno,
> - "cannot parse switch guid: \'%s\'\n",
> - p);
> + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:"
> + "cannot parse switch guid: \'%s\'\n",
> + file_name, lineno, p);
> return -1;
> }
> sw_guid = cl_hton64(sw_guid);
> @@ -212,40 +208,39 @@ static int do_ucast_file_load(void *cont
> }
> }
> else if (p_sw && !strncmp(p, "0x", 2)) {
> - lid = strtoul(p, &q, 16);
> + lid = (uint16_t)strtoul(p, &q, 16);
> if (q && !isspace(*q)) {
> - PARSEERR(&p_osm->log, file_name, lineno,
> - "cannot parse lid: \'%s\'\n", p);
> + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:"
> + "cannot parse lid: \'%s\'\n", file_name, lineno, p);
> return -1;
> }
> p = q;
> while (isspace(*p))
> p++;
> - port_num = strtoul(p, &q, 10);
> + port_num = (uint8_t)strtoul(p, &q, 10);
> if (q && !isspace(*q)) {
> - PARSEERR(&p_osm->log, file_name, lineno,
> - "cannot parse port: \'%s\'\n", p);
> + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:"
> + "cannot parse port: \'%s\'\n", file_name, lineno, p);
> return -1;
> }
> p = q;
> /* additionally try to exract guid */
> q = strstr(p, " portguid 0x");
> if (!q) {
> - PARSEWARN(&p_osm->log, file_name, lineno,
> - "cannot find port guid "
> - "(maybe broken dump): \'%s\'\n", p);
> + osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u:"
> + "cannot find port guid "
> + "(maybe broken dump): \'%s\'\n", file_name, lineno, p);
> port_guid = 0;
> }
> else
> {
> p = q + 10;
> - port_guid = strtoll(p, &q, 16);
> + port_guid = strtoull(p, &q, 16);

Good.

Sasha

> if (!q && !isspace(*q) && *q != ':') {
> - PARSEWARN(&p_osm->log, file_name,
> - lineno,
> - "cannot parse port guid "
> - "(maybe broken dump): "
> - "\'%s\'\n", p);
> + osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u:"
> + "cannot parse port guid "
> + "(maybe broken dump): "
> + "\'%s\'\n", file_name, lineno, p);
> port_guid = 0;
> }
> }
>

From swise at opengridcomputing.com Tue Oct 3 15:30:46 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 03 Oct 2006 17:30:46 -0500
Subject: [openib-general] ammasso user lib
Message-ID: <1159914646.24791.39.camel@stevo-desktop>

Roland,

How do you suggest we handle the Ammasso user lib code? I could check
it into the svn main trunk, or submit a set of patches and you handle
the svn work. To date, I've only committed to the iwarp branch...

In either case, I'll re-post an RFC patch set soon for folks to review
if they desire...

Thanks,

Steve.
From trimmer at silverstorm.com Tue Oct 3 15:57:55 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 3 Oct 2006 18:57:55 -0400 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> Message-ID: > From: Yaron Haviv [mailto:yaronh at voltaire.com] > Sent: Tuesday, October 03, 2006 3:05 PM > To: Rimmer, Todd; Scott Weitzenkamp (sweitzen); Kuchimanchi, Ramachandra; > Roland Dreier (rdreier) > Cc: openib-General > Subject: RE: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm > Virtual Ethernet I/O controller (VEx) > > > Todd, > > I'm trying to figure out why this protocol makes sense > As far as I understand, IPoIB can provide a Virtual NIC functionality > just as well (maybe even better), with two restrictions: > 1. Lack of support for Jumbo Frames > 2. Doesn't support protocols other than IP (e.g. IPX, ..) > > 1 can easily be addressed using IPoIB RC, and the question is if 2 is > really a problem (how many people use IPX or apple talk .. these days) > And if 2 is a problem why isn't it in a greater scope of supporting > Ethernet emulation even between any IB nodes, and not just from a host > to a gateway device. > > If this is a real requirement, why haven't SilverStorm worked with the > industry and standardization bodies such as IBTA or IETF to come with a > standard and interoperable way to address it, and not just try and push > a proprietary driver and a point solution to the kernel. 
> > I believe we should first see if such a driver is needed and if IPoIB > UD/RC cannot be leveraged for that, maybe the Ethernet emulation can > just be an extension to IPoIB RC, hitting 3 birds in one stone (same > infrastructure, jumbo frames for IPoIB, and Ethernet emulation for all > nodes not just Gateways) > Hi Yaron, I was offline much of today, in reviewing the email on this topic it seems others have already answered most of your questions. So rather than belabor the topic, I would simply like to say that I agree with all the comments Rick, Fab and Michael have put forth. Todd Rimmer From rdreier at cisco.com Tue Oct 3 16:47:50 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 03 Oct 2006 16:47:50 -0700 Subject: [openib-general] ammasso user lib In-Reply-To: <1159914646.24791.39.camel@stevo-desktop> (Steve Wise's message of "Tue, 03 Oct 2006 17:30:46 -0500") References: <1159914646.24791.39.camel@stevo-desktop> Message-ID: Steve> Roland, How do you suggest we handle the Ammasso user lib Steve> code? I could check it into the svn main trunk, or submit Steve> a set of patches and you handle it the svn work. To date, Steve> I've only committed to the iwarp branch... As far as I'm concerned, it's fine to check into the trunk. It's completely self-contained and doesn't adversely affect anything on the trunk, so why not? - R. From tom at opengridcomputing.com Tue Oct 3 20:31:15 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 03 Oct 2006 22:31:15 -0500 Subject: [openib-general] [PATCH][RFC] Add node_type / transport_type to struct ibv_device In-Reply-To: Message-ID: Yeah, this is better. 
On 10/3/06 3:48 PM, "Roland Dreier" wrote: > err, here's the patch: > > Index: libibverbs/include/infiniband/verbs.h > =================================================================== > --- libibverbs/include/infiniband/verbs.h (revision 9680) > +++ libibverbs/include/infiniband/verbs.h (working copy) > @@ -66,9 +66,17 @@ union ibv_gid { > }; > > enum ibv_node_type { > - IBV_NODE_CA = 1, > + IBV_NODE_UNKNOWN = -1, > + IBV_NODE_CA = 1, > IBV_NODE_SWITCH, > - IBV_NODE_ROUTER > + IBV_NODE_ROUTER, > + IBV_NODE_RNIC > +}; > + > +enum ibv_transport_type { > + IBV_TRANSPORT_UNKNOWN = -1, > + IBV_TRANSPORT_IB = 0, > + IBV_TRANSPORT_IWARP > }; > > enum ibv_device_cap_flags { > @@ -577,6 +585,8 @@ enum { > struct ibv_device { > struct ibv_driver *driver; > struct ibv_device_ops ops; > + enum ibv_node_type node_type; > + enum ibv_transport_type transport_type; > /* Name of underlying kernel IB device, eg "mthca0" */ > char name[IBV_SYSFS_NAME_MAX]; > /* Name of uverbs device, eg "uverbs0" */ > Index: libibverbs/ChangeLog > =================================================================== > --- libibverbs/ChangeLog (revision 9680) > +++ libibverbs/ChangeLog (working copy) > @@ -1,3 +1,12 @@ > +2006-10-03 Roland Dreier > + > + * src/init.c (init_drivers): Set node_type and transport_type > + values of device being created. > + > + * include/infiniband/verbs.h: Add ibv_node_type enum value > + IBV_NODE_RNIC, and add enum ibv_transport_type. Add node_type and > + transport_type fields to struct ibv_device. 
> + > 2006-09-12 Roland Dreier > > * include/infiniband/verbs.h: Swap wr_id and next members of > Index: libibverbs/src/init.c > =================================================================== > --- libibverbs/src/init.c (revision 9680) > +++ libibverbs/src/init.c (working copy) > @@ -130,7 +130,9 @@ static struct ibv_device *init_drivers(c > int abi_ver = 0; > char sys_path[IBV_SYSFS_PATH_MAX]; > char ibdev_name[IBV_SYSFS_NAME_MAX]; > + char ibdev_path[IBV_SYSFS_PATH_MAX]; > char value[8]; > + enum ibv_node_type node_type; > > snprintf(sys_path, sizeof sys_path, "%s/%s", > class_path, dev_name); > @@ -144,17 +146,44 @@ static struct ibv_device *init_drivers(c > return NULL; > } > > + snprintf(ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", > + ibv_get_sysfs_path(), ibdev_name); > + > + if (ibv_read_sysfs_file(ibdev_path, "node_type", value, sizeof value) < 0) { > + fprintf(stderr, PFX "Warning: no node_type attr for %s\n", > + ibdev_path); > + return NULL; > + } > + node_type = strtol(value, NULL, 10); > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > + node_type = IBV_NODE_UNKNOWN; > + > for (driver = driver_list; driver; driver = driver->next) { > dev = driver->init_func(sys_path, abi_ver); > if (!dev) > continue; > > dev->driver = driver; > + dev->node_type = node_type; > + > + switch (node_type) { > + case IBV_NODE_CA: > + case IBV_NODE_SWITCH: > + case IBV_NODE_ROUTER: > + dev->transport_type = IBV_TRANSPORT_IB; > + break; > + case IBV_NODE_RNIC: > + dev->transport_type = IBV_TRANSPORT_IWARP; > + break; > + default: > + dev->transport_type = IBV_TRANSPORT_UNKNOWN; > + break; > + } > + > strcpy(dev->dev_path, sys_path); > - snprintf(dev->ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", > - ibv_get_sysfs_path(), ibdev_name); > strcpy(dev->dev_name, dev_name); > strcpy(dev->name, ibdev_name); > + strcpy(dev->ibdev_path, ibdev_path); > > return dev; > } From sweitzen at cisco.com Tue Oct 3 22:39:54 2006 From: sweitzen at 
cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 3 Oct 2006 22:39:54 -0700 Subject: [openib-general] [openfabrics-ewg] Problems with OFED IPoIB HA on SLES10 Message-ID: If I fail back and forth between ib0 and ib1 every 30 seconds or so for several hours, while IPoIB traffic is running, IPoIB host gets an Oops: and IPoIB stops working. ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue 
packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet general protection fault: 0000 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq CPU 7 Modules linked in: af_packet ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_s a ib_uverbs ib_umad ib_mthca ib_mad ib_core nls_utf8 st ipv6 nfs lockd nfs_acl s unrpc button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random ide_c d ehci_hcd uhci_hcd cdrom i8xx_tco ide_floppy usbcore shpchp e1000 pci_hotplug f loppy reiserfs edd fan thermal processor siimage sg mptspi mptscsih mptbase scsi _transport_spi piix sd_mod scsi_mod ide_disk 
ide_core Pid: 23541, comm: ib_mad1 Tainted: G U 2.6.16.21-0.8-smp #1 RIP: 0010:[] {_spin_lock_irqsave+3} RSP: 0018:ffff810132a4fc20 EFLAGS: 00010086 RAX: 0000000000000286 RBX: 0000000000000000 RCX: ffffffff883324ee RDX: ffff810128d5e380 RSI: 0000000000000000 RDI: 0000ffff1b6017ff RBP: 00000000fffffffc R08: ffffffff803d3260 R09: ffff810140333800 R10: ffff81000107d400 R11: 0000000000000292 R12: ffff810128d5e380 R13: ffff810132a4fc78 R14: 0000ffff1b6017ff R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff810142d19740(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002b0b5e6ae180 CR3: 0000000128cbc000 CR4: 00000000000006e0 Process ib_mad1 (pid: 23541, threadinfo ffff810132a4e000, task ffff810142b56100) Stack: ffffffff8833c5f5 ffff8101302b3000 0000ffff1b6012ff 0000000000000002 0000000000000296 ffff8101302b3500 ffffffff8027753e ffff810128d5e3a0 ffff81012bce1680 ffff810128d5e380 Call Trace: {:ib_ipoib:path_rec_completion+862} {dev_queue_xmit+545} {:ib_ipoib:path_ rec_completion+795} {:ib_sa:ib_sa_path_rec_callback+64} {lock_timer_base+27} {try_to_del_time r_sync+81} {:ib_sa:send_handler+72} {:ib_mad:ib_ mad_complete_send_wr+421} {:ib_mad:ib_mad_completion_handler+947} {:ib_mad:ib_mad_completion_handler+0} {run_workqueue+153} {worker_thread+0} {keventd_create_kthread+0} {worker_th read+265} {__wake_up_common+62} {default_wake_f unction+0} {keventd_create_kthread+0} {kthread+2 36} {child_rip+8} {keventd_create_kthread +0} {kthread+0} {child_rip+0} Code: f0 ff 0f 0f 88 29 01 00 00 c3 fa f0 ff 0f 0f 88 2a 01 00 00 RIP {_spin_lock_irqsave+3} RSP Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Tuesday, October 03, 2006 2:53 PM To: Vladimir Sokolovsky Cc: EWG; openib-General Subject: Re: [openib-general] [openfabrics-ewg] 
Problems with OFED IPoIB HA on SLES10

From bugzilla-daemon at openib.org Tue Oct 3 22:47:54 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 3 Oct 2006 22:47:54 -0700 (PDT)
Subject: [openib-general] [Bug 263] New: OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Message-ID: <20061004054754.082242283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=263

Summary: OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Product: OpenFabrics Linux
Version: 1.1rc6
Platform: X86-64
OS/Version: SLES 10
Status: NEW
Severity: major
Priority: P2
Component: IPoIB
AssignedTo: bugzilla at openib.org
ReportedBy: sweitzen at cisco.com

SLES10 x86_64 with dual-port LionCub HCA.
I am looping a script that turns off and back on IB ports on a Cisco IB switch such that there will be IPoIB failover every 30 seconds on a host, and I'm running IPoIB traffic on that host too. If I fail back and forth between ib0 and ib1 every 30 seconds or so for several hours, while IPoIB traffic is running, IPoIB host gets an Oops: and IPoIB stops working. ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: 
dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet general protection fault: 0000 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq CPU 7 Modules linked in: af_packet ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_s a ib_uverbs ib_umad ib_mthca ib_mad ib_core nls_utf8 st ipv6 nfs lockd nfs_acl s unrpc button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random ide_c d ehci_hcd uhci_hcd cdrom i8xx_tco ide_floppy usbcore shpchp e1000 pci_hotplug f loppy reiserfs edd fan thermal processor siimage sg mptspi mptscsih mptbase scsi _transport_spi 
piix sd_mod scsi_mod ide_disk ide_core Pid: 23541, comm: ib_mad1 Tainted: G U 2.6.16.21-0.8-smp #1 RIP: 0010:[] {_spin_lock_irqsave+3} RSP: 0018:ffff810132a4fc20 EFLAGS: 00010086 RAX: 0000000000000286 RBX: 0000000000000000 RCX: ffffffff883324ee RDX: ffff810128d5e380 RSI: 0000000000000000 RDI: 0000ffff1b6017ff RBP: 00000000fffffffc R08: ffffffff803d3260 R09: ffff810140333800 R10: ffff81000107d400 R11: 0000000000000292 R12: ffff810128d5e380 R13: ffff810132a4fc78 R14: 0000ffff1b6017ff R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff810142d19740(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002b0b5e6ae180 CR3: 0000000128cbc000 CR4: 00000000000006e0 Process ib_mad1 (pid: 23541, threadinfo ffff810132a4e000, task ffff810142b56100) Stack: ffffffff8833c5f5 ffff8101302b3000 0000ffff1b6012ff 0000000000000002 0000000000000296 ffff8101302b3500 ffffffff8027753e ffff810128d5e3a0 ffff81012bce1680 ffff810128d5e380 Call Trace: {:ib_ipoib:path_rec_completion+862} {dev_queue_xmit+545} {:ib_ipoib:path_ rec_completion+795} {:ib_sa:ib_sa_path_rec_callback+64} {lock_timer_base+27} {try_to_del_time r_sync+81} {:ib_sa:send_handler+72} {:ib_mad:ib_ mad_complete_send_wr+421} {:ib_mad:ib_mad_completion_handler+947} {:ib_mad:ib_mad_completion_handler+0} {run_workqueue+153} {worker_thread+0} {keventd_create_kthread+0} {worker_th read+265} {__wake_up_common+62} {default_wake_f unction+0} {keventd_create_kthread+0} {kthread+2 36} {child_rip+8} {keventd_create_kthread +0} {kthread+0} {child_rip+0} Code: f0 ff 0f 0f 88 29 01 00 00 c3 fa f0 ff 0f 0f 88 2a 01 00 00 RIP {_spin_lock_irqsave+3} RSP ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Tue Oct 3 23:17:54 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 08:17:54 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <4522DF3B.3020205@ichips.intel.com> References: <20061003211921.GB16503@mellanox.co.il> <4522DF3B.3020205@ichips.intel.com> Message-ID: <20061004061754.GB4855@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Ishai Rabinovitz wrote: > > There is a bug in the SRP Engenio target: it sends a large value as the service > > timeout. (It sends 30, which means a timeout of 2^(30-8) ms, about 4194 sec.) Such a long > > timeout is not reasonable, and it may leave the kernel module waiting on > > wait_for_completion and leave a lot of processes stuck. > > > > The following patch allows loading the ib_cm module with a limit on the > > timeout. > > There are several timeout values transferred and used by the cm, most notably the > remote cm response timeout and packet life time. Does it make more sense to > have a single, generic timeout maximum instead? Hmm. I'm not sure - we are working around an actual broken implementation here - what do you think? > Would it make more sense to > enable the maximum(s) by default, since we're dependent upon values received > over the network? I think it would. -- MST From rjwalsh at pathscale.com Tue Oct 3 23:29:44 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 03 Oct 2006 23:29:44 -0700 Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only In-Reply-To: References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> Message-ID: <452354D8.6040903@pathscale.com> Hi folks, Here's a new spin of the patch to fix the problem of the ipath module causing errors if no ipath hardware is present. This new version of the patch should fix the potential problem Roland spotted if the kernel is doing multithreaded probes. 
Roland: please review and queue for 2.6.19 if you're satisfied with this approach. I still don't have an answer about why modprobe hangs when this patch isn't applied - I'll get to that in the next day or so when I have a moment. Michael: please consider replacing the last patch we sent to OFED for this with this new version. I suspect that, once again, you will be required to modify the patch to get it to apply cleanly. I'd like to avoid having you do this, but I don't have a clear idea how to get hold of the OFED-next-release-in-progress stuff. Bryan handled this previously, but he's on vacation for the next several weeks. Do you have some instructions written down somewhere you could point me at on how to submit patches that would make your life a little easier in this regard? Regards, Robert. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: diagpkt-init-fixup.patch URL: From mst at mellanox.co.il Tue Oct 3 23:34:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 08:34:40 +0200 Subject: [openib-general] Fwd: Re: Problems with OFED IPoIB HA on SLES10 Message-ID: <20061004063440.GE4855@mellanox.co.il> BTW, any idea? The ipoib_ha is just a script that ups/downs and configures interfaces, so it seems this crash could also happen on systems without it. -- MST -------------- next part -------------- An embedded message was scrubbed... From: "Scott Weitzenkamp (sweitzen)" Subject: Re: [openib-general] Problems with OFED IPoIB HA on SLES10 Date: Tue, 3 Oct 2006 22:39:54 -0700 Size: 32210 URL: From vlad at mellanox.co.il Tue Oct 3 23:38:08 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 04 Oct 2006 08:38:08 +0200 Subject: [openib-general] [openfabrics-ewg] Problems with OFED IPoIB HA on SLES10 In-Reply-To: References: Message-ID: <1159943888.8333.83.camel@mtlsws13.yok.mtl.com> Hi Scott, You have an old version of the ipoibtools package (ipoib_ha.pl). 
All the issues you are talking about were fixed in the new version, which will be available in OFED-1.1-rc7. You can also download it from SVN: https://openib.org/svn/gen2/branches/1.1/src/userspace/ipoibtools Thanks, Regards, Vladimir On Tue, 2006-10-03 at 14:53 -0700, Scott Weitzenkamp (sweitzen) wrote: > Vlad, thanks for the fast response. I have some followup questions > about configuring IPoIB HA, see below. > > > 3) I got IPoIB HA working on SLES 10, but the > documentation is a little lacking. Looks like I have to put the same > IP address in ifcfg-ib0 and ifcfg-ib1, is this correct? > > > Yes, IP address should be the same. Actually the configuration > of the secondary interface does not matter. > The High Availability daemon reads the configuration of the > primary interface and migrates it between the interfaces in case of > failure. > > > If I don't have an ifcfg-ib1 file, then ipoib_ha.pl won't start. I would > prefer to not configure ifcfg-ib1 since I don't plan to use it. > > # ipoib_ha.pl --with-arping --with-multicast -v > Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or > directory > Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or > directory > Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or > directory > Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or > directory > Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or > directory > ... > > If I put different IP addresses in ifcfg-ib0 and ifcfg-ib1, then the > ifcfg-ib1 IP address is used for both ib0 and ib1! 
> > # pwd > /etc/sysconfig/network > # cat ifcfg-ib0 > DEVICE=ib0 > BOOTPROTO=static > IPADDR=192.168.2.46 > NETMASK=255.255.255.0 > ONBOOT=yes > # cat ifcfg-ib1 > DEVICE=ib1 > BOOTPROTO=static > IPADDR=192.168.6.46 > NETMASK=255.255.255.0 > ONBOOT=yes > # /etc/init.d/openibd start > Loading HCA driver and Access Layer: [ OK ] > Setting up InfiniBand network interfaces: > ib0 device: Mellanox Technologies MT25208 InfiniHost III Ex > (Tavor com > patibility mode) (rev 20) > ib0 configuration: ib1 > Bringing up interface ib0: [ OK ] > ib1 device: Mellanox Technologies MT25208 InfiniHost III Ex > (Tavor com > patibility mode) (rev 20) > Bringing up interface ib1: [ OK ] > Setting up service network . . . [ done ] > # ifconfig ib0 > ib0 Link encap:UNSPEC HWaddr > 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00 > -00 > inet addr:192.168.6.46 Bcast:192.168.6.255 > Mask:255.255.255.0 > inet6 addr: fe80::202:c902:21:700d/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:3 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:0 (0.0 b) TX bytes:224 (224.0 b) > > # ifconfig ib1 > ib1 Link encap:UNSPEC HWaddr > 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00 > -00 > inet addr:192.168.6.46 Bcast:192.168.6.255 > Mask:255.255.255.0 > inet6 addr: fe80::202:c902:21:700e/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:4 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:0 (0.0 b) TX bytes:304 (304.0 b) > > Notice how both ib0 and ib1 have the IP address from ifcfg-ib1. This > contradicts this info from ipoib_release_notes.txt: > > > b. The ib1 interface uses the configuration script of > ib0. > > > > > Scott > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jackm at dev.mellanox.co.il Wed Oct 4 00:32:50 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 4 Oct 2006 09:32:50 +0200 Subject: [openib-general] [PATCH] ib_cm: fix module unload race with timewait In-Reply-To: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com> References: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com> Message-ID: <200610040932.50362.jackm@dev.mellanox.co.il> On Saturday 30 September 2006 02:52, Sean Hefty wrote: > If the ib_cm module is unloaded while id's are still in timewait, > the CM will destroy the work queue used to process timewait. Once > the id's exit timewait, their timers will fire, leading to a crash > trying to access the destroyed work queue. > > We need to track id's that are in timewait, and cancel their deferred > work on module unload. > > Signed-off-by: Sean Hefty Erez, have you tried out the patch (with or without Roland's suggested modifications)? If so, did it solve the problem? (we think it most likely did, but we would like to know). - Jack From ogerlitz at voltaire.com Wed Oct 4 00:37:19 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 04 Oct 2006 09:37:19 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <452295B6.20201@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <4522101D.4040102@voltaire.com> <452295B6.20201@ichips.intel.com> Message-ID: <452364AF.8060301@voltaire.com> Sean Hefty wrote: > Or Gerlitz wrote: >> Can you clarify what do you mean "(ABI) conflict with OFED releases"? >> Is an issue with someone wishing to work with OFED user space and IB >> code from upstream kernel? > Yes - there could be issues there. As long as OFED provides kernel IB code you need not support the above config. >> The approach i suggest is: it makes sense to take some care not to >> create too much non working scenarios... however the upstream push >> process must **not** be restricted by the existence of OFED. 
> I agree with this. cool. >> Specifically, can you push the rdma_establish() ***kernel*** API >> support which was integrated into OFED 1.1 as a bug fix for 2.6.19 ? > Yes, but I'd like a user of it to go in at the same time. I don't think this is possible, nor is it required. The thing is that the only in-tree consumer of the cma code is the iser initiator, which implements the active side of an rdma connection. As such it does not call rdma_accept(), nor can it be modified to call rdma_establish(), so the _establish() call can be merged during the bug-fix window just as the _accept() call was merged during the feature window. If you find it problematic to merge it for 2.6.19, I think you should demand ***removing*** the rdma_establish() call from OFED 1.1, as this puts the kernel code in second place relative to OFED and violates another guideline: OFED uses ***kernel IB code***, where "kernel IB code" stands for code that has been merged into Linus's tree, or is at some branch of Roland's tree (or your tree when you have such...), or at the -mm tree etc. Or. From ogerlitz at voltaire.com Wed Oct 4 01:55:12 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 04 Oct 2006 10:55:12 +0200 Subject: [openib-general] RHEL 4 U3 - lost completions In-Reply-To: References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com> Message-ID: <452376F0.4030902@voltaire.com> Roland Dreier wrote: > Or> Roland - If indeed, does it make sense that the problem does > Or> not reproduce with single threaded runs? > > Sorry, I can't parse the question. However, the problem here seems to > be that the CQ buffer pages end up being marked for copy-on-write, and > I don't know of any reason why that would happen other than a fork() > happening somewhere (possibly behind the scenes in a system() call or > something like that). 
My question was: assuming there is some fork() (e.g. behind the scenes of daemonize()) in the app, does it make sense that everything works as long as the app is single threaded, but things break when there are multiple threads (e.g. COW is applied on the page used to hold the CQ etc.)? Or. From ogerlitz at voltaire.com Wed Oct 4 02:12:32 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 04 Oct 2006 11:12:32 +0200 Subject: [openib-general] [PATCH 0/10] [RFC] Support for SilverStorm Virtual Ethernet I/O controller (VEx) In-Reply-To: <4522CDE7.6070206@oracle.com> References: <35EA21F54A45CB47B879F21A91F4862F010516DE@taurus.voltaire.com> <79ae2f320610031231t74f2c69due3c48508c152b409@mail.gmail.com> <6.2.0.14.2.20061003132954.036086a0@esmail.cup.hp.com> <4522CDE7.6070206@oracle.com> Message-ID: <45237B00.2000404@voltaire.com> rick wrote: > For what it's worth: As a customer who is using the SS stack - we were > more than pleased that we could achieve IPOIB (and RDS) failover without > using the bonding driver. I believe this is direct result of the Virtual > NIC approach SS is using. Were you pleased because you had a solution for Oracle/IPoIB/RDS/HA given that the bonding driver does not support IPoIB, or has the VNIC provided you with some feature that differentiates it from the active-backup mode of the bonding driver? Or. From aviram at dev.mellanox.co.il Wed Oct 4 03:04:21 2006 From: aviram at dev.mellanox.co.il (Aviram Gutman) Date: Wed, 04 Oct 2006 12:04:21 +0200 Subject: [openib-general] Infiniband Fedora Core5 In-Reply-To: References: Message-ID: <45238725.5030802@dev.mellanox.co.il> No, Fedora Core 5 is not part of the OFED OS matrix. Marsh, Scott wrote: > > Good day, > > > > My name is Scott Marsh. I am an Engineer for Analogic Corporation and > I have a few questions > > regarding OFED. Is there any current development towards OFED for use > with Fedora Core 5? 
> > > > Thank you. > > > > Regards, > > > > Scott Marsh > > ------------------------------------------------------------------------ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ogerlitz at voltaire.com Wed Oct 4 03:15:37 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 04 Oct 2006 12:15:37 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45229F9A.5030207@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <4522101D.4040102@voltaire.com> <452295B6.20201@ichips.intel.com> <45229F9A.5030207@ichips.intel.com> Message-ID: <452389C9.30000@voltaire.com> Sean Hefty wrote: >>> Can you clarify what do you mean "(ABI) conflict with OFED releases"? >>> >>> Is an issue with someone wishing to work with OFED user space and IB >>> code from upstream kernel? >> >> Yes - there could be issues there. > > To clarify the major issue: currently when a connection request is > received, the connection data specified by the active side through the > rdma_conn_param is NOT given to the user. This includes the > responder_resources and initiator_depth. There's no easy way to obtain > this information. 
And when getting established event, the connection data specified by the passive side through the rdma_conn_param provided to rdma_accept is also not given to the user, is that an issue? Or. From mlleinin at hpcn.ca.sandia.gov Wed Oct 4 03:16:03 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Wed, 04 Oct 2006 03:16:03 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159885543.24791.21.camel@stevo-desktop> References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159885543.24791.21.camel@stevo-desktop> Message-ID: <1159956963.3950.31.camel@localhost> On Tue, 2006-10-03 at 09:25 -0500, Steve Wise wrote: > > Someday soon I hear, OFA will be able to host git repositories, so my preference > > is to delay any svn to git transition until then. (I cannot host git from > > inside Intel's firewall, nor can I access a git repository which isn't hosted at > > kernel.org.) How would you handle merging in changes from the main branch to > > side branches? > > > > Can OFA give us a date on when this will happen? We just got approval to spend OFA money on a new hosted server. The arrangements are being made but we don't have a date for when we will get access to this new machine or when it will be set up. If I had to guess I'd say we will start setting up the server in the next couple weeks. Thanks, - Matt From mst at mellanox.co.il Wed Oct 4 03:33:59 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 12:33:59 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45229F9A.5030207@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <4522101D.4040102@voltaire.com> <452295B6.20201@ichips.intel.com> <45229F9A.5030207@ichips.intel.com> Message-ID: <20061004103359.GC5883@mellanox.co.il> Quoting r. 
Sean Hefty : > To clarify the major issue: currently when a connection request is received, the > connection data specified by the active side through the rdma_conn_param is NOT > given to the user. This includes the responder_resources and initiator_depth. > There's no easy way to obtain this information. > > The ideal fix for this is to include rdma_conn_param as part of the > rdma_cm_event. BTW, wouldn't it be cleaner to just pass it up in the request event? > However, this breaks every userspace app that's been coded to > OFED / SVN. > An alternative is to add another call to retrieve the data, but > that's not a very clean alternative for new kernel submission. Another alternative is to version the create ID call. -- MST From halr at voltaire.com Wed Oct 4 03:36:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Oct 2006 06:36:59 -0400 Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c In-Reply-To: <86wt82lc1l.fsf@mtl066.yok.mtl.com> References: <86wt82lc1l.fsf@mtl066.yok.mtl.com> Message-ID: <1159958219.4502.35676.camel@hal.voltaire.com> Hi Eitan, On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote: > Hi Hal > > Explicit cast required for the win compiler to handle this... Applied to trunk only. Accepted to be consistent with other patches applied for Windows which currently accepted casts. -- Hal > Thanks > > Eitan From halr at voltaire.com Wed Oct 4 03:40:44 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Oct 2006 06:40:44 -0400 Subject: [openib-general] [PATCH] osm: port to WinIB stack In-Reply-To: <1159912150.4502.10146.camel@hal.voltaire.com> References: <1159912150.4502.10146.camel@hal.voltaire.com> Message-ID: <1159958444.4502.35784.camel@hal.voltaire.com> Hi Eitan, On Tue, 2006-10-03 at 17:49, Hal Rosenstock wrote: > Hi Eitan, > > Aside from the varargs handling (relative to 2 patches) and the > osmtest.c question, osmtest.c patch has been applied. 
The question is more general as to why the casts were needed for Windows. -- Hal > also pending is a patch to remove the WIN defines > just added in multiple places and move them to config.h for the Windows > build ? Can you/when can you prepare a patch for this ? > > Thanks! > > -- Hal > From mst at mellanox.co.il Wed Oct 4 04:56:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 13:56:34 +0200 Subject: [openib-general] [PATCH repost] IB/mthca: query port fix Message-ID: <20061004115634.GF5883@mellanox.co.il> Fill in "max_vl_num" (encoded according to VLCap field in the PortInfo MAD), and "init_type_reply" values in the ib_query_port verb. Signed-off-by: Jack Morgenstein --- This was posted a while ago - could the fix go into 2.6.19? Index: ofed_1_1/drivers/infiniband/hw/mthca/mthca_provider.c =================================================================== --- ofed_1_1.orig/drivers/infiniband/hw/mthca/mthca_provider.c 2006-08-03 14:30:21.000000000 +0300 +++ ofed_1_1/drivers/infiniband/hw/mthca/mthca_provider.c 2006-08-20 09:37:10.647839000 +0300 @@ -179,6 +179,8 @@ static int mthca_query_port(struct ib_de props->max_mtu = out_mad->data[41] & 0xf; props->active_mtu = out_mad->data[36] >> 4; props->subnet_timeout = out_mad->data[51] & 0x1f; + props->max_vl_num = out_mad->data[37] >> 4; + props->init_type_reply = out_mad->data[41] >> 4; out: kfree(in_mad); -- MST From mst at mellanox.co.il Wed Oct 4 05:03:43 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 14:03:43 +0200 Subject: [openib-general] [PATCH repost] IB/srp: re-create QP and CQ on reconnect Message-ID: <20061004120343.GG5883@mellanox.co.il> From: Ishai Rabinovitz Make srp destroy/re-create QP and CQ on each reconnect. This makes SRP more robust in the presence of hardware errors and is closer to the behaviour suggested by the IB spec, reducing the chance of stale packets. Signed-off-by: Ishai Rabinovitz Signed-off-by: Michael S. Tsirkin --- Roland, this was posted a while ago, and still applies to for-2.6.19 with a small offset. Looks like a good idea - could this go into 2.6.19? A description from the original mail below: For some reason (could be a firmware problem) I got a CQ overrun in SRP. Because of that there was a QP FATAL. Since we do not destroy the QP in srp_reconnect_target, the QP FATAL persists after the reconnect. In order to be able to recover from such a situation I suggest we destroy the CQ and the QP on every reconnect. This also corrects a minor spec non-compliance: when srp_reconnect_target is called, srp destroys the CM ID and resets the QP, so the new connection is retried with the same QPN, which could theoretically lead to stale packets (for strict spec compliance I think a QPN should not be reused until all stale packets are flushed out of the network). 
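[Editorial illustration, not part of the original mail.] The diff that follows creates the replacement CQ and QP before it destroys the old ones, and puts the old pointers back if creation fails. That ordering can be tried outside the kernel. The following is only a minimal user-space sketch under stated assumptions: `struct res`, `struct target`, `create_pair()` and `reconnect()` are hypothetical stand-ins, heap objects take the place of real verbs resources, and none of this is the ib_srp code itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CQ or QP; not a real verbs object. */
struct res {
    int id;                 /* plays the role of a QPN / CQN */
};

/* Hypothetical stand-in for the per-target state. */
struct target {
    struct res *cq;
    struct res *qp;
    int next_id;            /* source of fresh ids for new resources */
    int fail_create;        /* hook to simulate a creation failure */
};

/* Allocate a fresh CQ/QP pair and store it in the target (0 on success). */
static int create_pair(struct target *t)
{
    struct res *cq, *qp;

    if (t->fail_create)
        return -1;

    cq = malloc(sizeof(*cq));
    qp = malloc(sizeof(*qp));
    if (!cq || !qp) {
        free(cq);
        free(qp);
        return -1;
    }
    cq->id = t->next_id++;
    qp->id = t->next_id++;
    t->cq = cq;
    t->qp = qp;
    return 0;
}

/*
 * Reconnect: remember the old pair, try to create a new one, and only
 * destroy the old pair once the new one exists. On failure the old
 * pointers are restored so the target is left exactly as it was.
 */
static int reconnect(struct target *t)
{
    struct res *old_cq, *old_qp;

    assert(t != NULL);
    old_cq = t->cq;
    old_qp = t->qp;

    if (create_pair(t)) {
        t->cq = old_cq;
        t->qp = old_qp;
        return -1;
    }

    free(old_qp);           /* QP goes first, then the CQ it used */
    free(old_cq);
    return 0;
}
```

In this sketch a successful `reconnect()` always yields a QP with a new id, which mirrors the point made above about not reusing the same QPN, and a failed one leaves the previous pair untouched.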
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-08-31 12:23:52.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-08-31 12:30:48.000000000 +0300 @@ -495,10 +495,10 @@ static int srp_reconnect_target(struct srp_target_port *target) { struct ib_cm_id *new_cm_id; - struct ib_qp_attr qp_attr; struct srp_request *req, *tmp; - struct ib_wc wc; int ret; + struct ib_cq *old_cq; + struct ib_qp *old_qp; spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_LIVE) { @@ -522,17 +522,17 @@ ib_destroy_cm_id(target->cm_id); target->cm_id = new_cm_id; - qp_attr.qp_state = IB_QPS_RESET; - ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); - if (ret) - goto err; - - ret = srp_init_qp(target, target->qp); - if (ret) + old_qp = target->qp; + old_cq = target->cq; + ret = srp_create_target_ib(target); + if (ret) { + target->qp = old_qp; + target->cq = old_cq; goto err; + } - while (ib_poll_cq(target->cq, 1, &wc) > 0) - ; /* nothing */ + ib_destroy_qp(old_qp); + ib_destroy_cq(old_cq); spin_lock_irq(target->scsi_host->host_lock); list_for_each_entry_safe(req, tmp, &target->req_queue, list) -- MST From ishai at mellanox.co.il Wed Oct 4 05:18:52 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 4 Oct 2006 14:18:52 +0200 Subject: [openib-general] [PATCH] IB/SRP set initiator_extention from user space Message-ID: <20061004121852.GE32010@mellanox.co.il> There is a need for an initiator to connect to the same target several times, e.g., once from each IB port of the target. Some targets do not support multichannel. In order to work with them as well: 1) Use port_guid instead of node_guid. 2) Allow the user to set the identifier_extension when providing the target attributes. Signed-off-by: Ishai Rabinovitz --- Roland, Madhu and MST, I think this summarizes our discussion. 
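[Editorial illustration, not part of the original mail.] The byte layout this patch sets up for the 16-byte initiator port identifier can be shown in isolation: the identifier extension comes first and the port GUID second for targets following SRP draft rev. 16a, and the two halves are swapped for targets reporting the obsolete I/O class. The sketch below is a user-space approximation; `build_initiator_port_id()` and its parameters are hypothetical names, and both inputs are assumed to be in network byte order already.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill a 16-byte SRP initiator port identifier from two 8-byte halves
 * that are already in network (big-endian) byte order.
 *
 * obsolete_order == 0: extension first, then port GUID (rev. 16a layout)
 * obsolete_order != 0: port GUID first, then extension (older drafts)
 */
static void build_initiator_port_id(uint8_t out[16],
                                    const uint8_t ext[8],
                                    const uint8_t port_guid[8],
                                    int obsolete_order)
{
    assert(out != NULL && ext != NULL && port_guid != NULL);

    if (obsolete_order) {
        memcpy(out, port_guid, 8);
        memcpy(out + 8, ext, 8);
    } else {
        memcpy(out, ext, 8);
        memcpy(out + 8, port_guid, 8);
    }
}
```

With this layout, two logins from the same port that use different extension values produce different initiator port identifiers, which is what lets one initiator open several connections to a target that does not support multichannel.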
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-10-03 15:38:16.000000000 +0200 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-10-03 18:10:34.000000000 +0200 @@ -329,25 +329,29 @@ static int srp_send_req(struct srp_targe req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); + /* * In the published SRP specification (draft rev. 16a), the * port identifier format is 8 bytes of ID extension followed - * by 8 bytes of GUID. Older drafts put the two halves in the - * opposite order, so that the GUID comes first. + * by 8 bytes of port GUID. Older drafts put the two halves in the + * opposite order, so that the port GUID comes first. * * Targets conforming to these obsolete drafts can be * recognized by the I/O Class they report. */ + if (target->io_class == SRP_REV10_IB_IO_CLASS) { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id + 8, 8); + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.initiator_port_id + 8, - target->srp_host->initiator_port_id, 8); + &target->initiator_ext, 8); memcpy(req->priv.target_port_id, &target->ioc_guid, 8); memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); } else { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id, 16); + &target->initiator_ext, 8); + memcpy(req->priv.initiator_port_id + 8, + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.target_port_id, &target->id_ext, 8); memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); } @@ -1557,6 +1561,7 @@ enum { SRP_OPT_MAX_SECT = 1 << 5, SRP_OPT_MAX_CMD_PER_LUN = 1 << 6, SRP_OPT_IO_CLASS = 1 << 7, + SRP_OPT_INITIATOR_EXT = 1 << 8, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -1573,6 +1578,7 @@ static match_table_t srp_opt_tokens = { { SRP_OPT_MAX_SECT, 
"max_sect=%d" }, { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" }, { SRP_OPT_IO_CLASS, "io_class=%x" }, + { SRP_OPT_INITIATOR_EXT, "initiator_ext=%s" }, { SRP_OPT_ERR, NULL } }; @@ -1672,6 +1678,12 @@ static int srp_parse_options(const char target->io_class = token; break; + case SRP_OPT_INITIATOR_EXT: + p = match_strdup(args); + target->initiator_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + default: printk(KERN_WARNING PFX "unknown parameter or missing value " "'%s' in target creation request\n", p); @@ -1820,9 +1832,6 @@ static struct srp_host *srp_add_port(str host->dev = device; host->port = port; - host->initiator_port_id[7] = port; - memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); - host->class_dev.class = &srp_class; host->class_dev.dev = device->dev->dma_device; snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.h =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.h 2006-10-03 15:38:16.000000000 +0200 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.h 2006-10-03 18:05:50.000000000 +0200 @@ -91,7 +91,6 @@ struct srp_device { }; struct srp_host { - u8 initiator_port_id[16]; struct srp_device *dev; u8 port; struct class_device class_dev; @@ -122,6 +121,7 @@ struct srp_target_port { __be64 id_ext; __be64 ioc_guid; __be64 service_id; + __be64 initiator_ext; u16 io_class; struct srp_host *srp_host; struct Scsi_Host *scsi_host; -- Ishai Rabinovitz From mst at mellanox.co.il Wed Oct 4 05:46:57 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 14:46:57 +0200 Subject: [openib-general] Fwd: Re: Problems with OFED IPoIB HA on SLES10 In-Reply-To: <20061004063440.GE4855@mellanox.co.il> References: <20061004063440.GE4855@mellanox.co.il> Message-ID: <20061004124656.GA6853@mellanox.co.il> Another point: this seems to be crashing while we are requeueing the packet through dev_queue_xmit upon path record completion. It looks like this could try to requeue even though the interface is going down - could this trigger some problems? Quoting r. Michael S. Tsirkin : Subject: Fwd: Re: Problems with OFED IPoIB HA on SLES10 BTW, any idea? The ipoib_ha is just a script that ups/downs and configures interfaces, so it seems this crash could also happen on systems without it. -- MST Date: Tue, 3 Oct 2006 22:39:54 -0700 From: "Scott Weitzenkamp (sweitzen)" Subject: Re: [openib-general] Problems with OFED IPoIB HA on SLES10 If I fail back and forth between ib0 and ib1 every 30 seconds or so for several hours, while IPoIB traffic is running, the IPoIB host gets an Oops and IPoIB stops working. 
ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit 
failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet general protection fault: 0000 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq CPU 7 Modules linked in: af_packet ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core nls_utf8 st ipv6 nfs lockd nfs_acl sunrpc button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random ide_cd ehci_hcd uhci_hcd cdrom i8xx_tco ide_floppy usbcore shpchp e1000 pci_hotplug floppy reiserfs edd fan thermal processor siimage sg mptspi mptscsih mptbase scsi_transport_spi piix sd_mod scsi_mod ide_disk ide_core Pid: 23541, comm: ib_mad1 Tainted: G U 2.6.16.21-0.8-smp #1 RIP: 0010:[] {_spin_lock_irqsave+3} RSP: 0018:ffff810132a4fc20 EFLAGS: 00010086 RAX: 0000000000000286 RBX: 0000000000000000 RCX: ffffffff883324ee RDX: ffff810128d5e380 RSI: 0000000000000000 RDI: 0000ffff1b6017ff RBP: 00000000fffffffc R08: ffffffff803d3260 R09: 
ffff810140333800 R10: ffff81000107d400 R11: 0000000000000292 R12: ffff810128d5e380 R13: ffff810132a4fc78 R14: 0000ffff1b6017ff R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff810142d19740(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002b0b5e6ae180 CR3: 0000000128cbc000 CR4: 00000000000006e0 Process ib_mad1 (pid: 23541, threadinfo ffff810132a4e000, task ffff810142b56100) Stack: ffffffff8833c5f5 ffff8101302b3000 0000ffff1b6012ff 0000000000000002 0000000000000296 ffff8101302b3500 ffffffff8027753e ffff810128d5e3a0 ffff81012bce1680 ffff810128d5e380 Call Trace: {:ib_ipoib:path_rec_completion+862} {dev_queue_xmit+545} {:ib_ipoib:path_rec_completion+795} {:ib_sa:ib_sa_path_rec_callback+64} {lock_timer_base+27} {try_to_del_timer_sync+81} {:ib_sa:send_handler+72} {:ib_mad:ib_mad_complete_send_wr+421} {:ib_mad:ib_mad_completion_handler+947} {:ib_mad:ib_mad_completion_handler+0} {run_workqueue+153} {worker_thread+0} {keventd_create_kthread+0} {worker_thread+265} {__wake_up_common+62} {default_wake_function+0} {keventd_create_kthread+0} {kthread+236} {child_rip+8} {keventd_create_kthread+0} {kthread+0} {child_rip+0} Code: f0 ff 0f 0f 88 29 01 00 00 c3 fa f0 ff 0f 0f 88 2a 01 00 00 RIP {_spin_lock_irqsave+3} RSP Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems 
---------------------------------------- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Tuesday, October 03, 2006 2:53 PM To: Vladimir Sokolovsky Cc: EWG; openib-General Subject: Re: [openib-general] [openfabrics-ewg] Problems with OFED IPoIB HA on SLES10 Vlad, thanks for the fast response. I have some followup questions about configuring IPoIB HA, see below. 3) I got IPoIB HA working on SLES 10, but the documentation is a little lacking. Looks like I have to put the same IP address in ifcfg-ib0 and ifcfg-ib1, is this correct? Yes, IP address should be the same. Actually the configuration of the secondary interface does not matter. The High Availability daemon reads the configuration of the primary interface and migrates it between the interfaces in case of failure. If I don't have an ifcfg-ib1 file, then ipoib_ha.pl won't start. 
I would prefer to not configure ifcfg-ib1 since I don't plan to use it. # ipoib_ha.pl --with-arping --with-multicast -v Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory ... If I put different IP addresses in ifcfg-ib0 and ifcfg-ib1, then the ifcfg-ib1 IP address is used for both ib0 and ib1! # pwd /etc/sysconfig/network # cat ifcfg-ib0 DEVICE=ib0 BOOTPROTO=static IPADDR=192.168.2.46 NETMASK=255.255.255.0 ONBOOT=yes # cat ifcfg-ib1 DEVICE=ib1 BOOTPROTO=static IPADDR=192.168.6.46 NETMASK=255.255.255.0 ONBOOT=yes # /etc/init.d/openibd start Loading HCA driver and Access Layer: [ OK ] Setting up InfiniBand network interfaces: ib0 device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20) ib0 configuration: ib1 Bringing up interface ib0: [ OK ] ib1 device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20) Bringing up interface ib1: [ OK ] Setting up service network . . . 
[ done ] # ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 inet addr:192.168.6.46 Bcast:192.168.6.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:21:700d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:3 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:224 (224.0 b) # ifconfig ib1 ib1 Link encap:UNSPEC HWaddr 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00 inet addr:192.168.6.46 Bcast:192.168.6.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:21:700e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:4 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:304 (304.0 b) Notice how both ib0 and ib1 have the IP address from ifcfg-ib1. This contradicts this info from ipoib_release_notes.txt: b. The ib1 interface uses the configuration script of ib0. Scott _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From mst at mellanox.co.il Wed Oct 4 05:47:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 14:47:17 +0200 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159956963.3950.31.camel@localhost> References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159885543.24791.21.camel@stevo-desktop> <1159956963.3950.31.camel@localhost> Message-ID: <20061004124717.GB6853@mellanox.co.il> Quoting r. 
Matt Leininger : > We just got approval to spend OFA money on a new hosted server. The > arrangements are being made but we don't have a date for when we will > get access to this new machine or when it will be set up. If I had to > guess I'd say we will start setting up the server in the next couple > weeks. > > Thanks, > > - Matt Thanks. A couple more requests, as long as you are working on the infrastructure: - updated svn server: enables fast mirroring, better web access and other goodies - add bugzilla email gateway (as seen e.g. at kernel.org) that supports accepting Cc mail where you put "[Bug XXXX]" in the subject (where XXXX is the bug number) and cc bugme-daemon at kernel-bugs.osdl.org Could these be addressed? -- MST From mst at mellanox.co.il Wed Oct 4 05:59:10 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 14:59:10 +0200 Subject: [openib-general] RFC: potential race in ipoib Message-ID: <20061004125910.GC6853@mellanox.co.il> Not related to the recently discussed oops, but I think I see an oopsable race in path_rec_completion: we do: if (dev_queue_xmit(skb)) ipoib_warn(priv, "dev_queue_xmit failed " "to requeue packet\n"); If the device is going away (e.g. hotplug remove) and the skb is the last one, the priv pointer might not exist anymore after dev_queue_xmit; the attempt to read the name in ipoib_warn will then lead to a crash. Do we even need the ipoib_warn? It's not too hard to trigger it by downing the device while a path record query is in progress. Maybe just remove the message? -- MST From moshek at voltaire.com Wed Oct 4 06:05:31 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Wed, 4 Oct 2006 15:05:31 +0200 Subject: [openib-general] FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: Michael, I received the attached files from Frank. They look small, easy to understand, and change almost nothing in the code. The patch solves the ppc64 problems. 
Please approve the patch and integrate it into OFED-1.1-rc7. I tested it; it works OK on JS21 ppc64 SLES 10, JS21 ppc64 SLES 9, Red Hat AS4 U3 x86_64, and Red Hat AS4 U3 i386. Frank also tested it on AMD, JS21 PPC, and Mac PPC64. Best regards, Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -------------- next part -------------- A non-text attachment was scrubbed... Name: mstflint.patch Type: application/octet-stream Size: 2335 bytes Desc: mstflint.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mstflint.tgz Type: application/octet-stream Size: 46327 bytes Desc: mstflint.tgz URL: From mst at mellanox.co.il Wed Oct 4 06:28:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 15:28:56 +0200 Subject: [openib-general] [PATCH fixed] IB/srp: enable multiple connections to the same target In-Reply-To: <20061004121852.GE32010@mellanox.co.il> References: <20061004121852.GE32010@mellanox.co.il> Message-ID: <20061004132856.GA7224@mellanox.co.il> Enable multiple concurrent connections to the same SRP target 1) Use port guid instead of node guid in the initiator port identifier. 2) Let the user specify the identifier extension when adding the device. Signed-off-by: Ishai Rabinovitz Signed-off-by: Michael S. Tsirkin --- Looks like the last patch Ishai posted didn't apply to the upstream srp. Here's the version that does. Comments? 
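The identifier layout that the diff which follows produces can be sketched in user-space C. This is an illustration only, not the kernel code: build_initiator_port_id() and the rev10_layout flag are hypothetical names standing in for the SRP_REV10_IB_IO_CLASS check in srp_send_req(), and both 8-byte inputs are assumed to be in wire byte order already.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SRP_PORT_ID_LEN 16

/*
 * Fill a 16-byte initiator port ID from two 8-byte halves.
 * rev10_layout != 0: port GUID first, then the initiator extension.
 * rev10_layout == 0: initiator extension first, then the port GUID.
 * (Hypothetical helper; the names are not taken from ib_srp.)
 */
static void build_initiator_port_id(uint8_t id[SRP_PORT_ID_LEN],
                                    const uint8_t port_guid[8],
                                    const uint8_t initiator_ext[8],
                                    int rev10_layout)
{
    assert(id && port_guid && initiator_ext);
    memcpy(id, rev10_layout ? port_guid : initiator_ext, 8);
    memcpy(id + 8, rev10_layout ? initiator_ext : port_guid, 8);
}
```

Under this layout, two logins from the same port that use different initiator_ext values end up with different initiator port IDs, which is presumably what lets a target keep the connections apart.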
drivers/infiniband/ulp/srp/ib_srp.c | 19 +++++++++++++------ drivers/infiniband/ulp/srp/ib_srp.h | 2 +- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 44b9e5b..273a688 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -343,14 +343,16 @@ static int srp_send_req(struct srp_targe */ if (target->io_class == SRP_REV10_IB_IO_CLASS) { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id + 8, 8); + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.initiator_port_id + 8, - target->srp_host->initiator_port_id, 8); + &target->initiator_ext, 8); memcpy(req->priv.target_port_id, &target->ioc_guid, 8); memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); } else { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id, 16); + &target->initiator_ext, 8); + memcpy(req->priv.initiator_port_id + 8, + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.target_port_id, &target->id_ext, 8); memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); } @@ -1553,6 +1555,7 @@ enum { SRP_OPT_MAX_SECT = 1 << 5, SRP_OPT_MAX_CMD_PER_LUN = 1 << 6, SRP_OPT_IO_CLASS = 1 << 7, + SRP_OPT_INITIATOR_EXT = 1 << 8, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -1569,6 +1572,7 @@ static match_table_t srp_opt_tokens = { { SRP_OPT_MAX_SECT, "max_sect=%d" }, { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" }, { SRP_OPT_IO_CLASS, "io_class=%x" }, + { SRP_OPT_INITIATOR_EXT, "initiator_ext=%s" }, { SRP_OPT_ERR, NULL } }; @@ -1668,6 +1672,12 @@ static int srp_parse_options(const char target->io_class = token; break; + case SRP_OPT_INITIATOR_EXT: + p = match_strdup(args); + target->initiator_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + default: printk(KERN_WARNING PFX "unknown parameter or missing value " "'%s' in target creation request\n", p); @@ -1815,9 
+1825,6 @@ static struct srp_host *srp_add_port(str host->dev = device; host->port = port; - host->initiator_port_id[7] = port; - memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); - host->class_dev.class = &srp_class; host->class_dev.dev = device->dev->dma_device; snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index 5b581fb..d4e35ef 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -91,7 +91,6 @@ struct srp_device { }; struct srp_host { - u8 initiator_port_id[16]; struct srp_device *dev; u8 port; struct class_device class_dev; @@ -122,6 +121,7 @@ struct srp_target_port { __be64 id_ext; __be64 ioc_guid; __be64 service_id; + __be64 initiator_ext; u16 io_class; struct srp_host *srp_host; struct Scsi_Host *scsi_host; -- MST From mlakshmanan at silverstorm.com Wed Oct 4 06:37:41 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Wed, 4 Oct 2006 09:37:41 -0400 Subject: [openib-general] [PATCH fixed] IB/srp: enable multiple connections to the same target In-Reply-To: <20061004132856.GA7224@mellanox.co.il> Message-ID: > Enable multiple concurrent connections to the same SRP target > > 1) Use port guid instead of node guid in the initiator port identifier. > 2) Let the user specify the identifier extension when adding the device. > > Signed-off-by: Ishai Rabinovitz > Signed-off-by: Michael S. Tsirkin > --- > > Looks like the last patch Ishai posted didn't apply to the upstream srp. > Here's the version that does. Comments? > I had some trouble applying the patch as well. I'll try again and let you know soon. From reviewing the code, it appears to fulfill the requirements we agreed upon. Madhu From mst at mellanox.co.il Wed Oct 4 06:38:57 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 15:38:57 +0200 Subject: [openib-general] FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20061004133857.GH5883@mellanox.co.il> Quoting r. Moshe Kazir : > Subject: FW: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > Michael, > > I received the attached files from Frank. They look small, easy to understand, and change almost nothing in the code. > > The patch solves the ppc64 problems. > > Please approve the patch and integrate it into OFED-1.1-rc7. > > I tested it; it works OK on JS21 ppc64 SLES 10, JS21 ppc64 SLES 9, Red Hat AS4 U3 x86_64, and Red Hat AS4 U3 i386. > Frank also tested it on AMD, JS21 PPC, and Mac PPC64. > > > > Best regards, > > Moshe OK, not sure what's in the tarball, but the patch looks small and safe enough to go in. But we need the Signed-off-by line from the patch author, certifying to the Developer's Certificate of Origin 1.1: ---- The rules are pretty simple: if you can certify the below: Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. 
(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. then you just add a line saying Signed-off-by: Random J Developer -- MST From jlentini at netapp.com Wed Oct 4 06:44:51 2006 From: jlentini at netapp.com (James Lentini) Date: Wed, 4 Oct 2006 09:44:51 -0400 (EDT) Subject: [openib-general] Infiniband Fedora Core5 In-Reply-To: <45238725.5030802@dev.mellanox.co.il> References: <45238725.5030802@dev.mellanox.co.il> Message-ID: Hi Scott, While OFED might not target FC5, you can use the OFA stack on FC5. In fact, FC5 comes with the OFA drivers (the code in drivers/infiniband) pre-compiled. You can also download the latest development tree and compile the current drivers/libraries/apps. Of course, neither of these options has received as much testing as the OFED distribution. james On Wed, 4 Oct 2006, Aviram Gutman wrote: > No, Fedora Core 5 is not part of the OFED OS matrix. > > Marsh, Scott wrote: > > > > Good day, > > > > > > > > My name is Scott Marsh. I am an Engineer for Analogic Corporation > > and I have a few questions > > > > regarding OFED. Is there any current development towards OFED for > > use with Fedora Core 5? > > > > If so, is there a timeline for working towards Fedora Core 5? > > > > > > > > Thank you. 
> > > > > > > > Regards, > > > > > > > > Scott Marsh From erezz at voltaire.com Wed Oct 4 06:58:39 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 04 Oct 2006 15:58:39 +0200 Subject: [openib-general] [PATCH v2] ib_cm: fix module unload race with timewait In-Reply-To: <000201c6e640$b987ad90$19d1180a@amr.corp.intel.com> References: <000201c6e640$b987ad90$19d1180a@amr.corp.intel.com> Message-ID: <4523BE0F.9090209@voltaire.com> Sean Hefty wrote: > Updated patch based on Roland's feedback - converted a couple uses > of spinlock_irqsave to spinlock_irq, and used list manipulation > routine for cleanup. > > Sean, Your patch seems to work fine. I ran the same test several times (after applying the patch) and didn't see any oops. Thanks -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From bhartner at austin.rr.com Wed Oct 4 08:52:24 2006 From: bhartner at austin.rr.com (Bill Hartner) Date: Wed, 04 Oct 2006 10:52:24 -0500 Subject: [openib-general] RHEL 4 U3 - lost completions References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com> Message-ID: <4523D8B7.442D9C39@austin.rr.com> "Michael S. Tsirkin" wrote: > > Quoting r. glebn at voltaire.com : > > AFAIR there is a bug in kernel 2.6.9 that makes it possible for page to > > be changed in process's VM even though it is locked by get_user_pages(). > > That is why Mellanox driver used mlock() in addition to > > get_user_pages(). I think this bug was fixed somewhere around 2.6.11. > > I think it got fixed around 2.6.7. RHEL4 U3 has this fix, > and AFAIK last SLES9 update has backported that to 2.6.7 too. Another data point here. On gen1 stacks + RHEL 4 U3, the app I'm working on mlock()s a region from user space and also does get_user_pages() on the same region from a kernel piece of the app. 
When the adapter was closed or the registration was freed, the region was munlock()ed by the IB stack and the page structs changed from under us, even though the app still had get_user_pages() on the region. Is this an indication that get_user_pages() does not guarantee that a page stays in place on RHEL 4 U3? I created a test case using pthreads that simulates what the real app does, and I cannot recreate the problem. I will continue to debug the app. I will also verify no forks take place. -Bill From thlin at us.ibm.com Wed Oct 4 09:26:41 2006 From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin) Date: Wed, 04 Oct 2006 11:26:41 -0500 Subject: [openib-general] [PATCH] mstflint not working on ppc64 and when driver is not loaded on AMD Message-ID: <1159979201.21249.121.camel@flin.austin.ibm.com> mmap() does not work on ppc64. The 64-bit machines with 32-bit I/O need ioremap in the device driver to allow mmap access to the I/O memory. This patch checks for the above situations and tries to use PCI config space to do the firmware update when mmap() fails. Signed-off-by: Tseng-Hui (Frank) Lin ======= -------------- next part -------------- diff -uPr mstflint.ofed-1.1r6/mtcr.h mstflint/mtcr.h --- mstflint.ofed-1.1r6/mtcr.h 2006-09-17 10:46:21.000000000 -0500 +++ mstflint/mtcr.h 2006-10-03 10:29:38.000000000 -0500 @@ -294,6 +294,9 @@ int err; char buf[]="0000:00:00.0"; char path[]="/sys/bus/pci/devices/0000:00:00.0/resource0"; + unsigned domain, bus, dev, func; + struct stat dummybuf; + char file_name[]="/proc/bus/pci/0000:00/00.0"; mf=(mfile*)malloc(sizeof(mfile)); if (!mf) return 0; @@ -338,13 +341,14 @@ mf->ptr = mmap(NULL, 0x100000, PROT_READ | PROT_WRITE, MAP_SHARED, mf->fd, 0); - if ( (! mf->ptr) || (mf->ptr == MAP_FAILED) ) goto map_failed; + if ( (! 
mf->ptr) || (mf->ptr == MAP_FAILED) || + (__be32_to_cpu(*((u_int32_t *) ((char *) mf->ptr + 0xF0014))) == 0xFFFFFFFF) ) + goto map_failed_try_pciconf; } #endif else { #if CONFIG_ENABLE_MMAP - unsigned bus, dev, func; if (mfind(name,&offset,&bus,&dev,&func)) goto find_failed; #if CONFIG_USE_DEV_MEM @@ -352,8 +356,6 @@ if (mf->fd<0) goto open_failed; #else { - struct stat dummybuf; - char file_name[]="/proc/bus/pci/0000:00/00.0"; sprintf(file_name,"/proc/bus/pci/%2.2x/%2.2x.%1.1x", bus,dev,func); if (stat(file_name,&dummybuf)) @@ -369,7 +371,9 @@ mf->ptr = mmap(NULL, 0x100000, PROT_READ | PROT_WRITE, MAP_SHARED, mf->fd, offset); - if ( (! mf->ptr) || (mf->ptr == MAP_FAILED) ) goto map_failed; + if ( (! mf->ptr) || (mf->ptr == MAP_FAILED) || + (__be32_to_cpu(*((u_int32_t *) ((char *) mf->ptr + 0xF0014))) == 0xFFFFFFFF) ) + goto map_failed_try_pciconf; #else goto open_failed; @@ -379,6 +383,20 @@ #if CONFIG_ENABLE_MMAP +map_failed_try_pciconf: +#if CONFIG_ENABLE_PCICONF + mf->ptr = NULL; + close(mf->fd); + if (sscanf(name, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { + domain = 0; + if (sscanf(name, "%x:%x.%x", &bus, &dev, &func) != 3) goto map_failed; + } + snprintf(file_name, sizeof file_name, "/proc/bus/pci/%2.2x/%2.2x.%1.1x", bus, dev, func); + if (stat(file_name,&dummybuf)) + snprintf(file_name, sizeof file_name, "/proc/bus/pci/%4.4x:%2.2x/%2.2x.%1.1x", domain, bus,dev,func); + if ((mf->fd = open(file_name, O_RDWR | O_SYNC)) >= 0) return mf; +#endif + map_failed: #if !CONFIG_USE_DEV_MEM ioctl_failed: From sean.hefty at intel.com Wed Oct 4 09:48:06 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 09:48:06 -0700 Subject: [openib-general] [PATCH v2] ib_cm: fix module unload race with timewait In-Reply-To: <4523BE0F.9090209@voltaire.com> Message-ID: <000001c6e7d4$dd856df0$8698070a@amr.corp.intel.com> >Your patch seems to work fine. I ran the same test several times (after >applying the patch) and didn't see any oops. 
Thanks - I will commit to svn and resubmit for 2.6.19 inclusion. - Sean From sean.hefty at intel.com Wed Oct 4 09:53:54 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 09:53:54 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <452389C9.30000@voltaire.com> Message-ID: <000101c6e7d5$ad0034c0$8698070a@amr.corp.intel.com> >And when getting established event, the connection data specified by the >passive side through the rdma_conn_param provided to rdma_accept is also >not given to the user, is that an issue? Correct - the connection parameter data disappears into the rdma_cm and is not directly given to the remote side. The data can be obtained by querying the QP after it's been connected, or by calling rdma_init_qp_attr(), but neither of these methods are very clean. - Sean From mlleinin at hpcn.ca.sandia.gov Wed Oct 4 09:59:19 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Wed, 04 Oct 2006 09:59:19 -0700 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <20061004124717.GB6853@mellanox.co.il> References: <1159476585.30153.80.camel@stevo-desktop> <451C3F02.3000907@ichips.intel.com> <1159885543.24791.21.camel@stevo-desktop> <1159956963.3950.31.camel@localhost> <20061004124717.GB6853@mellanox.co.il> Message-ID: <1159981159.3950.61.camel@localhost> On Wed, 2006-10-04 at 14:47 +0200, Michael S. Tsirkin wrote: > Quoting r. Matt Leininger : > > We just got approval to spend OFA money on a new hosted server. The > > arrangements are being made but we don't have a date for when we will > > get access to this new machine or when it will be set up. If I had to > > guess I'd say we will start setting up the server in the next couple > > weeks. > > > > Thanks, > > > > - Matt > > Thanks. > A couple of more requests as far as you are working on the infrastructure > - updated svn server > enables fast mirroring better web access and other goodies Are you referring to svn 1.4? 
our plan is to upgrade to 1.4. > - add bugzilla email gateway > (as seen e.g. at kernel.org) that supports accepting Cc mail > where you put "[Bug XXXX]" in the subject (where XXXX is the bug number) and cc > bugme-daemon at kernel-bugs.osdl.org I'll add that to the list. - Matt > > Could these be addressed? > From sean.hefty at intel.com Wed Oct 4 10:00:01 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 10:00:01 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <20061004103359.GC5883@mellanox.co.il> Message-ID: <000201c6e7d6$87eb4c00$8698070a@amr.corp.intel.com> >> The ideal fix for this is to include rdma_conn_param as part of the >> rdma_cm_event. > >BTW, wouldn't it be cleaner to just pass it up in the request event? Yes - this is what I meant by including it in the rdma_cm_event structure. >> However, this breaks every userspace app that's been coded to >> OFED / SVN. >> An alternative is to add another call to retrieve the data, but >> that's not a very clean alternative for new kernel submission. > >Another alternative is to version the create ID call. Hmm... I need to think about the implementation of this more, but this sounds like a possibility. Can you provide any details on how you're envisioning this working? - Sean From swise at opengridcomputing.com Wed Oct 4 10:13:16 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 04 Oct 2006 12:13:16 -0500 Subject: [openib-general] rdma_cm branch In-Reply-To: <000201c6e7d6$87eb4c00$8698070a@amr.corp.intel.com> References: <000201c6e7d6$87eb4c00$8698070a@amr.corp.intel.com> Message-ID: <1159981996.4489.1.camel@stevo-desktop> On Wed, 2006-10-04 at 10:00 -0700, Sean Hefty wrote: > >> The ideal fix for this is to include rdma_conn_param as part of the > >> rdma_cm_event. > > > >BTW, wouldn't it be cleaner to just pass it up in the request event? > > Yes - this is what I meant by including it in the rdma_cm_event structure. 
> > >> However, this breaks every userspace app that's been coded to > >> OFED / SVN. > >> An alternative is to add another call to retrieve the data, but > >> that's not a very clean alternative for new kernel submission. > > > >Another alternative is to version the create ID call. > > Hmm... I need to think about the implementation of this more, but this sounds > like a possibility. Can you provide any details on how you're envisioning this > working? > > - Sean > Guys, I must be confused. I thought the private data _was_ passed up in the ESTABLISHED event on the active side. We have tools in the perftools directory that utilize this. What am I missing here? Thanks, Steve. From sean.hefty at intel.com Wed Oct 4 10:17:39 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 10:17:39 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <1159981996.4489.1.camel@stevo-desktop> Message-ID: <000301c6e7d8$feaf4790$8698070a@amr.corp.intel.com> >Guys, I must be confused. I thought the private data _was_ passed up in >the ESTABLISHED event on the active side. We have tools in the >perftools directory that utilize this. What am I missing here? When a user calls rdma_connect(), they specify connection parameters (like responder_resources and initiator_depth) through a struct rdma_conn_param. These parameters are NOT given to the user when the connect request event is reported. The issue is: are these values needed by the user during connection establishment? If yes, then how do we export them to the user. 
- Sean From swise at opengridcomputing.com Wed Oct 4 10:40:38 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 04 Oct 2006 12:40:38 -0500 Subject: [openib-general] rdma_cm branch In-Reply-To: <000301c6e7d8$feaf4790$8698070a@amr.corp.intel.com> References: <000301c6e7d8$feaf4790$8698070a@amr.corp.intel.com> Message-ID: <1159983638.4489.15.camel@stevo-desktop> On Wed, 2006-10-04 at 10:17 -0700, Sean Hefty wrote: > >Guys, I must be confused. I thought the private data _was_ passed up in > >the ESTABLISHED event on the active side. We have tools in the > >perftools directory that utilize this. What am I missing here? > > When a user calls rdma_connect(), they specify connection parameters (like > responder_resources and initiator_depth) through a struct rdma_conn_param. > These parameters are NOT given to the user when the connect request event is > reported. > > The issue is: are these values needed by the user during connection > establishment? If yes, then how do we export them to the user. > I understand now. For iWARP, the key parameter is setting your local QP's ORD (initiator resources) to <= your peer's IRD (responder resources) to avoid overflowing the peer's incoming RDMA read queue. I think the iWARP devices must support setting ORD even after the connection is set up and the QP is in RTS, so the connection _could_ be set up (QP moved to RTS) and then the QP modified to the appropriate settings after querying to get the peer's params. But I think it seems more natural to deal with this at connection setup time. It would be nice, IMO, for the RDMA CM to handle this under the covers and set up the QP appropriately. Thus the parameters need not be passed to the consumer... My 2 cents. 
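To make the ORD/IRD rule in the message above concrete, here is a minimal sketch in plain C. It is not RDMA CM or iWARP driver code: negotiate_ord() and its parameter names are hypothetical, and the only thing it assumes is the constraint stated above, namely that the local ORD must not exceed the peer's IRD.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the ORD (outstanding RDMA reads we may issue) to program on the
 * local QP. wanted_ord is what the local side would like; peer_ird is
 * how many incoming RDMA reads the peer can service at once.
 * (Hypothetical helper, not part of any RDMA CM API.)
 */
static uint8_t negotiate_ord(uint8_t wanted_ord, uint8_t peer_ird)
{
    uint8_t ord = wanted_ord < peer_ird ? wanted_ord : peer_ird;

    /* The invariant from the discussion: never overflow the peer. */
    assert(ord <= peer_ird);
    return ord;
}
```

For example, a side that would like 8 outstanding reads against a peer advertising an IRD of 4 would program an ORD of 4; a side that wants only 2 keeps 2.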
From mshefty at ichips.intel.com Wed Oct 4 10:43:03 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 10:43:03 -0700 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004061754.GB4855@mellanox.co.il> References: <20061003211921.GB16503@mellanox.co.il> <4522DF3B.3020205@ichips.intel.com> <20061004061754.GB4855@mellanox.co.il> Message-ID: <4523F2A7.8090501@ichips.intel.com> Michael S. Tsirkin wrote: >>There's several timeout values transferred and used by the cm, most notably the >>remote cm response timeout and packet life time. Does it make more sense to >>have a single, generic timeout maximum instead? > > Hmm. I'm not sure - we are working around an actual broken implementation here - > what do you think? I wasn't sure either. The MRA timeout is a combination of the packet life time + service timeout, which made me bring this up. The patch only handles the service timeout portion, so we end up in the same situation if a large packet life time is ever used. >>Would it make more sense to >>enable the maximum(s) by default, since we're dependent upon values received >>over the network? > > I think it would. So do I. The CM has checks to bring out of range values into range, but at the maximum, we get a timeout of about 2.5 hours. Multiply that by 15 retries, and the cm can literally spend all day retrying a request. I was considering dropping the default maximum down to around 4-8 seconds, which with retries still gives us about a minute to time out a request. The default maximum would apply to local and remote cm timeouts, packet life time, and service timeout, but could be overridden by the user. (Basically, with Ishai's patch: rename mra_timeout_limit to timeout_limit, set to a default of 20, and replace occurrences of '31' in the code with timeout_limit.)
- Sean From tom at opengridcomputing.com Wed Oct 4 11:00:37 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 04 Oct 2006 13:00:37 -0500 Subject: [openib-general] rdma_cm branch In-Reply-To: <1159983638.4489.15.camel@stevo-desktop> Message-ID: On 10/4/06 12:40 PM, "Steve Wise" wrote: > On Wed, 2006-10-04 at 10:17 -0700, Sean Hefty wrote: >>> Guys, I must be confused. I thought the private data _was_ passed up in >>> the ESTABLISHED event on the active side. We have tools in the >>> perftools directory that utilize this. What am I missing here? >> >> When a user calls rdma_connect(), they specific connection parameters (like >> responder_resources and initiator_depth) through a struct rdma_conn_param. >> These parameters are NOT given to the user when the connect request event is >> reported. >> >> The issue is: are these values needed by the user during connection >> establishment? If yes, then how do we export them to the user. >> > > I understand now. > > For iWARP, the key parameter is setting your local QP's ORD (initiator > resources) to <= your peer's IRD (responder resources) to avoid > overflowing the peers incoming rdma read queue. I think the iWARP > devices must support setting ORD even after the connection is setup and > the QP is in RTS, so the connection _could_ be setup (qp moved to RTS) > and then the QP modified to the appropriate settings after querying to > get the peer's params. But I think it seems more natural to deal with > this at connection setup time. > > It would be nice, IMO, for the RDMA CM to handle this under the covers > and setup the QP appropriately. Thus the parameters need not be passed > to the consumer... > Actually, I think how the IRD/ORD parameters are exchanged and negotiated by the CM in private data is a separate issue from whether or not the end result of the negotiation is provided to the app in the established event. 
I think some apps would like to know the end result of the negotiation so that they can throttle RDMA_READ submissions on the SQ and avoid stalling outbound RDMA_WRITE/RDMA_SEND behind the last RDMA_READ. So I guess that's a long way of saying that I advocate adding the negotiated value/values to the event. > My 2 cents. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mshefty at ichips.intel.com Wed Oct 4 11:03:00 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 11:03:00 -0700 Subject: [openib-general] rdma_cm connection parameters (was: rdma_cm branch) In-Reply-To: <1159983638.4489.15.camel@stevo-desktop> References: <000301c6e7d8$feaf4790$8698070a@amr.corp.intel.com> <1159983638.4489.15.camel@stevo-desktop> Message-ID: <4523F754.7080604@ichips.intel.com> Steve Wise wrote: > It would be nice, IMO, for the RDMA CM to handle this under the covers > and setup the QP appropriately. Thus the parameters need not be passed > to the consumer... The same parameters are also specified when calling rdma_accept(). I think these are the values that are used for the connection. (I need to trace through the code to be sure.) There's no easy way for the passive side to know what was requested without exporting the values. We could drop to the lower of the two values, and let users that really care what the values are call ib_query_qp() after the connection has been established. This has the disadvantage that you couldn't just reject the connection if the values weren't what you needed.
- Sean From sean.hefty at intel.com Wed Oct 4 11:29:59 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 11:29:59 -0700 Subject: [openib-general] [PATCH 1/2] 2.6.19 ib_cm: fix timewait crash after module unload Message-ID: <000401c6e7e3$195aa080$8698070a@amr.corp.intel.com> From: Sean Hefty If the ib_cm module is unloaded while id's are still in timewait, the CM will destroy the work queue used to process timewait. Once the id's exit timewait, their timers will fire, leading to a crash trying to access the destroyed work queue. We need to track id's that are in timewait, and cancel their deferred work on module unload. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index f35fcc4..470c482 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -75,6 +75,7 @@ static struct ib_cm { struct rb_root remote_sidr_table; struct idr local_id_table; __be32 random_id_operand; + struct list_head timewait_list; struct workqueue_struct *wq; } cm; @@ -112,6 +113,7 @@ struct cm_work { struct cm_timewait_info { struct cm_work work; /* Must be first. 
*/ + struct list_head list; struct rb_node remote_qp_node; struct rb_node remote_id_node; __be64 remote_ca_guid; @@ -647,13 +649,6 @@ static inline int cm_convert_to_ms(int i static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info) { - unsigned long flags; - - if (!timewait_info->inserted_remote_id && - !timewait_info->inserted_remote_qp) - return; - - spin_lock_irqsave(&cm.lock, flags); if (timewait_info->inserted_remote_id) { rb_erase(&timewait_info->remote_id_node, &cm.remote_id_table); timewait_info->inserted_remote_id = 0; @@ -663,7 +658,6 @@ static void cm_cleanup_timewait(struct c rb_erase(&timewait_info->remote_qp_node, &cm.remote_qp_table); timewait_info->inserted_remote_qp = 0; } - spin_unlock_irqrestore(&cm.lock, flags); } static struct cm_timewait_info * cm_create_timewait_info(__be32 local_id) @@ -684,8 +678,12 @@ static struct cm_timewait_info * cm_crea static void cm_enter_timewait(struct cm_id_private *cm_id_priv) { int wait_time; + unsigned long flags; + spin_lock_irqsave(&cm.lock, flags); cm_cleanup_timewait(cm_id_priv->timewait_info); + list_add_tail(&cm_id_priv->timewait_info->list, &cm.timewait_list); + spin_unlock_irqrestore(&cm.lock, flags); /* * The cm_id could be destroyed by the user before we exit timewait. 
@@ -701,9 +699,13 @@ static void cm_enter_timewait(struct cm_ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv) { + unsigned long flags; + cm_id_priv->id.state = IB_CM_IDLE; if (cm_id_priv->timewait_info) { + spin_lock_irqsave(&cm.lock, flags); cm_cleanup_timewait(cm_id_priv->timewait_info); + spin_unlock_irqrestore(&cm.lock, flags); kfree(cm_id_priv->timewait_info); cm_id_priv->timewait_info = NULL; } @@ -1307,6 +1309,7 @@ static struct cm_id_private * cm_match_r if (timewait_info) { cur_cm_id_priv = cm_get_id(timewait_info->work.local_id, timewait_info->work.remote_id); + cm_cleanup_timewait(cm_id_priv->timewait_info); spin_unlock_irqrestore(&cm.lock, flags); if (cur_cm_id_priv) { cm_dup_req_handler(work, cur_cm_id_priv); @@ -1315,7 +1318,8 @@ static struct cm_id_private * cm_match_r cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ, NULL, 0); - goto error; + listen_cm_id_priv = NULL; + goto out; } /* Find matching listen request. */ @@ -1323,21 +1327,20 @@ static struct cm_id_private * cm_match_r req_msg->service_id, req_msg->private_data); if (!listen_cm_id_priv) { + cm_cleanup_timewait(cm_id_priv->timewait_info); spin_unlock_irqrestore(&cm.lock, flags); cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ, NULL, 0); - goto error; + goto out; } atomic_inc(&listen_cm_id_priv->refcount); atomic_inc(&cm_id_priv->refcount); cm_id_priv->id.state = IB_CM_REQ_RCVD; atomic_inc(&cm_id_priv->work_count); spin_unlock_irqrestore(&cm.lock, flags); +out: return listen_cm_id_priv; - -error: cm_cleanup_timewait(cm_id_priv->timewait_info); - return NULL; } static int cm_req_handler(struct cm_work *work) @@ -2601,28 +2604,29 @@ static int cm_timewait_handler(struct cm { struct cm_timewait_info *timewait_info; struct cm_id_private *cm_id_priv; - unsigned long flags; int ret; timewait_info = (struct cm_timewait_info *)work; - cm_cleanup_timewait(timewait_info); + spin_lock_irq(&cm.lock); + 
list_del(&timewait_info->list); + spin_unlock_irq(&cm.lock); cm_id_priv = cm_acquire_id(timewait_info->work.local_id, timewait_info->work.remote_id); if (!cm_id_priv) return -EINVAL; - spin_lock_irqsave(&cm_id_priv->lock, flags); + spin_lock_irq(&cm_id_priv->lock); if (cm_id_priv->id.state != IB_CM_TIMEWAIT || cm_id_priv->remote_qpn != timewait_info->remote_qpn) { - spin_unlock_irqrestore(&cm_id_priv->lock, flags); + spin_unlock_irq(&cm_id_priv->lock); goto out; } cm_id_priv->id.state = IB_CM_IDLE; ret = atomic_inc_and_test(&cm_id_priv->work_count); if (!ret) list_add_tail(&work->list, &cm_id_priv->work_list); - spin_unlock_irqrestore(&cm_id_priv->lock, flags); + spin_unlock_irq(&cm_id_priv->lock); if (ret) cm_process_work(cm_id_priv, work); @@ -3374,6 +3378,7 @@ static int __init ib_cm_init(void) idr_init(&cm.local_id_table); get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand); idr_pre_get(&cm.local_id_table, GFP_KERNEL); + INIT_LIST_HEAD(&cm.timewait_list); cm.wq = create_workqueue("ib_cm"); if (!cm.wq) @@ -3391,7 +3396,20 @@ error: static void __exit ib_cm_cleanup(void) { + struct cm_timewait_info *timewait_info, *tmp; + + spin_lock_irq(&cm.lock); + list_for_each_entry(timewait_info, &cm.timewait_list, list) + cancel_delayed_work(&timewait_info->work.work); + spin_unlock_irq(&cm.lock); + destroy_workqueue(cm.wq); + + list_for_each_entry_safe(timewait_info, tmp, &cm.timewait_list, list) { + list_del(&timewait_info->list); + kfree(timewait_info); + } + ib_unregister_client(&cm_client); idr_destroy(&cm.local_id_table); } From sean.hefty at intel.com Wed Oct 4 11:37:25 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 11:37:25 -0700 Subject: [openib-general] [PATCH 2/2] 2.6.19 ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <000401c6e7e3$195aa080$8698070a@amr.corp.intel.com> Message-ID: <000501c6e7e4$236d1930$8698070a@amr.corp.intel.com> From: Sean Hefty Currently a DREP is only sent in response to a DREQ if a 
connection has been found matching the DREQ, and it is in the proper state. Once a DREP is sent, the local connection moves into timewait. Duplicate DREQs received while in this state result in re-sending the DREP. However, it's likely that the local connection will enter and exit timewait before the remote side times out a lost DREP and resends a DREQ. To handle this, we send a DREP in response to a DREQ, even if a local connection is not found. This avoids maintaining disconnected id's in timewait states for excessively long times, just to handle a lost DREP. Signed-off-by: Sean Hefty --- This addresses a problem experienced by MPI during scale-up testing. diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 470c482..25b1018 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -1902,6 +1902,32 @@ out: spin_unlock_irqrestore(&cm_id_priv- } EXPORT_SYMBOL(ib_send_cm_drep); +static int cm_issue_drep(struct cm_port *port, + struct ib_mad_recv_wc *mad_recv_wc) +{ + struct ib_mad_send_buf *msg = NULL; + struct cm_dreq_msg *dreq_msg; + struct cm_drep_msg *drep_msg; + int ret; + + ret = cm_alloc_response_msg(port, mad_recv_wc, &msg); + if (ret) + return ret; + + dreq_msg = (struct cm_dreq_msg *) mad_recv_wc->recv_buf.mad; + drep_msg = (struct cm_drep_msg *) msg->mad; + + cm_format_mad_hdr(&drep_msg->hdr, CM_DREP_ATTR_ID, dreq_msg->hdr.tid); + drep_msg->remote_comm_id = dreq_msg->local_comm_id; + drep_msg->local_comm_id = dreq_msg->remote_comm_id; + + ret = ib_post_send_mad(msg, NULL); + if (ret) + cm_free_msg(msg); + + return ret; +} + static int cm_dreq_handler(struct cm_work *work) { struct cm_id_private *cm_id_priv; @@ -1913,8 +1939,10 @@ static int cm_dreq_handler(struct cm_wor dreq_msg = (struct cm_dreq_msg *)work->mad_recv_wc->recv_buf.mad; cm_id_priv = cm_acquire_id(dreq_msg->remote_comm_id, dreq_msg->local_comm_id); - if (!cm_id_priv) + if (!cm_id_priv) { + cm_issue_drep(work->port, work->mad_recv_wc); return 
-EINVAL; + } work->cm_event.private_data = &dreq_msg->private_data; From tom at opengridcomputing.com Wed Oct 4 12:02:25 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 04 Oct 2006 14:02:25 -0500 Subject: [openib-general] rdma_cm connection parameters (was: rdma_cm branch) In-Reply-To: <4523F754.7080604@ichips.intel.com> Message-ID: On 10/4/06 1:03 PM, "Sean Hefty" wrote: > Steve Wise wrote: >> It would be nice, IMO, for the RDMA CM to handle this under the covers >> and setup the QP appropriately. Thus the parameters need not be passed >> to the consumer... > > The same parameters are also specified when calling rdma_accept(). I think > these are the values that are used for the connection. (I need to trace > through > the code to be sure.) There's no easy way for the passive side to know what > was > requested without exporting the values. > > We could drop to the lower of the two values, and let users that really care > what the values are call ib_query_qp() after the connection has been > established. This has the disadvantage that you couldn't just reject the > connection if the values weren't what you needed. > Can't the passive side receive the active side's ORD/IRD in the rdma_cm_event? Is providing the values in the rdma_cm_event what you mean by 'exporting' the values? The passive side could then call either rdma_accept or rdma_reject based on these values. My assumption is that the typical behavior, however, would be to limit itself to whatever the active side requested, or what it was capable of and then return these values in its own call to rdma_accept. The service provided by the CM is to marshal and unmarshal these values from reserved private data into the rdma_cm_event structure. I would think the limitation is that the active side could effectively overprovision its QP if the passive side couldn't honor its request. Am I confused?
> - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Wed Oct 4 12:18:34 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 04 Oct 2006 21:18:34 +0200 Subject: [openib-general] [PATCH] opensm: setup function for 'null' routing engine Message-ID: <20061004191834.25847.15156.stgit@sashak.voltaire.com> This defines setup function for 'null' routing engine. Currently there is only log message and fallback to default behavior, so the function is needed to prevent opensm crash when '-R null' is used. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_opensm.c | 12 +++++++++++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c index ac9d462..0c5450d 100644 --- a/osm/opensm/osm_opensm.c +++ b/osm/opensm/osm_opensm.c @@ -76,8 +76,10 @@ struct routing_engine_module { extern int osm_ucast_updn_setup(osm_opensm_t *p_osm); extern int osm_ucast_file_setup(osm_opensm_t *p_osm); +static int osm_ucast_null_setup(osm_opensm_t *p_osm); + const static struct routing_engine_module routing_modules[] = { - { "null", NULL }, + { "null", osm_ucast_null_setup }, { "updn", osm_ucast_updn_setup }, { "file", osm_ucast_file_setup }, { NULL, NULL } @@ -102,6 +104,14 @@ static int setup_routing_engine(osm_open return -1; } +static int osm_ucast_null_setup(osm_opensm_t *p_osm) +{ + osm_log(&p_osm->log, OSM_LOG_VERBOSE, + "osm_ucast_null_setup: nothing yet - " + "will use default routing engine\n"); + return 0; +} + /********************************************************************** **********************************************************************/ void From mshefty at ichips.intel.com Wed Oct 4 12:17:26 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 12:17:26 
-0700 Subject: [openib-general] rdma_cm connection parameters (was: rdma_cm branch) In-Reply-To: References: Message-ID: <452408C6.8090505@ichips.intel.com> Tom Tucker wrote: > Can't the passive side receive the active side's ORD/IRD in the > rdma_cm_event. Is providing the values in the rdma_cm_event what you mean by > 'exporting' the values? That along with copying the values up to userspace. - Sean From halr at voltaire.com Wed Oct 4 12:26:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Oct 2006 15:26:34 -0400 Subject: [openib-general] [PATCH] opensm: setup function for 'null' routing engine In-Reply-To: <20061004191834.25847.15156.stgit@sashak.voltaire.com> References: <20061004191834.25847.15156.stgit@sashak.voltaire.com> Message-ID: <1159989992.4502.52880.camel@hal.voltaire.com> On Wed, 2006-10-04 at 15:18, Sasha Khapyorsky wrote: > This defines setup function for 'null' routing engine. Currently there > is only log message and fallback to default behavior, so the function > is needed to prevent opensm crash when '-R null' is used. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From sashak at voltaire.com Wed Oct 4 12:34:36 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 04 Oct 2006 21:34:36 +0200 Subject: [openib-general] [PATCH] opensm: verbose message about fallback to default routing engine. Message-ID: <20061004193436.26311.31010.stgit@sashak.voltaire.com> This provides a verbose message for cases when the specified routing engine (with -R) was not found or its setup failed.
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_opensm.c | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c index 0c5450d..00cb0f6 100644 --- a/osm/opensm/osm_opensm.c +++ b/osm/opensm/osm_opensm.c @@ -92,8 +92,12 @@ static int setup_routing_engine(osm_open for (r = routing_modules; r->name && *r->name; r++) { if(!strcmp(r->name, name)) { p_osm->routing_engine.name = r->name; - if (r->setup(p_osm)) - break; + if (r->setup(p_osm)) { + osm_log(&p_osm->log, OSM_LOG_VERBOSE, + "setup of routing engine \'%s\'" + " failed\n", name); + return -2; + } osm_log (&p_osm->log, OSM_LOG_DEBUG, "setup_routing_engine: " "\'%s\' routing engine set up\n", @@ -299,8 +303,13 @@ #endif goto Exit; if( p_opt->routing_engine_name && - setup_routing_engine(p_osm, p_opt->routing_engine_name)) + setup_routing_engine(p_osm, p_opt->routing_engine_name)) { + osm_log( &p_osm->log, OSM_LOG_VERBOSE, + "osm_opensm_init: cannot find or setup routing engine" + " \'%s\'. Default will be used instead.\n", + p_opt->routing_engine_name); goto Exit; + } Exit: osm_log( &p_osm->log, OSM_LOG_FUNCS, "osm_opensm_init: ]\n" ); /* Format Waived */ From halr at voltaire.com Wed Oct 4 12:39:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Oct 2006 15:39:10 -0400 Subject: [openib-general] [PATCH] opensm: verbose message about fallback to default routing engine. In-Reply-To: <20061004193436.26311.31010.stgit@sashak.voltaire.com> References: <20061004193436.26311.31010.stgit@sashak.voltaire.com> Message-ID: <1159990750.4502.53302.camel@hal.voltaire.com> On Wed, 2006-10-04 at 15:34, Sasha Khapyorsky wrote: > This provides verbose message for cases then specified routing engine > (with -R) was not found or this setup was failed. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From mst at mellanox.co.il Wed Oct 4 13:14:25 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 22:14:25 +0200 Subject: [openib-general] 2.6.18 kernel support in the main trunk. In-Reply-To: <1159981159.3950.61.camel@localhost> References: <1159981159.3950.61.camel@localhost> Message-ID: <20061004201425.GD9170@mellanox.co.il> Quoting r. Matt Leininger : > > A couple of more requests as far as you are working on the infrastructure > > - updated svn server > > enables fast mirroring better web access and other goodies > > Are you referring to svn 1.4? our plan is to upgrade to 1.4. Yes, thanks. I was generally saying svn should be kept up to date in some way. > > - add bugzilla email gateway > > (as seen e.g. at kernel.org) that supports accepting Cc mail > > where you put "[Bug XXXX]" in the subject (where XXXX is the bug number) and cc > > bugme-daemon at kernel-bugs.osdl.org > > I'll add that to the list. Thanks. -- MST From mst at mellanox.co.il Wed Oct 4 13:22:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 22:22:19 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <45228C7A.5070402@ichips.intel.com> References: <1159829003.31507.110.camel@stevo-desktop> <45219DFC.1010306@ichips.intel.com> <20061003071632.GE1288@mellanox.co.il> <45228C7A.5070402@ichips.intel.com> Message-ID: <20061004202219.GE9170@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: rdma_cm branch > > Michael S. Tsirkin wrote: > > Quoting r. Sean Hefty : > > > >>1. We need to add rdma_establish() and expose the rdma_conn_param values as > >>part of the connection event. I'm working on a patch for the latter. > > > > > > I have both patches as part of OFED. > > Should I post them for review? > > > > I have a patch for rdma_establish(), but please post both. Here's the rdma_establish patch from OFED. Seems to even still apply to 2.6.19. I expect just replacing the id->device->node_type test you'll get what you want for upstream. 
I know we don't have an in-tree user yet, but it *is* necessary for passive-side completeness, so maybe a case can be still made to have it in 2.6.19? =============================== Make it possible for ULPs on the passive side to handle RTU loss by calling rdma_establish upon completion or qp event. Signed-off-by: Sean Hefty Signed-off-by: Michael S. Tsirkin --- Index: a/include/rdma/rdma_cm.h =================================================================== --- a/include/rdma/rdma_cm.h (revision 8822) +++ a/include/rdma/rdma_cm.h (working copy) @@ -256,6 +256,16 @@ int rdma_listen(struct rdma_cm_id *id, i int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** + * rdma_establish - Forces a connection state to established. + * @id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * QP before being notified that the connection has been established by the + * RDMA CM. + */ +int rdma_establish(struct rdma_cm_id *id); + +/** * rdma_reject - Called to reject a connection request or response. 
*/ int rdma_reject(struct rdma_cm_id *id, const void *private_data, Index: a/drivers/infiniband/core/cm.c =================================================================== --- a/drivers/infiniband/core/cm.c (revision 8823) +++ a/drivers/infiniband/core/cm.c (working copy) @@ -3207,6 +3207,10 @@ static int cm_init_qp_rts_attr(struct cm spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { + /* Allow transition to RTS before sending REP */ + case IB_CM_REQ_RCVD: + case IB_CM_MRA_REQ_SENT: + case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: case IB_CM_REP_SENT: Index: a/drivers/infiniband/core/cma.c =================================================================== --- a/drivers/infiniband/core/cma.c (revision 8822) +++ a/drivers/infiniband/core/cma.c (working copy) @@ -840,22 +840,6 @@ static int cma_verify_rep(struct rdma_id return 0; } -static int cma_rtu_recv(struct rdma_id_private *id_priv) -{ - int ret; - - ret = cma_modify_qp_rts(&id_priv->id); - if (ret) - goto reject; - - return 0; -reject: - cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); - return ret; -} - static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; @@ -886,9 +870,8 @@ static int cma_ib_handler(struct ib_cm_i private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; break; case IB_CM_RTU_RECEIVED: - status = cma_rtu_recv(id_priv); - event = status ? 
RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + case IB_CM_USER_ESTABLISHED: + event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: status = -ETIMEDOUT; /* fall through */ @@ -1981,11 +1964,25 @@ static int cma_accept_ib(struct rdma_id_ struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; - int ret; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; - ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) - return ret; + if (id_priv->id.qp) { + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + goto out; + + qp_attr.qp_state = IB_QPS_RTS; + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, + &qp_attr_mask); + if (ret) + goto out; + + qp_attr.max_rd_atomic = conn_param->initiator_depth; + ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); + if (ret) + goto out; + } memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; @@ -2000,7 +1997,9 @@ static int cma_accept_ib(struct rdma_id_ rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = id_priv->srq ? 1 : 0; - return ib_send_cm_rep(id_priv->cm_id.ib, &rep); + ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep); +out: + return ret; } static int cma_send_sidr_rep(struct rdma_id_private *id_priv, @@ -2058,6 +2057,27 @@ reject: } EXPORT_SYMBOL(rdma_accept); +int rdma_establish(struct rdma_cm_id *id) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + switch (id->device->node_type) { + case IB_NODE_CA: + ret = ib_cm_establish(id_priv->cm_id.ib); + break; + default: + ret = 0; + break; + } + return ret; +} +EXPORT_SYMBOL(rdma_establish); + int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len) { -- MST From mst at mellanox.co.il Wed Oct 4 13:28:02 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 22:28:02 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <000201c6e7d6$87eb4c00$8698070a@amr.corp.intel.com> References: <20061004103359.GC5883@mellanox.co.il> <000201c6e7d6$87eb4c00$8698070a@amr.corp.intel.com> Message-ID: <20061004202802.GF9170@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: rdma_cm branch > > >> The ideal fix for this is to include rdma_conn_param as part of the > >> rdma_cm_event. > > > >BTW, wouldn't it be cleaner to just pass it up in the request event? > > Yes - this is what I meant by including it in the rdma_cm_event structure. > > >> However, this breaks every userspace app that's been coded to > >> OFED / SVN. > >> An alternative is to add another call to retrieve the data, but > >> that's not a very clean alternative for new kernel submission. > > > >Another alternative is to version the create ID call. > > Hmm... I need to think about the implementation of this more, but this sounds > like a possibility. Can you provide any details on how you're envisioning this > working? Well, I have not thought this through yet, but suppose you extend struct rdma_ucm_create_id, and check the length parameter in ucma_create_id to figure out which format was used. If length is small, you know you have userspace with old ABI. An extra field could be a userspace ABI version number, which you then carry around and use to figure out how to decode the rest of the stuff that comes from userspace. -- MST From rdreier at cisco.com Wed Oct 4 13:29:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 13:29:23 -0700 Subject: [openib-general] Infiniband Fedora Core5 In-Reply-To: <45238725.5030802@dev.mellanox.co.il> (Aviram Gutman's message of "Wed, 04 Oct 2006 12:04:21 +0200") References: <45238725.5030802@dev.mellanox.co.il> Message-ID: Aviram> No, Fedora Core 5 is not part of the OFED OS matrix. However, it is the case that FC5 already includes very up-to-date kernel IB drivers.
And at least libibverbs and libmthca are available through Fedora Extras -- so on a default FC5 install you should be able to do "yum install libibverbs libmthca" and get at least that much IB support. - R. From sean.hefty at intel.com Wed Oct 4 13:30:13 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 4 Oct 2006 13:30:13 -0700 Subject: [openib-general] rdma_cm branch In-Reply-To: <20061004202219.GE9170@mellanox.co.il> Message-ID: <000601c6e7f3$e568d600$8698070a@amr.corp.intel.com> >Here's the rdma_establish patch from OFED. >Seems to even still apply to 2.6.19. I expect just replacing the >id->device->node_type test you'll get what you want for upstream. Thanks - this matches what I have queued in my local git tree against 2.6.19. - Sean From mst at mellanox.co.il Wed Oct 4 13:37:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 22:37:29 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <4523F2A7.8090501@ichips.intel.com> References: <20061003211921.GB16503@mellanox.co.il> <4522DF3B.3020205@ichips.intel.com> <20061004061754.GB4855@mellanox.co.il> <4523F2A7.8090501@ichips.intel.com> Message-ID: <20061004203729.GG9170@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > >>There's several timeout values transfered and used by the cm, most notably the > >>remote cm response timeout and packet life time. Does it make more sense to > >>have a single, generic timeout maximum instead? > > > > Hmm. I'm not sure - we are working around an actual broken implementation here - > > what do you think? > > I wasn't sure either. The MRA timeout is a combination of the packet life time > + service timeout, which made me bring this up. The patch only handles the > service timeout portion, so we end up in the same situation if a large packet > life time is ever used. But that comes from the SA, does it not? 
> >>Would it make more sense to > >>enable the maximum(s) by default, since we're dependent upon values received > >>over the network? > > > > I think it would. > > So do I. > > The CM has checks to bring out of range values into range, but at the maximum, > we get a timeout of about 2.5 hours. Multiply that by 15 retries, and the cm > can literally spend all day retrying a request. > > I was considering dropping the default maximum down to around 4-8 seconds, which > with retries still gives us about a minute to time out a request. The default > maximum would apply to local and remote cm timeouts, packet life time, and > service timeout, but could be overridden by the user. (Basically, with Ishai's > patch: rename mra_timeout_limit to timeout_limit, set to a default of 20, and > replace occurrences of '31' in the code with timeout_limit.) For remote cm timeout and service timeout this makes sense - they seem currently mostly taken out of the blue on implementations I've seen. But since the packet lifetime comes from the SM, it actually has a chance to reflect some knowledge about the network topology. And since we haven't seen any practical issues with packet life time yet - maybe a different parameter for that, with a higher limit? -- MST From mst at mellanox.co.il Wed Oct 4 13:43:53 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 22:43:53 +0200 Subject: [openib-general] FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: <20061004133857.GH5883@mellanox.co.il> References: <20061004133857.GH5883@mellanox.co.il> Message-ID: <20061004204353.GH9170@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: FW: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > Quoting r. Moshe Kazir : > > Subject: FW: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD > > > > Michael, > > > > I received the attached files from Frank.
> > They look small, easy to understand, and change almost nothing in the code.
> >
> > The patch solves the ppc64 problems.
> >
> > Please approve the patch and integrate it into OFED-1.1-rc7.
> >
> > I tested it. It's working OK on JS21 ppc64 SLES 10, JS21 ppc64 SLES 9,
> > Red Hat AS4 U3 x86_64, and Red Hat AS4 U3 i386.
> > Frank also tested it on AMD, JS21 PPC, and Mac PPC64.
> >
> >
> >
> > Best regards,
> >
> > Moshe
>
> OK, not sure what's in a tarball, but the patch looks small and safe enough to go in.
> But, we need the Signed-off-by line from the patch author, certifying to
> the Developer's Certificate of Origin 1.1:

Please note RC7 is closing tomorrow, so we need to get the signature stuff
out of the way by then if the patch is to make it into OFED 1.1.

-- MST

From rdreier at cisco.com Wed Oct 4 13:44:49 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 04 Oct 2006 13:44:49 -0700
Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
In-Reply-To: <20061003211921.GB16503@mellanox.co.il> (Ishai Rabinovitz's message of "Tue, 3 Oct 2006 23:19:21 +0200")
References: <20061003211921.GB16503@mellanox.co.il>
Message-ID:

    Ishai> There is a bug in the SRP Engenio target that sends a large
    Ishai> value as service timeout. (It gets 30, which means a timeout of
    Ishai> (2^(30-8))=4195 sec.) Such a long timeout is not
    Ishai> reasonable and it may leave the kernel module waiting on
    Ishai> wait_for_completion and may leave a lot of processes stuck.

OK, that's a problem, I guess...

    Ishai> The following patch allows loading the ib_cm module with a
    Ishai> limit on the timeout.

...but adding yet another knob that has to be set correctly can't be
the right way to fix this. Should we just chop off too-big timeout
values unconditionally? Or make Engenio fix their broken target and
tell everybody to upgrade their firmware?

 - R.
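
The timeout figures in this thread can be checked with a short sketch. It assumes the usual IB encoding, in which a 5-bit timeout field with value n stands for 4.096 usec * 2^n. The 4195 sec figure quoted above matches the common shortcut of treating 4.096 usec as 2^-8 msec. Both function names below are hypothetical and are not taken from ib_cm.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of ib_cm: converts an IB timeout
 * exponent to microseconds, assuming timeout = 4.096 usec * 2^exp.
 */
uint64_t cm_timeout_usecs(unsigned int exp)
{
	if (exp > 31)
		exp = 31;	/* the field is only 5 bits wide */

	/* 4.096 usec is 4096 nsec: shift in nsec, then convert to usec */
	return (4096ULL << exp) / 1000;
}

/*
 * Hypothetical helper: the power-of-two shortcut behind the quoted
 * figure, 2^(exp - 8) msec, with small exponents rounded up to 1 msec.
 */
uint64_t cm_timeout_approx_msecs(unsigned int exp)
{
	return 1ULL << (exp > 8 ? exp - 8 : 0);
}
```

Under these assumptions a value of 30 comes to about 4398 seconds exactly, or 4194 seconds with the shortcut, and the maximum value of 31 comes to about 8796 seconds, a little under two and a half hours.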
From rdreier at cisco.com Wed Oct 4 13:48:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 13:48:24 -0700 Subject: [openib-general] [PATCH] IB/SRP: Remove redundant memset of the target In-Reply-To: <20061003211459.GA16503@mellanox.co.il> (Ishai Rabinovitz's message of "Tue, 3 Oct 2006 23:14:59 +0200") References: <20061003211459.GA16503@mellanox.co.il> Message-ID: Thanks, queued for 2.6.19 From mshefty at ichips.intel.com Wed Oct 4 13:51:40 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 13:51:40 -0700 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004203729.GG9170@mellanox.co.il> References: <20061003211921.GB16503@mellanox.co.il> <4522DF3B.3020205@ichips.intel.com> <20061004061754.GB4855@mellanox.co.il> <4523F2A7.8090501@ichips.intel.com> <20061004203729.GG9170@mellanox.co.il> Message-ID: <45241EDC.1060801@ichips.intel.com> Michael S. Tsirkin wrote: > For remote cm timeout and service timeout this makes sense - they seem > currently mostly taken out of the blue on implementations I've seen. > > But since the packet lifetime comes from the SM, it actually has a chance > to reflect some knowledge about the network topology. > And since we haven't see any practical issues with packet life time yet - > maybe a different paremeter for that, with a higher limit? I guess the question is how much do we trust the timeout values sent in a CM MAD. (It's hard for me to imagine a network that requires a 2.5 hour packet life time. IB to space?) Having separate timeout values may make sense, but my expectation is for the remote cm timeout and service timeout values to be greater than the packet life time. If we go with separate values, then is there a reason not to have separate defaults for each one? My preference is to try to limit the number of values. 
- Sean From bugzilla-daemon at openib.org Wed Oct 4 14:02:17 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 4 Oct 2006 14:02:17 -0700 (PDT) Subject: [openib-general] [Bug 266] New: IPoIB multicast does not work with RHEL4 U4 Message-ID: <20061004210217.26E602283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=266 Summary: IPoIB multicast does not work with RHEL4 U4 Product: OpenFabrics Linux Version: 1.1rc6 Platform: All OS/Version: RHEL 4 Status: NEW Severity: major Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: sweitzen at cisco.com I'm opening a bug on this so customers can find this info more easily. Email thread on issue: On Tue, 2006-09-19 at 14:44 +0300, Eli cohen wrote: > Hi, > > while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt() > succeeds to add a multicast group to an interface but actually the > multicast group is not added to the net_device. This means that an > application cannot join a multicast group as a full member. When I > examined the differences between the kernel sources for u3 and u4 I > noticed that essential code was removed: > > diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c > --- net/ipv4/arp.c 2006-09-18 15:35:03.000000000 +0300 > +++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c 2006-09-19 > 10:08:06.000000000 +0300 > @@ -213,9 +213,6 @@ > case ARPHRD_IEEE802_TR: > ip_tr_mc_map(addr, haddr); > return 0; > - case ARPHRD_INFINIBAND: > - ip_ib_mc_map(addr, haddr); > - return 0; > default: > if (dir) { > memcpy(haddr, dev->broadcast, dev->addr_len); > > > Can anyone suggest a workaround to this issue? Short of spinning a kernel, it's going to be hard to work around. 
Thanks for finding this, I'll track down how this got left out of the U4 kernel when it was in the U3 kernel :-/ -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Wed Oct 4 14:01:15 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 23:01:15 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <45241EDC.1060801@ichips.intel.com> References: <45241EDC.1060801@ichips.intel.com> Message-ID: <20061004210115.GB9723@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > > For remote cm timeout and service timeout this makes sense - they seem > > currently mostly taken out of the blue on implementations I've seen. > > > > But since the packet lifetime comes from the SM, it actually has a chance > > to reflect some knowledge about the network topology. > > And since we haven't see any practical issues with packet life time yet - > > maybe a different paremeter for that, with a higher limit? > > I guess the question is how much do we trust the timeout values sent in a CM > MAD. (It's hard for me to imagine a network that requires a 2.5 hour packet > life time. IB to space?) Having separate timeout values may make sense, but my > expectation is for the remote cm timeout and service timeout values to be > greater than the packet life time. > > If we go with separate values, then is there a reason not to have separate > defaults for each one? My preference is to try to limit the number of values. The way I see it, we trust e.g. the SRP target anyway. So I'm not sure there's much value in range-checking everything. 
The only reason we are touching this is because we see a target reporting an obviously broken service timeout value in MRA - in the hours range. So, maybe start small and just use Ishai's patch (with default set to several seconds or so), and wait with the fix till a problem surfaces? -- MST From weiny2 at llnl.gov Wed Oct 4 13:58:59 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 4 Oct 2006 13:58:59 -0700 Subject: [openib-general] ipoib question when running on the same node as opensm Message-ID: <20061004135859.25e8c03f.weiny2@llnl.gov> We just brought another cluster up and had an issue with our management node (node running opensm) not coming up on ipoib. Here is what happened and how I got it working and I had some questions. 1) We had both opensm running and a switch based Voltaire SM running. This caused problems. 2) We stopped the Voltaire SM and restarted all the nodes. This got all of the nodes except the one with opensm running to work. 3) I had to unload all the modules, load only those needed by opensm, start opensm, and then bring up the ipoib interface. At this point the node seemed to be in the multicast group and ipoib worked fine. Does this seem like proper behavior? I would think that on boot if ipoib does not find a SM running it will delay setting up a connection until the SM comes on-line? (ie when the opensm init script gets run.) It seems like the card saves some information (from the Voltaire SM) across a soft reboot? I know that it was not coming up in the multicast group with the opensm. Is this by design? At this point ipoib seems to work fine after a reboot even though the interface is brought up before opensm. Do I need to ensure that opensm is up before all ipoib requests in the future? Thanks, Ira Weiny weiny2 at llnl.gov From mst at mellanox.co.il Wed Oct 4 14:04:03 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 23:04:03 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: References: <20061003211921.GB16503@mellanox.co.il> Message-ID: <20061004210403.GC9723@mellanox.co.il> Quoting r. Roland Dreier : > Should we just chop off too-big timeout > values onconditionally? That's the approach we are discussing with Sean. -- MST From trimmer at silverstorm.com Wed Oct 4 14:08:13 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Wed, 4 Oct 2006 17:08:13 -0400 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004203729.GG9170@mellanox.co.il> Message-ID: > From: Michael S. Tsirkin > Sent: Wednesday, October 04, 2006 4:37 PM > To: Sean Hefty > Cc: Ishai Rabinovitz; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout > > Quoting r. Sean Hefty : > > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > > > Michael S. Tsirkin wrote: > > >>There's several timeout values transfered and used by the cm, most > notably the > > >>remote cm response timeout and packet life time. Does it make more > sense to > > >>have a single, generic timeout maximum instead? > > > > > > Hmm. I'm not sure - we are working around an actual broken > implementation here - > > > what do you think? > > > > I wasn't sure either. The MRA timeout is a combination of the packet > life time > > + service timeout, which made me bring this up. The patch only handles > the > > service timeout portion, so we end up in the same situation if a large > packet > > life time is ever used. > > But that comes from the SA, does it not? > > > >>Would it make more sense to > > >>enable the maximum(s) by default, since we're dependent upon values > received > > >>over the network? > > > > > > I think it would. > > > > So do I. > > > > The CM has checks to bring out of range values into range, but at the > maximum, > > we get a timeout of about 2.5 hours. 
Multiple that by 15 retries, and > the cm > > can literally spend all day retrying a request. > > > > I was considering dropping the default maximum down to around 4-8 > seconds, which > > with retries still gives us about a minute to timeout a request. The > default > > maximum would apply to local and remote cm timeouts, packet life time, > and > > service timeout, but could be overridden by the user. (Basically, with > Ishai's > > patch: rename mra_timeout_limit to timeout_limit, set to a default of > 20, and > > replace occurrences of '31' in the code with timeout_limit.) > > For remote cm timeout and service timeout this makes sense - they seem > currently mostly taken out of the blue on implementations I've seen. > > But since the packet lifetime comes from the SM, it actually has a chance > to reflect some knowledge about the network topology. > And since we haven't see any practical issues with packet life time yet - > maybe a different paremeter for that, with a higher limit? > > -- I recommend sticking with the IB spec for the various timeouts. In our products we carefully implemented the timeouts and computations as defined by the spec. The SM controls the pkt lifetime and should base it on a knowledge of the fabric topology and configuration. Many of the CA specific base timers are specific to the HCA/TCA itself (hence we provided this information as part of queries to the CA verbs driver). We permitted configuration in the individual verbs drivers to override the "reasonable estimates" which we provided as defaults for each HCA model we support. It's a little tricky to work out the details defined in the spec (a summary section on timers would have made it easier), however I did that effort a few years ago and here is a summary of all the HCA/TCA related IB timers below. 
Notice many of these must be "uncomputed" from information in the CM REQ and
REP to get the base level values (such as pkt lifetime, which is not directly
specified in CM REQ):

3.1 Base Timers

CA Ack Delay - time from receipt of IB transport packet to sending of ACK.
    Hardware and VlArb dependent.
CA inbound processing time - time from receipt of IB transport packet to
    delivery and processing in CA's transport state machine.
    Hardware dependent.
CA outbound processing time - time from entry of packet to QP until
    transmit packet on wire. Hardware and VlArb dependent.
Class turnaround time(class) - processing time from delivery of request on
    QP to posting of response on QP

3.2 Derived Timers

Ack Timeout - timeout for QP ACK/NAK before QP resends up to RetryCount
    = 2*(PktLifeTime) + Remote CA Ack Delay + local CA inbound processing time
RNR NAK Delay - Appl protocol must be prepared to replenish Recv Q of QP
    within RNR NAK Delay + 2*(PktLifeTime), can set this to low bound and
    RNRNakDelay*RNRRetryLimit must be > upper bound
PortInfo:SubnetTimeout = max(PktLifeTime for all PathRecords within subnet)
PortInfo:RespTimeout - SMA max time between receipt to response within
    Node, includes CA delays in receive and send.
    = ClassTurnaroundTime(SMA) + CA inbound (QP0) + CA outbound (QP0)
ClassPortInfo:RespTimeout - GSA class max time between receipt to response
    within Node, includes CA delays in receive and send.
    = ClassTurnaroundTime(class) + CA inbound (QP1) + CA outbound (QP1)
PathRecord:PacketLifeTime - reasonable estimate of worst case time through
    path for packet to traverse fabric in 1 direction.
    0 if loopback path from port to itself (CA inbound/outbound and/or
    ACK delay values should cover)
LocalAckTimeout - QP/CM - 2*PathRecord:PktLifeTime + local CA Ack Delay
QP:AckTimeout - use 2*PathRecord:PktLifeTime + remote CA Ack Delay
Remote CM Resp Timeout - CM - CM server REQ response time (should be based
    on Get(ClassPortInfo) for CM against remote CM)
Local CM Resp Timeout - 2*PathRecord:PktLifetime + client REP response time
CM MRA Service Timeout - anticipated maximum time before sender of the MRA
    will send the actual CM response message (REP, RTU, APR or REJ).
    Recipient of MRA should wait Service Timeout + packet lifetime before
    timing out. Note this value is subjective in nature and may depend on
    load on the server, performance of the application, etc.

In our stack we heuristically computed a pseudo average (weighted toward
longer timeouts) with configurable min/max. We also permitted the
application to adjust the min/max for a given CEP. It was important that
the MRA sending be issued at a low level since if the application is too
busy to respond to the REQ, REP, etc., it's probably also too busy to
compose an MRA. We found that proper implementation of MRA was critical
for high stress CM situations, such as startup of a large MPI run or
Oracle's uDAPL based stress tests, which made thousands of simultaneous
connections.

Subnet Timeout = max(Path Record Packet Lifetime)

Todd Rimmer

From mst at mellanox.co.il Wed Oct 4 14:08:19 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 4 Oct 2006 23:08:19 +0200
Subject: [openib-general] ipoib question when running on the same node as opensm
In-Reply-To: <20061004135859.25e8c03f.weiny2@llnl.gov>
References: <20061004135859.25e8c03f.weiny2@llnl.gov>
Message-ID: <20061004210819.GD9723@mellanox.co.il>

Quoting r. Ira Weiny :
> Do I need to ensure that opensm is up before all
> ipoib requests in the future?

Shouldn't be required, things work well for me, anyway.
-- MST From mst at mellanox.co.il Wed Oct 4 14:17:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 23:17:57 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: References: Message-ID: <20061004211757.GF9723@mellanox.co.il> Quoting r. Rimmer, Todd : > I recommend sticking with the IB spec for the various timeouts. So what do you suggest, wait a day or so to timeout the MRA? -- MST From mshefty at ichips.intel.com Wed Oct 4 14:34:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 14:34:33 -0700 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004210115.GB9723@mellanox.co.il> References: <45241EDC.1060801@ichips.intel.com> <20061004210115.GB9723@mellanox.co.il> Message-ID: <452428E9.3000306@ichips.intel.com> Michael S. Tsirkin wrote: > The way I see it, we trust e.g. the SRP target anyway. > So I'm not sure there's much value in range-checking everything. > The only reason we are touching this is because we see a > target reporting an obviously broken service timeout value in MRA - > in the hours range. The CM is also exposed into userspace, so I think this issue is highlighting a larger potential problem. I'm a little hesitant to add precedence that we want to reduce large timeout values by adding separate timer constraints exposed through module parameters. - Sean From rdreier at cisco.com Wed Oct 4 14:45:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 14:45:02 -0700 Subject: [openib-general] [PATCH fixed] IB/srp: enable multiple connections to the same target In-Reply-To: <20061004132856.GA7224@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 4 Oct 2006 15:28:56 +0200") References: <20061004121852.GE32010@mellanox.co.il> <20061004132856.GA7224@mellanox.co.il> Message-ID: Thanks, queued for 2.6.19 From rdreier at cisco.com Wed Oct 4 14:50:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 14:50:20 -0700 Subject: [openib-general] [PATCH repost] IB/mthca: query port fix In-Reply-To: <20061004115634.GF5883@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 4 Oct 2006 13:56:34 +0200") References: <20061004115634.GF5883@mellanox.co.il> Message-ID: thanks, queued for 2.6.19 BTW when forwarding patches (I assume this one was from Jack) please include an extra "From:" line so I get the write author when I import it back into git... From rdreier at cisco.com Wed Oct 4 14:54:25 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 14:54:25 -0700 Subject: [openib-general] [PATCH 2/2] 2.6.19 ib_cm: send DREP in response to unmatched DREQ In-Reply-To: <000501c6e7e4$236d1930$8698070a@amr.corp.intel.com> (Sean Hefty's message of "Wed, 4 Oct 2006 11:37:25 -0700") References: <000501c6e7e4$236d1930$8698070a@amr.corp.intel.com> Message-ID: Thanks, queued 1 & 2 for 2.6.19 From mst at mellanox.co.il Wed Oct 4 14:55:05 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 4 Oct 2006 23:55:05 +0200 Subject: [openib-general] [PATCH repost] IB/mthca: query port fix In-Reply-To: References: Message-ID: <20061004215505.GH9723@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH repost] IB/mthca: query port fix > > thanks, queued for 2.6.19 > > BTW when forwarding patches (I assume this one was from Jack) please > include an extra "From:" line so I get the write author when I import > it back into git... > Right, missed that, sorry. This one was from Jack, pls fix it up accordingly. -- MST From mst at mellanox.co.il Wed Oct 4 14:56:57 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 4 Oct 2006 23:56:57 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <452428E9.3000306@ichips.intel.com> References: <452428E9.3000306@ichips.intel.com> Message-ID: <20061004215657.GI9723@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > > The way I see it, we trust e.g. the SRP target anyway. > > So I'm not sure there's much value in range-checking everything. > > The only reason we are touching this is because we see a > > target reporting an obviously broken service timeout value in MRA - > > in the hours range. > > The CM is also exposed into userspace, so I think this issue is highlighting a > larger potential problem. I'm a little hesitant to add precedence that we want > to reduce large timeout values by adding separate timer constraints exposed > through module parameters. So, let's just have a #define for now? And maybe print a warning so we can figure out what's wrong ... -- MST From rdreier at cisco.com Wed Oct 4 15:02:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 15:02:37 -0700 Subject: [openib-general] [PATCH repost] IB/mthca: query port fix In-Reply-To: <20061004215505.GH9723@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 4 Oct 2006 23:55:05 +0200") References: <20061004215505.GH9723@mellanox.co.il> Message-ID: Michael> Right, missed that, sorry. This one was from Jack, pls Michael> fix it up accordingly. No problem -- I already fixed it up. In the continuum of messed up patches, missing "From:" lines are at the good end of things, since I can just do "stg refresh -a 'foo'" to fix it up. It's nowhere near as bad as something like trailing whitespace (gasp)... - R. 
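
On the IB_CM timeout thread, the "#define plus a warning" idea could look roughly like the sketch below. It is a userspace illustration only, not the patch that was eventually written. The cap of 20 follows the earlier suggestion in this thread of a default of 20 (about 4 seconds per attempt); the macro and function names are hypothetical, and kernel code would log with printk rather than fprintf.

```c
#include <stdio.h>

/* Assumed cap, following the "default of 20" suggestion in the thread. */
#define CM_MAX_TIMEOUT_EXP 20

/*
 * Hypothetical helper: limits a timeout exponent received from the
 * remote side and logs a warning whenever it had to be reduced.
 */
unsigned int cm_clamp_timeout_exp(unsigned int exp)
{
	if (exp > CM_MAX_TIMEOUT_EXP) {
		fprintf(stderr,
			"cm: timeout exponent %u exceeds limit %u, clamping\n",
			exp, (unsigned int)CM_MAX_TIMEOUT_EXP);
		return CM_MAX_TIMEOUT_EXP;
	}
	return exp;
}
```

With this in place, the value 30 reported by the broken target and the value 31 mentioned elsewhere in the thread would both be treated as 20, while in-range values pass through unchanged.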
From mshefty at ichips.intel.com Wed Oct 4 15:18:06 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 15:18:06 -0700 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004215657.GI9723@mellanox.co.il> References: <452428E9.3000306@ichips.intel.com> <20061004215657.GI9723@mellanox.co.il> Message-ID: <4524331E.6030109@ichips.intel.com> Michael S. Tsirkin wrote: > So, let's just have a #define for now? And maybe print a warning so we can > figure out what's wrong ... That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 minutes with retries?) Having the maximum apply at least to remote CM timeout + service timeout would be good. (It appears that Intel MPI just hit into this issue after setting the remote CM timeout to 31.) - Sean From mst at mellanox.co.il Wed Oct 4 15:28:15 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Oct 2006 00:28:15 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <4524331E.6030109@ichips.intel.com> References: <4524331E.6030109@ichips.intel.com> Message-ID: <20061004222815.GL9723@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > > So, let's just have a #define for now? And maybe print a warning so we can > > figure out what's wrong ... > > That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 minutes > with retries?) Having the maximum apply at least to remote CM timeout + service > timeout would be good. (It appears that Intel MPI just hit into this issue > after setting the remote CM timeout to 31.) OK. Patch? 
-- MST From mshefty at ichips.intel.com Wed Oct 4 15:38:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 04 Oct 2006 15:38:30 -0700 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004222815.GL9723@mellanox.co.il> References: <4524331E.6030109@ichips.intel.com> <20061004222815.GL9723@mellanox.co.il> Message-ID: <452437E6.8010702@ichips.intel.com> Michael S. Tsirkin wrote: >>That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 minutes >>with retries?) Having the maximum apply at least to remote CM timeout + service >>timeout would be good. (It appears that Intel MPI just hit into this issue >>after setting the remote CM timeout to 31.) > > > OK. Patch? From me or you? I can probably throw something together tomorrow. From mst at mellanox.co.il Wed Oct 4 15:45:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Oct 2006 00:45:33 +0200 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <452437E6.8010702@ichips.intel.com> References: <452437E6.8010702@ichips.intel.com> Message-ID: <20061004224533.GN9723@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] IB_CM: Limit the MRA timeout > > Michael S. Tsirkin wrote: > >>That sounds simple enough for now. (maybe set to 21 = 8 seconds = 2 minutes > >>with retries?) Having the maximum apply at least to remote CM timeout + service > >>timeout would be good. (It appears that Intel MPI just hit into this issue > >>after setting the remote CM timeout to 31.) > > > > > > OK. Patch? > > From me or you? I can probably throw something together tomorrow. > Pls go ahead then. -- MST From sweitzen at cisco.com Wed Oct 4 15:59:33 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 4 Oct 2006 15:59:33 -0700 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 Message-ID: I see it for all MVAPICH tests, it's 100% consistent. 
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > Sent: Tuesday, October 03, 2006 3:37 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Aviram Gutman; OpenFabricsEWG; openib > Subject: Re: [openfabrics-ewg] problems running MVAPICH on > OFED 1.1 rc6 with SLES10 x86_64 > > Hi Scott, > Unfortunately was not able to reproduce the failure on our platforms. > Do you see the problem with all tests or with the specific only ? > Is it consistent problem ? > > Regards, > Pasha > > Scott Weitzenkamp (sweitzen) wrote: > > $ uname -a > > Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 > 18:25:39 UTC 2006 > > x86_64 > > x86_64 x86_64 GNU/Linux > > $ > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > > 192.168.2.46 192.168.2.49 hostname > > svbu-qa1850-4 > > svbu-qa1850-3 > > $ > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > > 192.168.2.46 192.168.2.49 > > > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_bench marks-2.2/ > > osu_latency > > > > The last command just hangs. Can I try your binary RPMs? > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > >> -----Original Message----- > >> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] > >> Sent: Sunday, October 01, 2006 2:29 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il > >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >> OFED 1.1 rc6 with SLES10 x86_64 > >> > >> Can you please elaborate on MVAPICH issues, can you send > >> command line? > >> We ran it here on 32 Opteron nodes each quad core and also > rigorous > >> tests on the many other nodes? > >> > >> > >> > >> Scott Weitzenkamp (sweitzen) wrote: > >>> We are just getting started with OFED testing on SLES10, first > >>> platform is x86_64. 
> >>> > >>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are > >> working so far. > >>> MVAPICH with OSU benchmarks just hang. This same > hardware works > >>> fine with OFED and RHEL4 U3. > >>> > >>> Has anyone else seen this? > >>> > >>> Scott Weitzenkamp > >>> SQA and Release Manager > >>> Server Virtualization Business Unit > >>> Cisco Systems > >>> > >>> > >> -------------------------------------------------------------- > >> ---------- > >>> _______________________________________________ > >>> openfabrics-ewg mailing list > >>> openfabrics-ewg at openib.org > >>> http://openib.org/mailman/listinfo/openfabrics-ewg > >>> > > > > > -- > Pavel Shamis (Pasha) > Software Engineer > Mellanox Technologies LTD. > pasha at mellanox.co.il > From rdreier at cisco.com Wed Oct 4 16:52:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 16:52:24 -0700 Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only In-Reply-To: <452354D8.6040903@pathscale.com> (Robert Walsh's message of "Tue, 03 Oct 2006 23:29:44 -0700") References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <452354D8.6040903@pathscale.com> Message-ID: I tried loading ib_ipath on one of my systems without an ipath device, and I got the message ib_ipath: Could not create class_dev for minor 127, ipath_diagpkt (err 19) but I couldn't reproduce the hang of modprobe. Anyway, taking a quick look at what caused that message showed that the problem is that ipath_class is NULL there. And that makes sense, because ipath_class isn't created until ipath_user_add() is called in ipath_init_one() (which is never called if there are no devices). So I think a correct fix is to move any global initialization like creating ipath_class into the module_init function before probing any devices. 
I don't approve of this latest patch because you do this: > -int __init ipath_diagpkt_add(void) > -{ > - return ipath_cdev_init(IPATH_DIAGPKT_MINOR, > - "ipath_diagpkt", &diagpkt_file_ops, > - &diagpkt_cdev, &diagpkt_class_dev); > -} > +void ipath_diagpkt_add(void) > +{ > + if (atomic_inc_return(&diagpkt_count) == 1) > + ipath_cdev_init(IPATH_DIAGPKT_MINOR, > + "ipath_diagpkt", &diagpkt_file_ops, > + &diagpkt_cdev, &diagpkt_class_dev); > +} which means that you've stopped checking the return value of ipath_cdev_init(). What will happen if it fails? I would guess everything blows up, right? - R. From rjwalsh at pathscale.com Wed Oct 4 19:07:31 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 04 Oct 2006 19:07:31 -0700 Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only In-Reply-To: References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <452354D8.6040903@pathscale.com> Message-ID: <452468E3.1000902@pathscale.com> Roland Dreier wrote: > I tried loading ib_ipath on one of my systems without an ipath device, > and I got the message > > ib_ipath: Could not create class_dev for minor 127, ipath_diagpkt (err 19) > > but I couldn't reproduce the hang of modprobe. This was without the patch, though, right? Cause if you've applied the patch and you're still getting this message, I'm confused. > Anyway, taking a quick look at what caused that message showed that > the problem is that ipath_class is NULL there. And that makes sense, > because ipath_class isn't created until ipath_user_add() is called in > ipath_init_one() (which is never called if there are no devices). Ah. Well spotted. > So I think a correct fix is to move any global initialization like > creating ipath_class into the module_init function before probing any > devices. 
> I don't approve of this latest patch because you do this:
>
> > -int __init ipath_diagpkt_add(void)
> > -{
> > -	return ipath_cdev_init(IPATH_DIAGPKT_MINOR,
> > -			       "ipath_diagpkt", &diagpkt_file_ops,
> > -			       &diagpkt_cdev, &diagpkt_class_dev);
> > -}
>
> > +void ipath_diagpkt_add(void)
> > +{
> > +	if (atomic_inc_return(&diagpkt_count) == 1)
> > +		ipath_cdev_init(IPATH_DIAGPKT_MINOR,
> > +				"ipath_diagpkt", &diagpkt_file_ops,
> > +				&diagpkt_cdev, &diagpkt_class_dev);
> > +}
>
> which means that you've stopped checking the return value of
> ipath_cdev_init(). What will happen if it fails? I would guess
> everything blows up, right?

Sigh. Yeah. Code go boom. I'll roll it again. We had been ignoring
the failure anyway before this patch. I'll just make sure we do a
dev_warn and atomic_dec if ipath_cdev_init fails.

Michael: do we have much longer left for RC7? As I said before, it's no
skin off my nose if you want to roll RC7 with my hacky workaround patch in
place, or even without that. But if you're still working on other stuff,
I can probably get a "proper" fix in tonight.

Regards,
 Robert.

From rjwalsh at pathscale.com Wed Oct 4 19:16:25 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 04 Oct 2006 19:16:25 -0700
Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only
In-Reply-To: <452468E3.1000902@pathscale.com>
References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <452354D8.6040903@pathscale.com> <452468E3.1000902@pathscale.com>
Message-ID: <45246AF9.1070503@pathscale.com>

> Sigh. Yeah. Code go boom. I'll roll it again. We had been ignoring
> the failure anyway before this patch. I'll just make sure we do a
> dev_warn and atomic_dec if ipath_cdev_init fails.

Scrub that - I'm going to roll all this into ipath_diag_add and make it all a bit simpler.
I'll send out a new patch tomorrow afternoon, after some testing. Regards, Robert. From rdreier at cisco.com Wed Oct 4 22:12:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 04 Oct 2006 22:12:59 -0700 Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only In-Reply-To: <452468E3.1000902@pathscale.com> (Robert Walsh's message of "Wed, 04 Oct 2006 19:07:31 -0700") References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <452354D8.6040903@pathscale.com> <452468E3.1000902@pathscale.com> Message-ID: Robert> This was without the patch, though, right? Cause if Robert> you've applied the patch and you're still getting this Robert> message, I'm confused. Yes, I was trying to debug the root cause of the problem, so I was just running the mainline kernel. - R. From kliteyn at dev.mellanox.co.il Wed Oct 4 23:46:34 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 05 Oct 2006 08:46:34 +0200 Subject: [openib-general] [PATCH] osm: fixing some comments Message-ID: Hi Hal Just a couple of wrong comments that I found while reading the code. Yevgeny Signed-off-by: Yevgeny Kliteynik Index: include/opensm/osm_port.h =================================================================== --- include/opensm/osm_port.h (revision 9722) +++ include/opensm/osm_port.h (working copy) @@ -973,8 +973,7 @@ osm_physp_unlink( * [in] Pointer to the adjacent osm_physp_t object to link. * * RETURN VALUES -* Returns a pointer to the Physical Port on the other side of -* the wire. A return value of NULL means there is no link at this port. +* None. 
* * NOTES * Index: include/opensm/osm_port_info_rcv.h =================================================================== --- include/opensm/osm_port_info_rcv.h (revision 9722) +++ include/opensm/osm_port_info_rcv.h (working copy) @@ -261,7 +261,7 @@ void osm_pi_rcv_process( * that contains the node's PortInfo attribute. * * RETURN VALUES -* CL_SUCCESS if the PortInfo processing was successful. +* None. * * NOTES * This function processes a PortInfo attribute. From mst at mellanox.co.il Wed Oct 4 23:58:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Oct 2006 08:58:50 +0200 Subject: [openib-general] rdma_cm branch In-Reply-To: <000601c6e7f3$e568d600$8698070a@amr.corp.intel.com> References: <20061004202219.GE9170@mellanox.co.il> <000601c6e7f3$e568d600$8698070a@amr.corp.intel.com> Message-ID: <20061005065850.GA6711@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: rdma_cm branch > > >Here's the rdma_establish patch from OFED. > >Seems to even still apply to 2.6.19. I expect just replacing the > >id->device->node_type test you'll get what you want for upstream. > > Thanks - this matches what I have queued in my local git tree against 2.6.19. BTW, you don't support pushing this into 2.6.19 by any chance? -- MST From moshek at voltaire.com Wed Oct 4 23:59:44 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 5 Oct 2006 08:59:44 +0200 Subject: [openib-general] FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: Michael, In case you missed Frank's signature, Look at the attached message Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. 
Tsirkin [mailto:mst at mellanox.co.il] Sent: Wednesday, October 04, 2006 10:44 PM To: Moshe Kazir Cc: Tseng-hui Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: [openib-general] FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Michael S. Tsirkin : > Subject: Re: FW: [openib-general] FW: Mstflint - not working on ppc64 > andwhendriver is not loaded on AMD > > Quoting r. Moshe Kazir : > > Subject: FW: [openib-general] FW: Mstflint - not working on ppc64 > > andwhendriver is not loaded on AMD > > > > Michael, > > > > I received the attached files from Frank. they look small , easy to > > understand, and change almost nothing in the code. > > > > The patch solves the ppc64 problems. > > > > Please approve the patch and integrate it into OFED-1.1-rc7. > > > > I tested it . it's working o.k. on on JS21 ppc64 sles 10, JS21 > > ppc64 sles9, redhat as4 u3 x86_64, redhat as4 u3 i386. > > Frank also tested it on AMD and JS21 PPC and MAC PPC64 . > > > > > > > > Best regards, > > > > Moshe > > OK, not sure what's in a tarball, but the patch looks small and safe > enough to go in. But, we need the Signed-off-by like from the patch > author, certifying to the Developer's Certificate of Origin 1.1: Please note RC7 is closing tomorrow, so we need to get the signature stuff out of the way by then if the patch's to make it in OFED 1.1. -- MST -------------- next part -------------- An embedded message was scrubbed... From: "Tseng-Hui (Frank) Lin" Subject: [openib-general] [PATCH] mstflint not working on ppc64 and when driver is not loaded on AMD Date: Wed, 4 Oct 2006 18:26:41 +0200 Size: 9750 URL: From mst at mellanox.co.il Thu Oct 5 00:06:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Oct 2006 09:06:04 +0200 Subject: [openib-general] FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD In-Reply-To: References: Message-ID: <20061005070604.GB6711@mellanox.co.il> Quoting r. 
Moshe Kazir :
> Subject: Re: FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD
>
> Michael,
>
> In case you missed Frank's signature,
>
> Look at the attached message

Got that, no problem, I think it's fine for RC7.

So there's a workaround.

Could we go deeper into the driver loaded/unloaded issue though? It looks like another kernel bug and it'd be nice to fix it. Do you know the root cause? If not, could you please describe the symptoms and on what systems they occur?

--
MST

From eitan at mellanox.co.il Thu Oct 5 00:20:10 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 05 Oct 2006 09:20:10 +0200
Subject: [openib-general] [PATCH] osm: port to WinIB stack
In-Reply-To: <1159958444.4502.35784.camel@hal.voltaire.com>
References: <1159912150.4502.10146.camel@hal.voltaire.com> <1159958444.4502.35784.camel@hal.voltaire.com>
Message-ID: <4524B22A.5040403@mellanox.co.il>

Hal Rosenstock wrote:

>Hi Eitan,
>
>On Tue, 2006-10-03 at 17:49, Hal Rosenstock wrote:
>
>
>>Hi Eitan,
>>
>>Aside from the varargs handling (relative to 2 patches) and the
>>osmtest.c question,
>>
>>
>
>osmtest.c patch has been applied.
>
>The question is more general as to why the casts were needed for
>Windows.
>
>
>
Ohh sorry, missed that.
The answer is that the DDK used for compiling opensm on WinIB is
particularly aggressive on casting issues like this and fails the
compilation.

>-- Hal
>
>
>
>> also pending is a patch to remove the WIN defines
>>just added in multiple places and move them to config.h for the Windows
>>build ? Can you/when can you prepare a patch for this ?
>>
>>Thanks!
>> >>-- Hal >> >> >>_______________________________________________ >>openib-general mailing list >>openib-general at openib.org >>http://openib.org/mailman/listinfo/openib-general >> >>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> >> >> > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Thu Oct 5 01:22:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Oct 2006 10:22:06 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack In-Reply-To: <4524B22A.5040403@mellanox.co.il> References: <1159912150.4502.10146.camel@hal.voltaire.com> <1159958444.4502.35784.camel@hal.voltaire.com> <4524B22A.5040403@mellanox.co.il> Message-ID: <20061005082206.GB7419@mellanox.co.il> Quoting r. Eitan Zahavi : > The answer is that the DDK used for compiling opensm on WinIB is > particularly aggressive on casting issues like this and fails the > compilation. AFAIK, DDK CL.EXE has a flag to give agressive warnings on potential "64 bit portability issues". http://msdn2.microsoft.com/en-us/library/yt4xw8fh.aspx /Wp64 - Detects 64-bit portability problems /Wp64 is off by default in the Visual C++ 32-bit compiler and on by default in the Visual C++ 64-bit compiler. If you regularly compile your application with a 64-bit compiler, you may want to disable /Wp64 in your 32-bit compilations, as the 64-bit compiler will detect all issues. CL.EXE seems to classify any conversion between types of different size as a potential "64 bit portability issue". I think that you also compiling with a flag which turns these warnings into errors: http://msdn2.microsoft.com/en-us/library/thxezb7y.aspx /WX Treats all compiler warnings as errors. 
For a new project, it may be best to use /WX in all compilations This flag is off by default. It might be easier for you, in the future, to just turn it off than waste time fixing the warnings. -- MST From moshek at voltaire.com Thu Oct 5 01:25:53 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 5 Oct 2006 10:25:53 +0200 Subject: [openib-general] FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Message-ID: > So there's a work around. > > Could we go deeper into the driver loaded/unloaded issue though? It looks like another kernel bug and it'd be nice to fix it. Do you know the root cause? If not, > cold you pls describe the symptoms and on what systems they occur? I'll try to understand the "mstflint not working when driver is not loaded" problem reported by Or and see how to go on. Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: Thursday, October 05, 2006 9:06 AM To: Moshe Kazir Cc: Tseng-hui Lin; openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: FW: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD Quoting r. Moshe Kazir : > Subject: Re: FW: FW: Mstflint - not working on ppc64 andwhendriver is > not loaded on AMD > > Michael, > > In case you missed Frank's signature, > > Look at the attached message Got that, no problem, I think it's fine for RC7. So there's a work around. Could we go deeper into the driver loaded/unloaded issue though? It looks like another kernel bug and it'd be nice to fix it. Do you know the root cause? If not, cold you pls describe the symptoms and on what systems they occur? 
-- MST From moshek at voltaire.com Thu Oct 5 01:32:52 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 5 Oct 2006 10:32:52 +0200 Subject: [openib-general] mpitests-2.0-0.src.rpm compile error on ppc64 sles10 js21 Message-ID: Any one saw this error ? Moshe /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpicc -I/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/include -DMPI1 -O -g -c IMB_cpu_exploit.c /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpicc -o IMB-MPI1 IMB.o IMB_declare.o IMB_init.o IMB_mem_manager.o IMB_parse_name_mpi1.o IMB_benchlist.o IMB_strgs.o IMB_err_handler.o IMB_g_info.o IMB_warm_up.o IMB_output.o IMB_pingpong.o IMB_pingping.o IMB_allreduce.o IMB_reduce_scatter.o IMB_reduce.o IMB_exchange.o IMB_bcast.o IMB_barrier.o IMB_allgather.o IMB_allgatherv.o IMB_alltoall.o IMB_sendrecv.o IMB_init_transfer.o IMB_chk_diff.o IMB_cpu_exploit.o -L/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/lib/shared -L/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/lib -L/var/tmp/OFED//usr/local/lib64 -L/var/tmp/OFED//usr/local/lib /usr/bin/ld: skipping incompatible /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/libmpi.so when searching for -lmpi /usr/bin/ld: cannot find -lmpi collect2: ld returned 1 exit status make[2]: *** [MPI1] Error 1 make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/mpitests-2.0/IMB-2.3/src' make[1]: *** [IMB-MPI1] Error 2 make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/mpitests-2.0/IMB-2.3/src' make: *** [pmb] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.81774 (%install) RPM build errors: user pasha does not exist - using root user pasha does not exist - using root Bad exit status from /var/tmp/rpm-tmp.81774 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_name mpitests_openmpi_gcc' --define 'path_to_mpihome /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1' --define 'root_path /var/tmp/OFED' /tmp/GridStack-4.1.1_OFED_1.1_rc6_js21/OFED-1.1-rc6/SRPMS/mpitests-2.0-0 .src.rpm" 
____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at openib.org Thu Oct 5 02:24:42 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 5 Oct 2006 02:24:42 -0700 (PDT) Subject: [openib-general] [Bug 266] IPoIB multicast does not work with RHEL4 U4 Message-ID: <20061005092442.6C33E2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=266 ------- Comment #1 from ogerlitz at voltaire.com 2006-10-05 02:24 ------- > default: > if (dir) { > memcpy(haddr, dev->broadcast, dev->addr_len); > Looking on the code, what going on here is that as the ARPHRD_INFINIBAND case does not exist, the default case is executed which sets the link layer multicast address to be the device broadcast address. So the IP Multicast traffic is actually running over the broadcast group. Or. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From vlad at dev.mellanox.co.il Thu Oct 5 02:41:33 2006 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 05 Oct 2006 11:41:33 +0200 Subject: [openib-general] [openfabrics-ewg] mpitests-2.0-0.src.rpm compile error on ppc64 sles10 js21 In-Reply-To: References: Message-ID: <1160041293.20258.14.camel@mtlsws13.yok.mtl.com> Hi Moshe, I got the same error on SLES10 PPC64. It was already fixed in install.sh script. Please take the updated version of install.sh from https://openib.org/svn/gen2/branches/1.1/ofed/scripts/ Regards, Vladimir On Thu, 2006-10-05 at 10:32 +0200, Moshe Kazir wrote: > Any one saw this error ? 
> > > > Moshe > > > > > > /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpicc > -I/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/include -DMPI1 -O -g -c > IMB_cpu_exploit.c > /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/bin/mpicc -o IMB-MPI1 IMB.o > IMB_declare.o IMB_init.o IMB_mem_manager.o IMB_parse_name_mpi1.o > IMB_benchlist.o IMB_strgs.o IMB_err_handler.o IMB_g_info.o > IMB_warm_up.o IMB_output.o IMB_pingpong.o IMB_pingping.o > IMB_allreduce.o IMB_reduce_scatter.o IMB_reduce.o IMB_exchange.o > IMB_bcast.o IMB_barrier.o IMB_allgather.o IMB_allgatherv.o > IMB_alltoall.o IMB_sendrecv.o IMB_init_transfer.o IMB_chk_diff.o > IMB_cpu_exploit.o > -L/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/lib/shared > -L/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/lib > -L/var/tmp/OFED//usr/local/lib64 -L/var/tmp/OFED//usr/local/lib > /usr/bin/ld: skipping > incompatible /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/lib64/libmpi.so > when searching for -lmpi > /usr/bin/ld: cannot find -lmpi > collect2: ld returned 1 exit status > make[2]: *** [MPI1] Error 1 > make[2]: Leaving directory > `/var/tmp/OFEDRPM/BUILD/mpitests-2.0/IMB-2.3/src' > make[1]: *** [IMB-MPI1] Error 2 > make[1]: Leaving directory > `/var/tmp/OFEDRPM/BUILD/mpitests-2.0/IMB-2.3/src' > make: *** [pmb] Error 2 > error: Bad exit status from /var/tmp/rpm-tmp.81774 (%install) > > > RPM build errors: > user pasha does not exist - using root > user pasha does not exist - using root > Bad exit status from /var/tmp/rpm-tmp.81774 (%install) > ERROR: Failed executing "rpmbuild --rebuild --define > '_topdir /var/tmp/OFEDRPM' --define '_name mpitests_openmpi_gcc' > --define 'path_to_mpihome /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1' > --define > 'root_path /var/tmp/OFED' /tmp/GridStack-4.1.1_OFED_1.1_rc6_js21/OFED-1.1-rc6/SRPMS/mpitests-2.0-0.src.rpm" > > > > > ____________________________________________________________ > > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > > > Voltaire – The Grid Backbone > > > > www.voltaire.com > > > > > > > 
_______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at openib.org Thu Oct 5 03:18:03 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 5 Oct 2006 03:18:03 -0700 (PDT) Subject: [openib-general] [Bug 266] IPoIB multicast does not work with RHEL4 U4 Message-ID: <20061005101803.99A152283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=266 ------- Comment #2 from eli at mellanox.co.il 2006-10-05 03:18 ------- Created an attachment (id=51) --> (http://openib.org/bugzilla/attachment.cgi?id=51&action=view) fix mcat problem on rh u4 ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From halr at voltaire.com Thu Oct 5 03:48:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Oct 2006 06:48:03 -0400 Subject: [openib-general] [PATCH] osm: fixing some comments In-Reply-To: References: Message-ID: <1160045281.4502.84200.camel@hal.voltaire.com> On Thu, 2006-10-05 at 02:46, Yevgeny Kliteynik wrote: > Hi Hal > > Just a couple of wrong comments that I found while reading the code. > > Yevgeny Thanks. Applied. -- Hal From eitan at mellanox.co.il Thu Oct 5 05:12:30 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 05 Oct 2006 14:12:30 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack In-Reply-To: <20061005082206.GB7419@mellanox.co.il> References: <1159912150.4502.10146.camel@hal.voltaire.com> <1159958444.4502.35784.camel@hal.voltaire.com> <4524B22A.5040403@mellanox.co.il> <20061005082206.GB7419@mellanox.co.il> Message-ID: <4524F6AE.5060905@mellanox.co.il> Michael S. Tsirkin wrote: >Quoting r. 
Eitan Zahavi : > > >>The answer is that the DDK used for compiling opensm on WinIB is >>particularly aggressive on casting issues like this and fails the >>compilation. >> >> > >AFAIK, DDK CL.EXE has a flag to give agressive warnings on potential "64 >bit portability issues". > >http://msdn2.microsoft.com/en-us/library/yt4xw8fh.aspx > > /Wp64 - Detects 64-bit portability problems > /Wp64 is off by default in the Visual C++ 32-bit compiler and on by > default in the Visual C++ 64-bit compiler. > If you regularly compile your application with a 64-bit compiler, you > may want to disable /Wp64 in your 32-bit compilations, as the 64-bit > compiler will detect all issues. > > This makes sense. As we do compile for both platforms we can turn it off. >CL.EXE seems to classify any conversion between types of different size as >a potential "64 bit portability issue". >I think that you also compiling with a flag which turns these warnings >into errors: > >http://msdn2.microsoft.com/en-us/library/thxezb7y.aspx > /WX Treats all compiler warnings as errors. For a new project, it may be > best to use /WX in all compilations > >This flag is off by default. >It might be easier for you, in the future, to just turn it off than waste time >fixing the warnings. > > I will probably keep this on to catch other issues. From johnt1johnt2 at gmail.com Thu Oct 5 05:18:31 2006 From: johnt1johnt2 at gmail.com (john t) Date: Thu, 5 Oct 2006 17:48:31 +0530 Subject: [openib-general] Multi-port HCA In-Reply-To: References: Message-ID: Hi Bernard, I had a configuration issue. I fixed it and now I get same BW (i.e. around 10 Gb/sec) on each port provided I use ports on different HCA cards. If I use two ports of the same HCA card then BW gets divided between these two ports. I am using Mellanox HCA cards and doing simple send/recv using uverbs. Do you think it could be an issue with Mallanox driver or could it be due to system/PCI-E limitation. Regards, John T. 
On 10/3/06, Bernard King-Smith wrote: > > > John, > > Who's adapter (manufacturer) are you using? It is usually an adapter > implementation or driver issue that occures when you cannot scale across > multiple links. The fact that you don't scale up from one link, but it > appears they share a fixed bandwidth across N links means that there is a > driver or stack issue. At one time I think that IPoIB and maybe other IB > drivers used only one event queue across multiple links which would be a > bottleneck. We added code in the IBM EHCA driver to get round this > bottleneck. > > Are your measurements using MPI or IP. Are you using separate > tasks/sockets per link and using different subnets if using IP? > > Bernie King-Smith > IBM Corporation > Server Group > Cluster System Performance > wombat2 at us.ibm.com (845)433-8483 > Tie. 293-8483 or wombat2 on NOTES > > "We are not responsible for the world we are born into, only for the world > we leave when we die. > So we have to accept what has gone before us and work to change the only > thing we can, > -- The Future." William Shatner > > john t" < johnt1johnt2 at gmail.com> wrote on 10/03/2006 09:42:24 AM: > > > > Hi, > > > > I have two HCA cards, each having two ports and each connected to a > > separate PCI-E x8 slot. > > > > Using one HCA port I get end to end BW of 11.6 Gb/sec (uni-direction > RDMA). > > If I use two ports of the same HCA or different HCA, I get between 5 > > to 6.5 Gb/sec point-to-point BW on each port. BW on each port > > further reduces if I use more ports. I am not able to understand > > this behaviour. Is there any limitation on max. BW that a system can > > provide? Does the available BW get divided among multiple HCA ports > > (which means having multiple ports will not increase the BW)? > > > > > > Regards, > > John T > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From trimmer at silverstorm.com Thu Oct 5 05:19:26 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Thu, 5 Oct 2006 08:19:26 -0400 Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout In-Reply-To: <20061004211757.GF9723@mellanox.co.il> Message-ID: > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Wednesday, October 04, 2006 5:18 PM > To: Rimmer, Todd > Cc: Sean Hefty; Ishai Rabinovitz; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] IB_CM: Limit the MRA timeout > > Quoting r. Rimmer, Todd : > > I recommend sticking with the IB spec for the various timeouts. > > So what do you suggest, wait a day or so to timeout the MRA? > > -- > MST Fix the broken endpoint and document the potential issue. As a potential workaround, permit a configuration option in OFED which sets an upper bound on CM related timeouts such that broken endpoints can be worked around. However by default this timeout limit should be very high (many seconds maybe even a minute). Otherwise keep all the computations as in the spec. Parts of this thread were starting to propose alternate algorithms for the computations which is what I was very concerned about because this gets into the realm of rewriting the spec and will cause all kinds of subtle issues, including interop issues with existing devices (such as native IB storage, existing virtual IO controllers, etc). I also posted the summary because some of the computations are subtle. It took a while to uncover these details from the spec and implement them properly in our stack. So I thought the group could benefit from the research, for example, most of stacks I have reviewed ignore the CA local Ack delay components in the equations. Many do not properly implement timewait, many assume the CM REQ contains Packet Lifetime - it does not but it can be computed from CM REQ information, etc. All the equations can be made to work, but it does require some attention to detail. 
Another key point is that to be effective MRA really needs to be issued by the stack itself, not the ULP. MRA needs to cover both ULPs that know they have a lot of work before they can respond (for example if a ULP know's it must do significant IO to a device before it can respond to a REQ, etc). However the more typical case is a ULP is simply bombarded with 20,000 REQs at once the 1st time the fabric boots or when a large job is started. In this case the ULP backlog will cause many of the REQs to timeout. However an MRA generated by the stack prior to the REQ being delivered to the ULP can help this situation. Todd Rimmer From johnt1johnt2 at gmail.com Thu Oct 5 05:38:22 2006 From: johnt1johnt2 at gmail.com (john t) Date: Thu, 5 Oct 2006 18:08:22 +0530 Subject: [openib-general] module Message-ID: Hi, I have Linux FC4 64-bit machine (kernel version 2.6.11) and I have installed OFED 1.0 on it. I am trying to write a small linux kernel module. When I try to load the module (using insmod) I get following error: disagrees about version of symbol ib_register_client undefined symbol ib_register_client How to solve this problem? Regards, John T. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pasha at mellanox.co.il Thu Oct 5 05:58:49 2006 From: pasha at mellanox.co.il (Pavel Shamis (Pasha)) Date: Thu, 05 Oct 2006 14:58:49 +0200 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 In-Reply-To: References: Message-ID: <45250189.3070300@mellanox.co.il> > I see it for all MVAPICH tests, it's 100% consistent. MVAPICH tests are osu_benchmarks (bw/lt/etc..) or all test over mvapich on SUSE10 platform ? Please check /etc/hosts file on your machines, it should be exactly the same on all nodes. 
Regards, Pasha > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > >> -----Original Message----- >> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] >> Sent: Tuesday, October 03, 2006 3:37 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Aviram Gutman; OpenFabricsEWG; openib >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >> OFED 1.1 rc6 with SLES10 x86_64 >> >> Hi Scott, >> Unfortunately was not able to reproduce the failure on our platforms. >> Do you see the problem with all tests or with the specific only ? >> Is it consistent problem ? >> >> Regards, >> Pasha >> >> Scott Weitzenkamp (sweitzen) wrote: >>> $ uname -a >>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 >> 18:25:39 UTC 2006 >>> x86_64 >>> x86_64 x86_64 GNU/Linux >>> $ >> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 >>> 192.168.2.46 192.168.2.49 hostname >>> svbu-qa1850-4 >>> svbu-qa1850-3 >>> $ >> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 >>> 192.168.2.46 192.168.2.49 >>> >> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_bench > marks-2.2/ >>> osu_latency >>> >>> The last command just hangs. Can I try your binary RPMs? >>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Server Virtualization Business Unit >>> Cisco Systems >>> >>> >>>> -----Original Message----- >>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] >>>> Sent: Sunday, October 01, 2006 2:29 AM >>>> To: Scott Weitzenkamp (sweitzen) >>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >>>> OFED 1.1 rc6 with SLES10 x86_64 >>>> >>>> Can you please elaborate on MVAPICH issues, can you send >>>> command line? >>>> We ran it here on 32 Opteron nodes each quad core and also >> rigorous >>>> tests on the many other nodes? 
>>>> >>>> >>>> >>>> Scott Weitzenkamp (sweitzen) wrote: >>>>> We are just getting started with OFED testing on SLES10, first >>>>> platform is x86_64. >>>>> >>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are >>>> working so far. >>>>> MVAPICH with OSU benchmarks just hang. This same >> hardware works >>>>> fine with OFED and RHEL4 U3. >>>>> >>>>> Has anyone else seen this? >>>>> >>>>> Scott Weitzenkamp >>>>> SQA and Release Manager >>>>> Server Virtualization Business Unit >>>>> Cisco Systems >>>>> >>>>> >>>> -------------------------------------------------------------- >>>> ---------- >>>>> _______________________________________________ >>>>> openfabrics-ewg mailing list >>>>> openfabrics-ewg at openib.org >>>>> http://openib.org/mailman/listinfo/openfabrics-ewg >>>>> >> >> -- >> Pavel Shamis (Pasha) >> Software Engineer >> Mellanox Technologies LTD. >> pasha at mellanox.co.il >> > -- Pavel Shamis (Pasha) Software Engineer Mellanox Technologies LTD. pasha at mellanox.co.il From wombat2 at us.ibm.com Thu Oct 5 06:20:51 2006 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Thu, 5 Oct 2006 09:20:51 -0400 Subject: [openib-general] Multi-port HCA In-Reply-To: Message-ID: "john t" wrote on 10/05/2006 08:18:31 AM: > Hi Bernard, > > I had a configuration issue. I fixed it and now I get same BW (i.e. > around 10 Gb/sec) on each port provided I use ports on different HCA > cards. If I use two ports of the same HCA card then BW gets divided > between these two ports. I am using Mellanox HCA cards and doing > simple send/recv using uverbs. > > Do you think it could be an issue with Mallanox driver or could it > be due to system/PCI-E limitation. I haven't looked closely at the Mellanox driver, but I suspect it is there. 
When we were looking at the EHCA driver we found that if you have two links using the same event queue ( one per adapter ), and if the completion queue handling is serialized off the one interrupt per adapter, then one and two links get the same aggregate bandwidth. The problem is if you have two completion queues for the two links, they are handled serially. You need separate threads off the interrupt, each processing one of the completion queues in parallel on different CPU's to get the scaling you expect for 2 links. I don't think it is the PCI-e bus because it can handle much more than 20 Gb/s. However, I also don't know how the Mellanox chip works for 2 ports. > > Regards, > John T. > Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhartner at austin.rr.com Thu Oct 5 06:24:25 2006 From: bhartner at austin.rr.com (Bill Hartner) Date: Thu, 05 Oct 2006 08:24:25 -0500 Subject: [openib-general] RHEL 4 U3 - lost completions References: <45214BCC.B0B78035@austin.rr.com> <45215636.2FBBC84C@austin.rr.com> <45215BC4.C8BB5E22@austin.rr.com> <4521FC1A.6000603@voltaire.com> <452376F0.4030902@voltaire.com> Message-ID: <45250789.276E6466@austin.rr.com> Or Gerlitz wrote: > > Roland Dreier wrote: > > Or> Roland - If indeed, does it make sense that the problem does > > Or> not reproduce with single threaded runs? > > > > Sorry, I can't parse the question. 
> > However, the problem here seems to
> > be that the CQ buffer pages end up being marked for copy-on-write, and
> > I don't know of any reason why that would happen other than a fork()
> > happening somewhere (possibly behind the scenes in a system() call or
> > something like that).
>
> My question was: assuming there is some fork() (e.g. behind the scenes of
> daemonize()) in the app, does it make sense that everything works as
> long as the app is single threaded, but when there are multiple threads
> things break (e.g. COW is applied to the page used to hold the CQ, etc.)?

I found a fork() call in the app code that is made after the
completion queue is created - thanks for keeping me pointed in the
right direction.

I also modified the pthread test case by adding a fork() call after the
completion queue is created, and it now hangs after the 2nd RDMA like
the app did.

-Bill

From rdreier at cisco.com Thu Oct 5 07:18:49 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 05 Oct 2006 07:18:49 -0700
Subject: [openib-general] Multi-port HCA
In-Reply-To: (Bernard King-Smith's message of "Thu, 5 Oct 2006 09:20:51 -0400")
References: 
Message-ID: 

Bernard> I don't think it is the PCI-e bus because it can handle
Bernard> much more than 20 Gb/s.

This isn't true. Mellanox cards have PCI-e x8 interfaces, which have a
theoretical limit of 16 Gb/sec in each direction, and a practical
limit that is even lower due to packetization and other overhead.

 - R.

From rdreier at cisco.com Thu Oct 5 07:19:58 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 05 Oct 2006 07:19:58 -0700
Subject: [openib-general] [PATCH] IB_CM: Limit the MRA timeout
In-Reply-To: (Todd Rimmer's message of "Thu, 5 Oct 2006 08:19:26 -0400")
References: 
Message-ID: 

Todd> Fix the broken endpoint and document the potential issue.
Todd> As a potential workaround, permit a configuration option in Todd> OFED which sets an upper bound on CM related timeouts such Todd> that broken endpoints can be worked around. No, let's not have any "unbreak_my_system=1" config options. We have too many config options as it is. - R. From eli at mellanox.co.il Thu Oct 5 07:22:12 2006 From: eli at mellanox.co.il (Eli Cohen) Date: Thu, 05 Oct 2006 16:22:12 +0200 Subject: [openib-general] [PATCH] IB/ipoib - possible deadlock in path query and join Message-ID: <1160058132.5622.21.camel@localhost> When a path query or a join fails immediately, we want to call complete(&obj->done). This is required to avoid a deadlock that can occur if there is no farther attempt to the operation (query or join) and a wait_for_completion() is called on obj->done due to a flush operation. Signed-off-by: Eli Cohen --- Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-10-05 09:09:27.000000000 +0200 +++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-10-05 09:12:00.000000000 +0200 @@ -504,6 +504,7 @@ static int path_rec_start(struct net_dev if (path->query_id < 0) { ipoib_warn(priv, "ib_sa_path_rec_get failed\n"); path->query = NULL; + complete(&path->done); return path->query_id; } Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-10-05 09:09:23.000000000 +0200 +++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-10-05 12:23:25.000000000 +0200 @@ -371,6 +371,7 @@ static int ipoib_mcast_sendonly_join(str if (ret < 0) { ipoib_warn(priv, "ib_sa_mcmember_rec_set failed (ret = %d)\n", ret); + complete(&mcast->done); } else { ipoib_dbg_mcast(priv, "no multicast record for " IPOIB_GID_FMT ", starting join\n", @@ -501,6 +502,7 
@@ static void ipoib_mcast_join(struct net_ if (ret < 0) { ipoib_warn(priv, "ib_sa_mcmember_rec_set failed, status %d\n", ret); + complete(&mcast->done); mcast->backoff *= 2; if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) From mst at mellanox.co.il Thu Oct 5 07:25:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 5 Oct 2006 16:25:49 +0200 Subject: [openib-general] module In-Reply-To: References: Message-ID: <20061005142549.GA8737@mellanox.co.il> Quoting r. john t : > Subject: module > > Hi, > > I have Linux FC4 64-bit machine (kernel version 2.6.11) and I have installed OFED 1.0 on it. I am trying to write a small linux kernel module. When I try to load the module (using insmod) I get following error: > > disagrees about version of symbol ib_register_client > undefined symbol ib_register_client > > How to solve this problem? You are compiling against the wrong rdma headers. make with V=1 and you'll see. -- MST From wombat2 at us.ibm.com Thu Oct 5 07:33:26 2006 From: wombat2 at us.ibm.com (Bernard King-Smith) Date: Thu, 5 Oct 2006 10:33:26 -0400 Subject: [openib-general] Multi-port HCA In-Reply-To: Message-ID: Roland Dreier wrote on 10/05/2006 10:18:49 AM: > Bernard> I don't think it is the PCI-e bus because it can handle > Bernard> much more than 20 Gb/s. > > This isn't true. Mellanox cards have PCI-e x8 interfaces, which has a > theoretical limit of 16 Gb/sec in each direction, and a practical > limit that is even lower due to packetization and other overhead. Right, (looking at future hardware specs) :=} So the question is what speedup is expected from a 2 port Mellanox PCI-e adapter going from 1 to 2 ports. > > - R. Bernie King-Smith IBM Corporation Server Group Cluster System Performance wombat2 at us.ibm.com (845)433-8483 Tie. 293-8483 or wombat2 on NOTES "We are not responsible for the world we are born into, only for the world we leave when we die. 
So we have to accept what has gone before us and work to change the only thing we can, -- The Future." William Shatner -------------- next part -------------- An HTML attachment was scrubbed... URL: From svdavidson at charter.net Thu Oct 5 07:45:41 2006 From: svdavidson at charter.net (Shannon V. Davidson) Date: Thu, 05 Oct 2006 09:45:41 -0500 Subject: [openib-general] Multi-port HCA In-Reply-To: References: Message-ID: <45251A95.9010407@charter.net> John, In our testing with dual port Mellanox SDR HCAs, we found that not all PCI-express implementations are equal. Depending on the PCIe chipset, we measured unidirectional SDR dual-rail bandwidth ranging from 1100-1500 MB/sec and bidirectional SDR dual-rail bandwidth ranging from 1570-2600 MB/sec. YMMV, but had good luck with Intel and Nvidia chipsets, and less success with the Broadcom Serverworks HT-1000 and HT-2000 chipsets. My last report (in June 2006) was that Broadcom was working to improve their PCI-express performance. Regards, Shannon john t wrote: > Hi Bernard, > > I had a configuration issue. I fixed it and now I get same BW (i.e. > around 10 Gb/sec) on each port provided I use ports on different HCA > cards. If I use two ports of the same HCA card then BW gets divided > between these two ports. I am using Mellanox HCA cards and doing > simple send/recv using uverbs. > > Do you think it could be an issue with Mallanox driver or could it be > due to system/PCI-E limitation. > > Regards, > John T. > > > On 10/3/06, *Bernard King-Smith* > wrote: > > > John, > > Who's adapter (manufacturer) are you using? It is usually an > adapter implementation or driver issue that occures when you > cannot scale across multiple links. The fact that you don't scale > up from one link, but it appears they share a fixed bandwidth > across N links means that there is a driver or stack issue. 
At one > time I think that IPoIB and maybe other IB drivers used only one > event queue across multiple links which would be a bottleneck. We > added code in the IBM EHCA driver to get round this bottleneck. > > Are your measurements using MPI or IP. Are you using separate > tasks/sockets per link and using different subnets if using IP? > > Bernie King-Smith > IBM Corporation > Server Group > Cluster System Performance > wombat2 at us.ibm.com (845)433-8483 > Tie. 293-8483 or wombat2 on NOTES > > "We are not responsible for the world we are born into, only for > the world we leave when we die. > So we have to accept what has gone before us and work to change > the only thing we can, > -- The Future." William Shatner > > john t" < johnt1johnt2 at gmail.com > > wrote on 10/03/2006 09:42:24 AM: > > > > > Hi, > > > > I have two HCA cards, each having two ports and each connected to a > > separate PCI-E x8 slot. > > > > Using one HCA port I get end to end BW of 11.6 Gb/sec > (uni-direction RDMA). > > If I use two ports of the same HCA or different HCA, I get between 5 > > to 6.5 Gb/sec point-to-point BW on each port. BW on each port > > further reduces if I use more ports. I am not able to understand > > this behaviour. Is there any limitation on max. BW that a system can > > provide? Does the available BW get divided among multiple HCA ports > > (which means having multiple ports will not increase the BW)? > > > > > > Regards, > > John T > > > ------------------------------------------------------------------------ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- ____________________________________________ Shannon V. 
Davidson Senior Software Engineer Raytheon 636-479-7465 office 443-383-0331 fax ____________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Thu Oct 5 07:56:59 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 05 Oct 2006 09:56:59 -0500 Subject: [openib-general] [PATCH] RFC libibverbs - support provider response data in reg_mr Message-ID: <1160060219.22519.7.camel@stevo-desktop> Pass back provider-specific meta data for user mr registration. The infrastructure is there to do this in the kernel, but not in the libibverbs cmd interface. This is (hopefully) a short term requirement for the driver I'm working on, but if it has utility for others, we should add it. I guess the downside is it impacts all the libs... Signed-off-by: Steve Wise Index: libibverbs/include/infiniband/driver.h =================================================================== --- libibverbs/include/infiniband/driver.h (revision 9349) +++ libibverbs/include/infiniband/driver.h (working copy) @@ -85,7 +85,8 @@ int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length, uint64_t hca_va, enum ibv_access_flags access, struct ibv_mr *mr, struct ibv_reg_mr *cmd, - size_t cmd_size); + size_t cmd_size, + struct ibv_reg_mr_resp *resp, size_t resp_size); int ibv_cmd_dereg_mr(struct ibv_mr *mr); int ibv_cmd_create_cq(struct ibv_context *context, int cqe, struct ibv_comp_channel *channel, Index: libibverbs/src/cmd.c =================================================================== --- libibverbs/src/cmd.c (revision 9349) +++ libibverbs/src/cmd.c (working copy) @@ -221,11 +221,11 @@ int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length, uint64_t hca_va, enum ibv_access_flags access, struct ibv_mr *mr, struct ibv_reg_mr *cmd, - size_t cmd_size) + size_t cmd_size, + struct ibv_reg_mr_resp *resp, size_t resp_size) { - struct ibv_reg_mr_resp resp; - IBV_INIT_CMD_RESP(cmd, 
cmd_size, REG_MR, &resp, sizeof resp); + IBV_INIT_CMD_RESP(cmd, cmd_size, REG_MR, resp, resp_size); cmd->start = (uintptr_t) addr; cmd->length = length; @@ -236,9 +236,9 @@ if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) return errno; - mr->handle = resp.mr_handle; - mr->lkey = resp.lkey; - mr->rkey = resp.rkey; + mr->handle = resp->mr_handle; + mr->lkey = resp->lkey; + mr->rkey = resp->rkey; return 0; } From aviram at dev.mellanox.co.il Thu Oct 5 08:39:10 2006 From: aviram at dev.mellanox.co.il (Aviram Gutman) Date: Thu, 05 Oct 2006 17:39:10 +0200 Subject: [openib-general] OFED 1.1 RC7 Message-ID: <4525271E.8070000@dev.mellanox.co.il> OFED-1.1-rc7 is available on https://openib.org/svn/gen2/branches/1.1/ofed/releases/ File: OFED-1.1-rc7.tgz Please report any issues in bugzilla http://openib.org/bugzilla/ Release details: ================ BUILD_ID: OFED-1.1-rc7 openib-1.1 (REV=9725) # User space https://openib.org/svn/gen2/branches/1.1/src/userspace Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 ref: refs/heads/ofed_1_1 commit fde99a7a22e56d6aa90dae9db3d600755efcedb5 # MPI mpi_osu-0.9.7-mlx2.2.0.tgz openmpi-1.1.1-1.src.rpm mpitests-2.0-0.src.rpm Bug fixes from OFED-1.1-rc6: =========================== IPoIB HA: BUG 247: OFED IPoIB HA not working on RHEL4 U3 BUG 259: problems with OFED IPoIB HA on SLES10 IPATH: BUG 252: Failed to load ib_ipath module (IPATH device is not present) EHCA: BUG 250: libehca is not selectable although ib_ehca was selected SRP HA: Use port_guid instead of node_guid. Allows the user to set the identifier_extension when providing the target attributes. ibutils: BUG 243: ibutils/ibis build fails on SLES 10 / PPC64 openib diags: BUG 241: Diags build fails on SLES 10 PPC64 Open MPI: Fixed compilation issue on SLES10 PPC64 mstflint : SLES10 ppc workaround Known issues: ============= 1. IPoIB HA does not migrate IPoIB pkey interfaces (BUG 260) 2. 
kernel-ib conflicts with kernel-smp (Used --force flag in kernel-ib RPM installation as a workaround) (BUG 255) Lets try to get a final release on Wed or Thu next week. Aviram From bugzilla-daemon at openib.org Thu Oct 5 08:48:17 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 5 Oct 2006 08:48:17 -0700 (PDT) Subject: [openib-general] [Bug 266] IPoIB multicast does not work with RHEL4 U4 Message-ID: <20061005154817.A6BE82283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=266 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dledford at redhat.com ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From krause at cup.hp.com Thu Oct 5 09:03:37 2006 From: krause at cup.hp.com (Michael Krause) Date: Thu, 05 Oct 2006 09:03:37 -0700 Subject: [openib-general] Multi-port HCA In-Reply-To: References: Message-ID: <6.2.0.14.2.20061005085458.055e3380@esmail.cup.hp.com> At 07:18 AM 10/5/2006, Roland Dreier wrote: > Bernard> I don't think it is the PCI-e bus because it can handle > Bernard> much more than 20 Gb/s. > >This isn't true. Mellanox cards have PCI-e x8 interfaces, which has a >theoretical limit of 16 Gb/sec in each direction, and a practical >limit that is even lower due to packetization and other overhead. Nominally derate to 80% of the bandwidth after the 8b/10b encoding is removed and you'll come close to maximum of what a PCIe Root Port can service. Depending upon the amount of control messages (work requests, CQ updates, interrupts, etc.) generated, the effective bandwidth is further reduced - IPC workloads tend to be worse than storage as the ratio of control to application data is higher (topology has an impact here as well but for this discussion, assume point-to-point attachment). 
The chipsets using PCIe 2.5 GT/s (raw signaling rate) generally drive a single port HCA quite nicely but not a dual-port. An IB DDR using PCIe 2.5 GT/s is not going to come that close to link rate. You'll need to wait for the PCIe 5.0 GT/s chipsets which for servers isn't any time soon (most public information shows 2008 for shipment though expect people to sample earlier and for clients to ship products much earlier). The problem facing servers is whether there will be enough x8 Root Ports available to attach such links. Some vendors may decide to only ship x4 5.0 GT/s since it is the equivalent of a x8 2.5 GT/s thinking the world will just roll to this signaling rate quickly. However, given the need for interoperability and the desire by OEM to avoid customer backlash when their brand new system cannot perform as well as the older system when x8 cards are used, well, one can only hope they are listening closely to their customers as the OEM's customers won't be happy. Mike From sean.hefty at intel.com Thu Oct 5 09:56:13 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 5 Oct 2006 09:56:13 -0700 Subject: [openib-general] [PATCH] 2.6.19 rdma_cm: add rdma_establish call to connect if RTU is lost In-Reply-To: <20061005065850.GA6711@mellanox.co.il> Message-ID: <000001c6e89f$2ae347a0$8698070a@amr.corp.intel.com> From: Sean Hefty Allow ULPs to transition to RTS before sending a REP. This allows the ULP to respond to a received message if it arrives before the RTU or communication established event. Modify the RDMA CM to transition to RTS when sending a REP over IB, and expose a new rdma_establish interface that a user can invoke to force a connection into the established state if it polls a receive completion before an RTU arrives. Signed-off-by: Sean Hefty --- Please consider for 2.6.19, since it does allow a connection to be made when data is received on a QP, but the RTU is lost. This problem has been reported on the OFA OFED releases. 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 25b1018..22ec434 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3243,6 +3243,10 @@ static int cm_init_qp_rts_attr(struct cm spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { + /* Allow transition to RTS before sending REP */ + case IB_CM_REQ_RCVD: + case IB_CM_MRA_REQ_SENT: + case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: case IB_CM_REP_SENT: diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9ae4f3a..bc20662 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -775,22 +775,6 @@ static int cma_verify_rep(struct rdma_id return 0; } -static int cma_rtu_recv(struct rdma_id_private *id_priv) -{ - int ret; - - ret = cma_modify_qp_rts(&id_priv->id); - if (ret) - goto reject; - - return 0; -reject: - cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); - return ret; -} - static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; @@ -821,9 +805,8 @@ static int cma_ib_handler(struct ib_cm_i private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; break; case IB_CM_RTU_RECEIVED: - status = cma_rtu_recv(id_priv); - event = status ? 
RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + case IB_CM_USER_ESTABLISHED: + event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: status = -ETIMEDOUT; /* fall through */ @@ -1960,11 +1943,25 @@ static int cma_accept_ib(struct rdma_id_ struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; - int ret; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; - ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) - return ret; + if (id_priv->id.qp) { + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + goto out; + + qp_attr.qp_state = IB_QPS_RTS; + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, + &qp_attr_mask); + if (ret) + goto out; + + qp_attr.max_rd_atomic = conn_param->initiator_depth; + ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); + if (ret) + goto out; + } memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; @@ -1979,7 +1976,9 @@ static int cma_accept_ib(struct rdma_id_ rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = id_priv->srq ? 
1 : 0; - return ib_send_cm_rep(id_priv->cm_id.ib, &rep); + ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep); +out: + return ret; } static int cma_accept_iw(struct rdma_id_private *id_priv, @@ -2045,6 +2044,27 @@ reject: } EXPORT_SYMBOL(rdma_accept); +int rdma_establish(struct rdma_cm_id *id) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + switch (id->device->node_type) { + case RDMA_NODE_IB_CA: + ret = ib_cm_establish(id_priv->cm_id.ib); + break; + default: + ret = 0; + break; + } + return ret; +} +EXPORT_SYMBOL(rdma_establish); + int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len) { diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index deb5a0a..2460881 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -253,6 +253,16 @@ int rdma_listen(struct rdma_cm_id *id, i int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** + * rdma_establish - Forces a connection state to established. + * @id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * QP before being notified that the connection has been established by the + * RDMA CM. + */ +int rdma_establish(struct rdma_cm_id *id); + +/** * rdma_reject - Called to reject a connection request or response. 
*/ int rdma_reject(struct rdma_cm_id *id, const void *private_data, From rdreier at cisco.com Thu Oct 5 10:04:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 05 Oct 2006 10:04:42 -0700 Subject: [openib-general] [PATCH] 2.6.19 rdma_cm: add rdma_establish call to connect if RTU is lost In-Reply-To: <000001c6e89f$2ae347a0$8698070a@amr.corp.intel.com> (Sean Hefty's message of "Thu, 5 Oct 2006 09:56:13 -0700") References: <000001c6e89f$2ae347a0$8698070a@amr.corp.intel.com> Message-ID: > Please consider for 2.6.19, since it does allow a connection to be made > when data is received on a QP, but the RTU is lost. This problem has been > reported on the OFA OFED releases. I'm confused -- how does this fix anything? I don't see any callers of the new rdma_establish() function ?? - R. From rdreier at cisco.com Thu Oct 5 10:11:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 05 Oct 2006 10:11:05 -0700 Subject: [openib-general] [PATCH] RFC libibverbs - support provider response data in reg_mr In-Reply-To: <1160060219.22519.7.camel@stevo-desktop> (Steve Wise's message of "Thu, 05 Oct 2006 09:56:59 -0500") References: <1160060219.22519.7.camel@stevo-desktop> Message-ID: Steve> Pass back provider-specific meta data for user mr Steve> registration. The infrastructure is there to do this in the Steve> kernel, but not in the libibverbs cmd interface. Steve> This is (hopefully) a short term requirement for the driver Steve> I'm working on, but if it has utility for others, we should Steve> add it. I guess the downside is it impacts all the libs... I guess this is OK if it's needed, but can you add a preprocessor define (a la what I did for ibv_cmd_resize_cq) so that low-level driver plugins can retain source-level compatibility with both old and new libibverbs? - R. 
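For anyone following along, the kind of feature-test define being asked for here can be sketched roughly as below. This is only an illustration under stated assumptions: every demo_/DEMO_ name is made up for the example and none of it is the real libibverbs API. The idea is that the newer library header defines a macro next to the changed prototype, and a driver plugin checks for it so the same source builds against both old and new headers.

```c
#include <stddef.h>

/*
 * Illustration only: every demo_ / DEMO_ name here is hypothetical and
 * stands in for the real libibverbs declarations.
 * A newer library header would define the feature macro next to the
 * changed prototype; an older header simply would not have it.
 */
#define DEMO_CMD_REG_MR_HAS_RESP_PARAMS

struct demo_reg_mr_resp {
	unsigned int lkey;
	unsigned int rkey;
};

#ifdef DEMO_CMD_REG_MR_HAS_RESP_PARAMS
/* New-style call: the caller supplies the response buffer. */
static int demo_cmd_reg_mr(size_t length,
			   struct demo_reg_mr_resp *resp, size_t resp_size)
{
	if (resp_size < sizeof *resp)
		return -1;
	resp->lkey = 1;
	resp->rkey = 2;
	return length ? 0 : -1;
}
#else
/* Old-style call: the response stays private to the library. */
static int demo_cmd_reg_mr(size_t length)
{
	return length ? 0 : -1;
}
#endif

/* Driver plugin code that builds against either version of the header. */
static int demo_plugin_reg_mr(size_t length, unsigned int *lkey_out)
{
#ifdef DEMO_CMD_REG_MR_HAS_RESP_PARAMS
	struct demo_reg_mr_resp resp;
	int ret = demo_cmd_reg_mr(length, &resp, sizeof resp);

	if (!ret)
		*lkey_out = resp.lkey;
	return ret;
#else
	*lkey_out = 0;	/* no provider data available with the old call */
	return demo_cmd_reg_mr(length);
#endif
}
```

Removing the #define at the top mimics building against an older header: the plugin code still compiles, it just takes the old two-step path.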
From pradeep at us.ibm.com Thu Oct 5 10:29:57 2006 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Thu, 5 Oct 2006 10:29:57 -0700 Subject: [openib-general] ibv_devinfo Message-ID: On eHCA we find that there are some discrepancies (for example the phys_state) between the output of "ibv_devinfo -v" and the corresponding output of ibstatus. ibstatus does display what is in the /sys/class/infiniband/* files, as expected. I would like to understand the status of ibv_devinfo. Is this a supported program, given that its sources are in the examples directory? Also, I find that the source (devinfo.c) does appear in the OFED 1.0 tarball. Pradeep pradeep at us.ibm.com From swise at opengridcomputing.com Thu Oct 5 10:37:35 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 05 Oct 2006 12:37:35 -0500 Subject: [openib-general] [PATCH] RFC libibverbs - support provider response data in reg_mr In-Reply-To: References: <1160060219.22519.7.camel@stevo-desktop> Message-ID: <1160069855.23232.2.camel@stevo-desktop> On Thu, 2006-10-05 at 10:11 -0700, Roland Dreier wrote: > Steve> Pass back provider-specific meta data for user mr > Steve> registration. The infrastructure is there to do this in the > Steve> kernel, but not in the libibverbs cmd interface. > > Steve> This is (hopefully) a short term requirement for the driver > Steve> I'm working on, but if it has utility for others, we should > Steve> add it. I guess the downside is it impacts all the libs... > > I guess this is OK if it's needed, but can you add a preprocessor > define (a la what I did for ibv_cmd_resize_cq) so that low-level > driver plugins can retain source-level compatibility with both old and > new libibverbs? > > - R. Like this? ----- Pass back provider-specific meta data for user mr registration. The infrastructure is there to do this in the kernel, but not in the libibverbs cmd interface.
Signed-off-by: Steve Wise Index: libibverbs/include/infiniband/driver.h =================================================================== --- libibverbs/include/infiniband/driver.h (revision 9727) +++ libibverbs/include/infiniband/driver.h (working copy) @@ -82,10 +82,12 @@ struct ibv_alloc_pd *cmd, size_t cmd_size, struct ibv_alloc_pd_resp *resp, size_t resp_size); int ibv_cmd_dealloc_pd(struct ibv_pd *pd); +#define IBV_CMD_REG_MR_HAS_RESP_PARAMS int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length, uint64_t hca_va, enum ibv_access_flags access, struct ibv_mr *mr, struct ibv_reg_mr *cmd, - size_t cmd_size); + size_t cmd_size, + struct ibv_reg_mr_resp *resp, size_t resp_size); int ibv_cmd_dereg_mr(struct ibv_mr *mr); int ibv_cmd_create_cq(struct ibv_context *context, int cqe, struct ibv_comp_channel *channel, Index: libibverbs/src/cmd.c =================================================================== --- libibverbs/src/cmd.c (revision 9727) +++ libibverbs/src/cmd.c (working copy) @@ -232,11 +232,11 @@ int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length, uint64_t hca_va, enum ibv_access_flags access, struct ibv_mr *mr, struct ibv_reg_mr *cmd, - size_t cmd_size) + size_t cmd_size, + struct ibv_reg_mr_resp *resp, size_t resp_size) { - struct ibv_reg_mr_resp resp; - IBV_INIT_CMD_RESP(cmd, cmd_size, REG_MR, &resp, sizeof resp); + IBV_INIT_CMD_RESP(cmd, cmd_size, REG_MR, resp, resp_size); cmd->start = (uintptr_t) addr; cmd->length = length; @@ -249,9 +249,9 @@ VALGRIND_MAKE_MEM_DEFINED(&resp, sizeof resp); - mr->handle = resp.mr_handle; - mr->lkey = resp.lkey; - mr->rkey = resp.rkey; + mr->handle = resp->mr_handle; + mr->lkey = resp->lkey; + mr->rkey = resp->rkey; return 0; } From mshefty at ichips.intel.com Thu Oct 5 10:40:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 05 Oct 2006 10:40:41 -0700 Subject: [openib-general] [PATCH] 2.6.19 rdma_cm: add rdma_establish call to connect if RTU is lost In-Reply-To: References: 
<000001c6e89f$2ae347a0$8698070a@amr.corp.intel.com> Message-ID: <45254399.8090703@ichips.intel.com> > I'm confused -- how does this fix anything? I don't see any callers > of the new rdma_establish() function ?? This was submitted at the request of Michael and Or, so I'll let them comment. There are no in tree passive side users of the rdma_cm, so passive side calls are unused (rdma_listen, rdma_accept, rdma_establish). I have no issues deferring this patch until a user is added (which I hope will be 2.6.20). - Sean From rdreier at cisco.com Thu Oct 5 10:53:41 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 05 Oct 2006 10:53:41 -0700 Subject: [openib-general] ibv_devinfo In-Reply-To: (Pradeep Satyanarayana's message of "Thu, 5 Oct 2006 10:29:57 -0700") References: Message-ID: Pradeep> On eHCA we find that there are some discrepencies (for Pradeep> example the phys_state) between the outputs of Pradeep> "ibv_devinfo -v" and the corresponding output of Pradeep> ibstatus. Indeed ibstatus does display what is in the Pradeep> /sys/class/infiniband/* files, and as expected. Could you give some details on the differences? Is ibv_devinfo or ibstatus giving the correct output? Pradeep> I would like to understand the status of ibv_devinfo. Is Pradeep> this a supported program -given that the sources for this Pradeep> is in the example directory? Also, I find the source Pradeep> (devinfo.c) does appear in the OFED 1.0 tar ball. I'm not sure what "supported" would mean exactly, but certainly I would like ibv_devinfo to work as well as possible. Unfortunately I can't fix anything without a more detailed report... - R. 
From rjwalsh at pathscale.com Thu Oct 5 11:04:50 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 05 Oct 2006 11:04:50 -0700 Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only In-Reply-To: References: <451F7F4F.30605@pathscale.com> <20061001091709.GC1796@mellanox.co.il> <452022DA.6060107@pathscale.com> <45218737.7080901@pathscale.com> <4521AA8B.80702@pathscale.com> <4521AC31.3020200@pathscale.com> <452354D8.6040903@pathscale.com> <452468E3.1000902@pathscale.com> Message-ID: <45254942.2010704@pathscale.com> Roland Dreier wrote: > Robert> This was without the patch, though, right? Cause if > Robert> you've applied the patch and you're still getting this > Robert> message, I'm confused. > > Yes, I was trying to debug the root cause of the problem, so I was > just running the mainline kernel. FWIW: I saw this behavior on kernel-smp-2.6.16-1.2096_FC4.root (i.e. an FC4 machine.) That's a regular 2.6.16-1.2096_FC4 RPM that we twiddled slightly and rebuilt (I don't have the twiddlage details to hand.) Regards, Robert. From sean.hefty at intel.com Thu Oct 5 11:06:57 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 5 Oct 2006 11:06:57 -0700 Subject: [openib-general] [PATCH v2] IB_CM: Limit CM message timeouts In-Reply-To: <20061004224533.GN9723@mellanox.co.il> Message-ID: <000501c6e8a9$0c0d67c0$8698070a@amr.corp.intel.com> Limit the timeout that the ib_cm will wait to receive a response to a message, to avoid excessively large (on the order of hours) timeout values. This prevents consuming resources tracking requests for extended periods of time. This helps correct for a bug in SRP Engenio target sending a large value (>1 hour) as service timeout. Signed-off-by: Sean Hefty --- Michael / Ishai, this is untested. Can you please let me know if it works for you? I didn't change the packet life time, since that's needed to configure the QP. 
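For a sense of scale on "on the order of hours": CM timeout fields carry a 5-bit exponent n, and the IB spec defines the corresponding time as 4.096 usec * 2^n. A minimal sketch of that arithmetic follows; the function names are invented for illustration and are not part of ib_cm.

```c
#include <stdint.h>

/*
 * Hypothetical helper: IB encodes timeouts as 4.096 usec * 2^n.
 * 4.096 usec is 4096 nsec, so the result in nsec is exact.
 */
static uint64_t demo_ib_timeout_nsec(unsigned int n)
{
	return 4096ULL << n;
}

/* Same value in whole seconds, rounded down. */
static uint64_t demo_ib_timeout_sec(unsigned int n)
{
	return demo_ib_timeout_nsec(n) / 1000000000ULL;
}
```

With n = 21 this comes to about 8.6 seconds, while the largest 5-bit value, n = 31, comes to about 8796 seconds, i.e. well over two hours for a single message.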
Index: cm.c =================================================================== --- cm.c (revision 9713) +++ cm.c (working copy) @@ -54,6 +54,12 @@ MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("InfiniBand CM"); MODULE_LICENSE("Dual BSD/GPL"); +/* + * Limit CM msg timeouts to something reasonable. + * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min. + */ +#define IB_CM_MAX_TIMEOUT 21 + static void cm_add_one(struct ib_device *device); static void cm_remove_one(struct ib_device *device); @@ -891,12 +897,12 @@ static void cm_format_req(struct cm_req_ cm_req_set_resp_res(req_msg, param->responder_resources); cm_req_set_init_depth(req_msg, param->initiator_depth); cm_req_set_remote_resp_timeout(req_msg, - param->remote_cm_response_timeout); + min((u8) IB_CM_MAX_TIMEOUT, param->remote_cm_response_timeout)); cm_req_set_qp_type(req_msg, param->qp_type); cm_req_set_flow_ctrl(req_msg, param->flow_control); cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn)); cm_req_set_local_resp_timeout(req_msg, - param->local_cm_response_timeout); + min((u8) IB_CM_MAX_TIMEOUT, param->local_cm_response_timeout)); cm_req_set_retry_count(req_msg, param->retry_count); req_msg->pkey = param->primary_path->pkey; cm_req_set_path_mtu(req_msg, param->primary_path->mtu); @@ -1002,10 +1008,10 @@ int ib_send_cm_req(struct ib_cm_id *cm_i } cm_id->service_id = param->service_id; cm_id->service_mask = __constant_cpu_to_be64(~0ULL); - cm_id_priv->timeout_ms = cm_convert_to_ms( - param->primary_path->packet_life_time) * 2 + - cm_convert_to_ms( - param->remote_cm_response_timeout); + cm_id_priv->timeout_ms = + min(IB_CM_MAX_TIMEOUT, + cm_convert_to_ms(param->primary_path->packet_life_time) * 2 + + cm_convert_to_ms(param->remote_cm_response_timeout)); cm_id_priv->max_cm_retries = param->max_cm_retries; cm_id_priv->initiator_depth = param->initiator_depth; cm_id_priv->responder_resources = param->responder_resources; @@ -1404,8 +1410,9 @@ static int cm_req_handler(struct 
cm_work } } cm_id_priv->tid = req_msg->hdr.tid; - cm_id_priv->timeout_ms = cm_convert_to_ms( - cm_req_get_local_resp_timeout(req_msg)); + cm_id_priv->timeout_ms = + min(IB_CM_MAX_TIMEOUT, + cm_convert_to_ms(cm_req_get_local_resp_timeout(req_msg))); cm_id_priv->max_cm_retries = cm_req_get_max_cm_retries(req_msg); cm_id_priv->remote_qpn = cm_req_get_local_qpn(req_msg); cm_id_priv->initiator_depth = cm_req_get_resp_res(req_msg); @@ -2308,8 +2315,9 @@ static int cm_mra_handler(struct cm_work work->cm_event.private_data = &mra_msg->private_data; work->cm_event.param.mra_rcvd.service_timeout = cm_mra_get_service_timeout(mra_msg); - timeout = cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) + - cm_convert_to_ms(cm_id_priv->av.packet_life_time); + timeout = min(IB_CM_MAX_TIMEOUT, + cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) + + cm_convert_to_ms(cm_id_priv->av.packet_life_time)); spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { @@ -2701,7 +2709,7 @@ int ib_send_cm_sidr_req(struct ib_cm_id cm_id->service_id = param->service_id; cm_id->service_mask = __constant_cpu_to_be64(~0ULL); - cm_id_priv->timeout_ms = param->timeout_ms; + cm_id_priv->timeout_ms = min(IB_CM_MAX_TIMEOUT, param->timeout_ms); cm_id_priv->max_cm_retries = param->max_cm_retries; ret = cm_alloc_msg(cm_id_priv, &msg); if (ret) From pradeep at us.ibm.com Thu Oct 5 11:11:48 2006 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Thu, 5 Oct 2006 11:11:48 -0700 Subject: [openib-general] ibv_devinfo In-Reply-To: Message-ID: Roland, Here is what I mean (partial output of "ibv_devinfo -v"): port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 4 port_lid: 128 port_lmc: 0x02 max_msg_sz: 0x0 port_cap_flags: 0x00000000 max_vl_num: 0 bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 1 gid_tbl_len: 1 subnet_timeout: 8 init_type_reply: 0 active_width: 12X (8) active_speed: 2.5 Gbps (1) phys_state: invalid physical state (0) GID[ 0]: 
fe80:0000:0000:0003:0002:5500:1002:1b3d port: 2 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 4 port_lid: 136 port_lmc: 0x02 max_msg_sz: 0x0 port_cap_flags: 0x00000000 max_vl_num: 0 bad_pkey_cntr: 0x0 qkey_viol_cntr: 0x0 sm_sl: 0 pkey_tbl_len: 1 gid_tbl_len: 1 subnet_timeout: 8 init_type_reply: 0 active_width: 12X (8) active_speed: 2.5 Gbps (1) phys_state: invalid physical state (0) GID[ 0]: fe80:0000:0000:0002:0002:5500:1002:1b7d The phys_state above does not correspond to what is in /sys/class/infiniband/ehca0/ports/1 However, the "phys state" that ibstatus displays is indeed correct and as expected (see below). ibstatus ehca0 Infiniband device 'ehca0' port 1 status: default gid: fe80:0000:0000:0000:0002:5500:0000:933d base lid: 0xb sm lid: 0x2 state: 4: ACTIVE phys state: 0: rate: 30 Gb/sec (12X) Infiniband device 'ehca0' port 2 status: default gid: fe80:0000:0000:0000:0002:5500:0000:937d base lid: 0x7 sm lid: 0x2 state: 4: ACTIVE phys state: 0: rate: 30 Gb/sec (12X) The fix in dev_info.c for this problem is elementary (replace "invalid physical state" with "unknown"). Since the sources for this program occurred in the examples directory I presumed that this was just a sample program written as an example. That was why I asked if it was "supported" or not. Pradeep pradeep at us.ibm.com Roland Dreier wrote on 10/05/2006 10:53:41 AM: > Pradeep> On eHCA we find that there are some discrepencies (for > Pradeep> example the phys_state) between the outputs of > Pradeep> "ibv_devinfo -v" and the corresponding output of > Pradeep> ibstatus. Indeed ibstatus does display what is in the > Pradeep> /sys/class/infiniband/* files, and as expected. > > Could you give some details on the differences? Is ibv_devinfo or > ibstatus giving the correct output? > > Pradeep> I would like to understand the status of ibv_devinfo. Is > Pradeep> this a supported program -given that the sources for this > Pradeep> is in the example directory? 
Also, I find the source > Pradeep> (devinfo.c) does appear in the OFED 1.0 tar ball. > > I'm not sure what "supported" would mean exactly, but certainly I > would like ibv_devinfo to work as well as possible. Unfortunately I > can't fix anything without a more detailed report... > > - R. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Thu Oct 5 11:23:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 05 Oct 2006 11:23:54 -0700 Subject: [openib-general] ibv_devinfo In-Reply-To: (Pradeep Satyanarayana's message of "Thu, 5 Oct 2006 11:11:48 -0700") References: Message-ID: > Here is what I mean (partial output of "ibv_devinfo -v"): > phys_state: invalid physical state (0) > The phys_state above does not correspond to what is in > /sys/class/infiniband/ehca0/ports/1 > > However, the "phys state" that ibstatus displays is indeed correct and as > expected (see below). > phys state: 0: umm... I'm not sure that either "unknown" or "invalid" can really be considered an incorrect translation of physical state 0. The IB spec does not give 0 as one of the defined values for the physical state field, so I think it is perfectly fine to say the state is "invalid." I think the real problem is that the ehca driver in the kernel does not fill in a real value for the phys_state field in its query_port method. This is also why the port's capability flags are (incorrectly I assume) shown as 0x00000000 by ibv_devinfo, etc. - R. From arthur.jones at qlogic.com Thu Oct 5 14:23:27 2006 From: arthur.jones at qlogic.com (Arthur Jones) Date: Thu, 5 Oct 2006 14:23:27 -0700 Subject: [openib-general] ipoib mcast questions... Message-ID: <20061005212327.GI2632@bauxite.pathscale.com> hi all, i'm looking over the ipoib multicast code, and i have a couple questions: 1) the set_multicast_list net device callback seems to just kick off another thread to do the work of registering the multicast group. 
the mc_list net_device field is only valid under the netif_tx_lock, but this lock is not grabbed by the restart_task. what happens if the mc_list is modified while in the restart_task? 2) there seem to be 2 threads, the restart_task which creates queries and the join_task which sends off the mad requests. why? is there some performance advantage? it would seem easier to do the registrations serially in the restart task... arthur From rdreier at cisco.com Thu Oct 5 21:18:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 05 Oct 2006 21:18:36 -0700 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <20061005212327.GI2632@bauxite.pathscale.com> (Arthur Jones's message of "Thu, 5 Oct 2006 14:23:27 -0700") References: <20061005212327.GI2632@bauxite.pathscale.com> Message-ID: > 1) the set_multicast_list net device callback > seems to just kick off another thread to do > the work of registering the multicast group. > the mc_list net_device field is only valid > under the netif_tx_lock, but this lock is not > grabbed by the restart_task. what happens > if the mc_list is modified while in the > restart_task? Just looking quickly, I see that ipoib_mcast_restart_task() does netif_tx_lock() (right near the top). Isn't this sufficient? > 2) there seem to be 2 threads, the restart_task > which creates queries and the join_task which sends > off the mad requests. why? is there some performance > advantage? it would seem easier to do the registrations > serially in the restart task... I guess it's really that way mainly for historical reasons. I'd be glad to see patches that simplify things (of course making sure that everything still works ;) - R. 
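A minimal userspace sketch of the locking pattern discussed in the ipoib mcast exchange above: the list is only touched while a lock is held, and the worker copies what it needs under that lock before doing any slow work. This is an analogy under stated assumptions, not the ipoib code. A pthread mutex stands in for netif_tx_lock, and struct mc_table, mc_table_add() and mc_table_snapshot() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 16
#define GID_LEN    16

/* Hypothetical stand-in for a device's multicast address list. */
struct mc_table {
    pthread_mutex_t lock;   /* plays the role of netif_tx_lock in this sketch */
    size_t count;
    unsigned char gid[MAX_GROUPS][GID_LEN];
};

/*
 * Writer side: append one group address while holding the lock.
 * Returns 0 on success, -1 if the table is full.
 */
static int mc_table_add(struct mc_table *t, const unsigned char *gid)
{
    int ret = -1;

    pthread_mutex_lock(&t->lock);
    if (t->count < MAX_GROUPS) {
        memcpy(t->gid[t->count++], gid, GID_LEN);
        ret = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return ret;
}

/*
 * Worker side: copy at most cap entries into out while holding the lock,
 * then return how many were copied. The caller can process the copy
 * without the lock, so later writers cannot change what it is reading.
 */
static size_t mc_table_snapshot(struct mc_table *t,
                                unsigned char out[][GID_LEN], size_t cap)
{
    size_t n;

    assert(out != NULL);
    pthread_mutex_lock(&t->lock);
    n = t->count < cap ? t->count : cap;
    memcpy(out, t->gid, n * GID_LEN);
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

A worker would call mc_table_snapshot() once and then issue its (slow) join requests from the copy, so a concurrent mc_table_add() is never blocked for long and never changes the data being walked.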
From johnt1johnt2 at gmail.com Fri Oct 6 02:09:13 2006 From: johnt1johnt2 at gmail.com (john t) Date: Fri, 6 Oct 2006 14:39:13 +0530 Subject: [openib-general] Multi-port HCA In-Reply-To: <45251A95.9010407@charter.net> References: <45251A95.9010407@charter.net> Message-ID: Hi Shannon, The bandwidth figures that you quoted below match with my readings for single port Mellanox DDR HCA (both for unidirection and bidirection). So it seems dual port SDR HCA performs as good as single port DDR HCA. It would help if you can also tell the bandwidth that you got using one port of your dual-port SDR HCA card. Was it half the bandwidth that you stated below, which means having two SDR ports per HCA helps. In my case it seems having two ports (DDR) per HCA does not increase BW, since PCI-e x8 limit is 16 Gb/sec per direction and each of the two HCA ports (DDR) though capable of transferring 16 Gb/sec in each direction, when used together can not go above 16 Gb/sec. Regards, John T. On 10/5/06, Shannon V. Davidson wrote: > > John, > > In our testing with dual port Mellanox SDR HCAs, we found that not all > PCI-express implementations are equal. Depending on the PCIe chipset, we > measured unidirectional SDR dual-rail bandwidth ranging from 1100-1500 > MB/sec and bidirectional SDR dual-rail bandwidth ranging from 1570-2600 > MB/sec. YMMV, but had good luck with Intel and Nvidia chipsets, and less > success with the Broadcom Serverworks HT-1000 and HT-2000 chipsets. My last > report (in June 2006) was that Broadcom was working to improve their > PCI-express performance. > > Regards, > Shannon > > john t wrote: > > Hi Bernard, > > I had a configuration issue. I fixed it and now I get same BW (i.e. around > 10 Gb/sec) on each port provided I use ports on different HCA cards. If I > use two ports of the same HCA card then BW gets divided between these two > ports. I am using Mellanox HCA cards and doing simple send/recv using > uverbs. 
> > Do you think it could be an issue with Mallanox driver or could it be due > to system/PCI-E limitation. > > Regards, > John T. > > > On 10/3/06, Bernard King-Smith wrote: > > > > > > John, > > > > Who's adapter (manufacturer) are you using? It is usually an adapter > > implementation or driver issue that occures when you cannot scale across > > multiple links. The fact that you don't scale up from one link, but it > > appears they share a fixed bandwidth across N links means that there is a > > driver or stack issue. At one time I think that IPoIB and maybe other IB > > drivers used only one event queue across multiple links which would be a > > bottleneck. We added code in the IBM EHCA driver to get round this > > bottleneck. > > > > Are your measurements using MPI or IP. Are you using separate > > tasks/sockets per link and using different subnets if using IP? > > > > Bernie King-Smith > > IBM Corporation > > Server Group > > Cluster System Performance > > wombat2 at us.ibm.com (845)433-8483 > > Tie. 293-8483 or wombat2 on NOTES > > > > "We are not responsible for the world we are born into, only for the > > world we leave when we die. > > So we have to accept what has gone before us and work to change the only > > thing we can, > > -- The Future." William Shatner > > > > john t" < johnt1johnt2 at gmail.com> wrote on 10/03/2006 09:42:24 AM: > > > > > > Hi, > > > > > > I have two HCA cards, each having two ports and each connected to a > > > separate PCI-E x8 slot. > > > > > > Using one HCA port I get end to end BW of 11.6 Gb/sec (uni-direction > > RDMA). > > > If I use two ports of the same HCA or different HCA, I get between 5 > > > to 6.5 Gb/sec point-to-point BW on each port. BW on each port > > > further reduces if I use more ports. I am not able to understand > > > this behaviour. Is there any limitation on max. BW that a system can > > > provide? 
Does the available BW get divided among multiple HCA ports > > > (which means having multiple ports will not increase the BW)? > > > > > > > > > Regards, > > > John T > > > > ------------------------------ > > _______________________________________________ > openib-general mailing listopenib-general at openib.orghttp://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > -- > ____________________________________________ > > Shannon V. Davidson > Senior Software Engineer Raytheon > 636-479-7465 office 443-383-0331 fax > ____________________________________________ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arthur.jones at qlogic.com Fri Oct 6 08:17:21 2006 From: arthur.jones at qlogic.com (Arthur Jones) Date: Fri, 6 Oct 2006 08:17:21 -0700 Subject: [openib-general] ipoib mcast questions... In-Reply-To: References: <20061005212327.GI2632@bauxite.pathscale.com> Message-ID: <20061006151721.GJ2632@bauxite.pathscale.com> hi roland, ... On Thu, Oct 05, 2006 at 09:18:36PM -0700, Roland Dreier wrote: > > 1) the set_multicast_list net device callback > > seems to just kick off another thread to do > > the work of registering the multicast group. > > the mc_list net_device field is only valid > > under the netif_tx_lock, but this lock is not > > grabbed by the restart_task. what happens > > if the mc_list is modified while in the > > restart_task? > > Just looking quickly, I see that ipoib_mcast_restart_task() does > netif_tx_lock() (right near the top). Isn't this sufficient? doh! i just missed it -- i predicted it would be missing, so i made it missing... > > 2) there seem to be 2 threads, the restart_task > > which creates queries and the join_task which sends > > off the mad requests. why? is there some performance > > advantage? it would seem easier to do the registrations > > serially in the restart task... 
> > I guess it's really that way mainly for historical reasons. I'd be > glad to see patches that simplify things (of course making sure that > everything still works ;) i'm imagining that all the "proprietary" eth interfaces + ipoib need to do about the same thing when it comes to registering with mcast groups. would you (all) be averse to pulling some of the mcast group registration code out into the core ib driver for all to use? arthur From swise at opengridcomputing.com Fri Oct 6 08:18:52 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 06 Oct 2006 10:18:52 -0500 Subject: [openib-general] [PATCH] RFC libibverbs, pass AEs to provider library. Message-ID: <1160147932.6276.8.camel@stevo-desktop> Roland, This is just an RFC patch (untested). I'm adding bypass support to a device and have a need to know whenever an async event is delivered to the consumer. It allows the bypass library to do WQ or CQ processing that needs to happen when a fatal async event happens. This async callback is similar to the cq_event callback that already exists in libibverbs. If you think this is reasonable, then I'll submit a tested patch. Steve. Index: include/infiniband/verbs.h =================================================================== --- include/infiniband/verbs.h (revision 9349) +++ include/infiniband/verbs.h (working copy) @@ -631,6 +631,7 @@ uint16_t lid); int (*detach_mcast)(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); + void (*async_event)(struct ibv_async_event *event); }; struct ibv_context { Index: src/device.c =================================================================== --- src/device.c (revision 9349) +++ src/device.c (working copy) @@ -214,6 +214,9 @@ break; } + if (context->ops.async_event) + context->ops.async_event(event); + return 0; } From halr at voltaire.com Fri Oct 6 08:26:26 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Oct 2006 11:26:26 -0400 Subject: [openib-general] ipoib mcast questions... 
In-Reply-To: <20061006151721.GJ2632@bauxite.pathscale.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> Message-ID: <1160148385.4502.145660.camel@hal.voltaire.com> On Fri, 2006-10-06 at 11:17, Arthur Jones wrote: > hi roland, ... > > On Thu, Oct 05, 2006 at 09:18:36PM -0700, Roland Dreier wrote: > > > 1) the set_multicast_list net device callback > > > seems to just kick off another thread to do > > > the work of registering the multicast group. > > > the mc_list net_device field is only valid > > > under the netif_tx_lock, but this lock is not > > > grabbed by the restart_task. what happens > > > if the mc_list is modified while in the > > > restart_task? > > > > Just looking quickly, I see that ipoib_mcast_restart_task() does > > netif_tx_lock() (right near the top). Isn't this sufficient? > > doh! i just missed it -- i predicted it would > be missing, so i made it missing... > > > > 2) there seem to be 2 threads, the restart_task > > > which creates queries and the join_task which sends > > > off the mad requests. why? is there some performance > > > advantage? it would seem easier to do the registrations > > > serially in the restart task... > > > > I guess it's really that way mainly for historical reasons. I'd be > > glad to see patches that simplify things (of course making sure that > > everything still works ;) > > i'm imagining that all the "proprietary" eth > interfaces + ipoib need to do about the same > thing when it comes to registering with mcast > groups. would you (all) be averse to pulling some > of the mcast group registration code out into > the core ib driver for all to use? Isn't this already done with Sean's multicast work ? 
-- Hal > > arthur From adit.262 at gmail.com Fri Oct 6 08:28:56 2006 From: adit.262 at gmail.com (Adit Ranadive) Date: Fri, 6 Oct 2006 11:28:56 -0400 Subject: [openib-general] Infiniband Crossover Cable Message-ID: I'm doing a project in Xen+IB and wanted to connect two nodes using the IB interconnect. I wanted to know if there is any kind of crossover cable available which allows me to connect just these 2 nodes without the use of a switch? Thanks, Adit Ranadive Georgia Institute of Technology, Atlanta, GA From halr at voltaire.com Fri Oct 6 08:35:09 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Oct 2006 11:35:09 -0400 Subject: [openib-general] Infiniband Crossover Cable In-Reply-To: References: Message-ID: <1160148909.4502.145967.camel@hal.voltaire.com> On Fri, 2006-10-06 at 11:28, Adit Ranadive wrote: > I'm doing a project in Xen+IB and wanted to connect two nodes using the > IB interconnect. > I wanted to know if there is any kind of crossover cable available > which allows me to connect just these 2 nodes without the use of a > switch? The same (copper) cable which is used to connect HCAs to switches or switches to other switches can be used to connect HCAs to other HCAs (for a 2 node configuration). 
-- Hal > Thanks, > > Adit Ranadive > Georgia Institute of Technology, > Atlanta, GA > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From Brian.Cain at ge.com Fri Oct 6 08:37:06 2006 From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare)) Date: Fri, 6 Oct 2006 11:37:06 -0400 Subject: [openib-general] Infiniband Crossover Cable In-Reply-To: Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033011C34A4@CINMLVEM11.e2k.ad.ge.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Adit Ranadive > Sent: Friday, October 06, 2006 10:29 AM > To: openib-general at openib.org > Subject: [openib-general] Infiniband Crossover Cable > > Im doing project in Xen+IB and wanted to connect two nodes using the > IB interconnect.. > I wanted to know if there is any kind of crossover cable available > which allows me to connect just these 2 nodes without the use of a > switch? The standard (straight-through) cables will work fine between two HCAs, just as between an HCA and a switch. Make sure at least one is hosting an SM though. -Brian From arthur.jones at qlogic.com Fri Oct 6 08:44:23 2006 From: arthur.jones at qlogic.com (Arthur Jones) Date: Fri, 6 Oct 2006 08:44:23 -0700 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <1160148385.4502.145660.camel@hal.voltaire.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> Message-ID: <20061006154423.GK2632@bauxite.pathscale.com> hi hal, ... On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote: > > [...] 
> > i'm imagining that all the "proprietary" eth > > interfaces + ipoib need to do about the same > > thing when it comes to registering with mcast > > groups. would you (all) be averse to pulling some > > of the mcast group registration code out into > > the core ib driver for all to use? > > Isn't this already done with Sean's multicast work ? i didn't know about this work. do you know where i can find it? arthur From halr at voltaire.com Fri Oct 6 08:45:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Oct 2006 11:45:10 -0400 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <20061006154423.GK2632@bauxite.pathscale.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> <20061006154423.GK2632@bauxite.pathscale.com> Message-ID: <1160149509.4502.146322.camel@hal.voltaire.com> On Fri, 2006-10-06 at 11:44, Arthur Jones wrote: > hi hal, ... > > On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote: > > > [...] > > > i'm imagining that all the "proprietary" eth > > > interfaces + ipoib need to do about the same > > > thing when it comes to registering with mcast > > > groups. would you (all) be averse to pulling some > > > of the mcast group registration code out into > > > the core ib driver for all to use? > > > > Isn't this already done with Sean's multicast work ? > > i didn't know about this work. do you know where i can > find it? I think it is in svn trunk. -- Hal > > arthur From mshefty at ichips.intel.com Fri Oct 6 09:37:39 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 06 Oct 2006 09:37:39 -0700 Subject: [openib-general] ipoib mcast questions... 
In-Reply-To: <1160149509.4502.146322.camel@hal.voltaire.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> <20061006154423.GK2632@bauxite.pathscale.com> <1160149509.4502.146322.camel@hal.voltaire.com> Message-ID: <45268653.20603@ichips.intel.com> Hal Rosenstock wrote: >>i didn't know about this work. do you know where i can >>find it? > > > I think it is in svn trunk. It's in svn. I've created patches against for-2.6.19, and will post them as part of a request to merge some of the features upstream. - Sean From arthur.jones at qlogic.com Fri Oct 6 09:45:39 2006 From: arthur.jones at qlogic.com (Arthur Jones) Date: Fri, 6 Oct 2006 09:45:39 -0700 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <45268653.20603@ichips.intel.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> <20061006154423.GK2632@bauxite.pathscale.com> <1160149509.4502.146322.camel@hal.voltaire.com> <45268653.20603@ichips.intel.com> Message-ID: <20061006164539.GQ2632@bauxite.pathscale.com> thanks all! i'll have a look... arthur On Fri, Oct 06, 2006 at 09:37:39AM -0700, Sean Hefty wrote: > Hal Rosenstock wrote: > >>i didn't know about this work. do you know where i can > >>find it? > > > > > >I think it is in svn trunk. > > It's in svn. I've created patches against for-2.6.19, and will post them as > part of a request to merge some of the features upstream. > > - Sean From krause at cup.hp.com Fri Oct 6 10:30:26 2006 From: krause at cup.hp.com (Michael Krause) Date: Fri, 06 Oct 2006 10:30:26 -0700 Subject: [openib-general] Multi-port HCA In-Reply-To: References: <45251A95.9010407@charter.net> Message-ID: <6.2.0.14.2.20061006102737.024f1528@esmail.cup.hp.com> Off-line someone asked me to clarify my earlier e-mail. 
Given that this discussion continues, perhaps this might help explain the performance a bit more. The Max Payload Size quoted here is what is typically implemented on x86 chipsets, though other chipsets may use a larger value. From a pure bandwidth perspective (which is not typical of many applications), this should be reasonably accurate. In any case, this is just an FYI. A x4 IB 5 GT/s link is 20 Gbps raw (customers do comprehend the marketing hype does not translate into that bandwidth being available for applications - I have had to explain to the press in the past that raw does not equal application available bandwidth). Take off 8b/10b, protocol overheads, etc., and assuming a 2KB PMTU, one can expect to hit perhaps 14-15 Gbps per direction depending upon the workload. Let's assume an aggregate of 30 Gbps of potential application bandwidth for simplicity. The PCIe x8 2.5 GT/s link is also 20 Gbps raw, so take off the 8b/10b, protocol overheads, control / application overheads, etc. Given that it uses at most a 256B Max Payload Size on DMA Writes and cache line sized DMA Read Completions (64B), though many people use PIO Writes to avoid DMA Reads when it comes to micro-benchmarks, the actual performance is unlikely to hit what IB might drive, depending upon the direction and mix of control and application data transactions. Add in the impact on the memory controller, which in real-world applications is servicing the processors quite a bit more than illustrated by micro-benchmarks, and the ability of a system to drive an IB x4 DDR device at link rate is very questionable. The question is whether this really matters. If you examine most workloads on various platforms, they simply cannot generate enough bandwidth to consume the external I/O bandwidth capacity. In many cases, they are constrained by the processor or the combination of the processor / memory components. This isn't a bad thing when you think about it. 
For many customers, it means that the attached I/O fabrics will be sufficiently provisioned to eliminate or largely mitigate the impacts of external fabric events, e.g. congestion, and deliver a reasonable solution using the existing hardware (issues of topology, use of multi-path, etc. all come into bearing as a function of fabric diameter). In the end, customers care about whether the application performs as expected and where the real bottlenecks lie. For most applications, it will come down to the processor / memory subsystems and not the I/O or external fabric. While I haven't seen all of the latest DDR micro-benchmark results, I believe the x4 IB SDR numbers largely align with what I've outlined here. Mike At 02:09 AM 10/6/2006, john t wrote: >Hi Shannon, > >The bandwidth figures that you quoted below match with my readings for >single port Mellanox DDR HCA (both for unidirection and bidirection). So >it seems dual port SDR HCA performs as good as single port DDR HCA. It >would help if you can also tell the bandwidth that you got using one port >of your dual-port SDR HCA card. Was it half the bandwidth that you stated >below, which means having two SDR ports per HCA helps. > >In my case it seems having two ports (DDR) per HCA does not increase BW, >since PCI-e x8 limit is 16 Gb/sec per direction and each of the two HCA >ports (DDR) though capable of transferring 16 Gb/sec in each direction, >when used together can not go above 16 Gb/sec. > >Regards, >John T. > > >On 10/5/06, Shannon V. Davidson ><svdavidson at charter.net> wrote: >John, > >In our testing with dual port Mellanox SDR HCAs, we found that not all >PCI-express implementations are equal. Depending on the PCIe chipset, we >measured unidirectional SDR dual-rail bandwidth ranging from 1100-1500 >MB/sec and bidirectional SDR dual-rail bandwidth ranging from 1570-2600 >MB/sec. 
YMMV, but had good luck with Intel and Nvidia chipsets, and less >success with the Broadcom Serverworks HT-1000 and HT-2000 chipsets. My >last report (in June 2006) was that Broadcom was working to improve their >PCI-express performance. > >Regards, >Shannon > >john t wrote: >>Hi Bernard, >> >>I had a configuration issue. I fixed it and now I get same BW (i.e. >>around 10 Gb/sec) on each port provided I use ports on different HCA >>cards. If I use two ports of the same HCA card then BW gets divided >>between these two ports. I am using Mellanox HCA cards and doing simple >>send/recv using uverbs. >> >>Do you think it could be an issue with Mallanox driver or could it be due >>to system/PCI-E limitation. >> >>Regards, >>John T. >> >> >>On 10/3/06, Bernard King-Smith >><wombat2 at us.ibm.com > wrote: >> >>John, >> >>Who's adapter (manufacturer) are you using? It is usually an adapter >>implementation or driver issue that occures when you cannot scale across >>multiple links. The fact that you don't scale up from one link, but it >>appears they share a fixed bandwidth across N links means that there is a >>driver or stack issue. At one time I think that IPoIB and maybe other IB >>drivers used only one event queue across multiple links which would be a >>bottleneck. We added code in the IBM EHCA driver to get round this bottleneck. >> >>Are your measurements using MPI or IP. Are you using separate >>tasks/sockets per link and using different subnets if using IP? >> >>Bernie King-Smith >>IBM Corporation >>Server Group >>Cluster System Performance >>wombat2 at us.ibm.com (845)433-8483 >>Tie. 293-8483 or wombat2 on NOTES >> >>"We are not responsible for the world we are born into, only for the >>world we leave when we die. >>So we have to accept what has gone before us and work to change the only >>thing we can, >>-- The Future." 
William Shatner >> >>john t" < johnt1johnt2 at gmail.com> wrote on >>10/03/2006 09:42:24 AM: >> >> > >> > Hi, >> > >> > I have two HCA cards, each having two ports and each connected to a >> > separate PCI-E x8 slot. >> > >> > Using one HCA port I get end to end BW of 11.6 Gb/sec (uni-direction >> RDMA). >> > If I use two ports of the same HCA or different HCA, I get between 5 >> > to 6.5 Gb/sec point-to-point BW on each port. BW on each port >> > further reduces if I use more ports. I am not able to understand >> > this behaviour. Is there any limitation on max. BW that a system can >> > provide? Does the available BW get divided among multiple HCA ports >> > (which means having multiple ports will not increase the BW)? >> > >> > >> > Regards, >> > John T >> >> >> >> >> >> >>_______________________________________________ >> >>openib-general mailing list >> >>openib-general at openib.org >> >>http://openib.org/mailman/listinfo/openib-general >> >> >>To unsubscribe, please visit >>http://openib.org/mailman/listinfo/openib-general > > > >-- > >____________________________________________ > > >Shannon V. Davidson > >Senior Software Engineer Raytheon > >636-479-7465 office 443-383-0331 fax > >____________________________________________ > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Fri Oct 6 11:28:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Oct 2006 14:28:03 -0400 Subject: [openib-general] [PATCH] OpenSM: Improve handling of IB router ports Message-ID: <1160159281.30096.1142.camel@hal.voltaire.com> OpenSM: Improve handling of IB router ports Signed-off-by: Hal Rosenstock --- Index: opensm/osm_drop_mgr.c =================================================================== --- opensm/osm_drop_mgr.c (revision 9679) +++ opensm/osm_drop_mgr.c (working copy) @@ -266,10 +266,10 @@ __osm_drop_mgr_remove_port( osm_node_unlink( p_node, (uint8_t)port_num, p_remote_node, (uint8_t)remote_port_num ); - /* If the remote node is a ca - need to remove the remote port, since - it is no longer reachable. This can be done if we reset the discovery - count of the remote port. */ - if ( osm_node_get_type( p_remote_node ) == IB_NODE_TYPE_CA ) + /* If the remote node is ca or router - need to remove the remote port, + since it is no longer reachable. This can be done if we reset the + discovery count of the remote port. 
*/ + if ( osm_node_get_type( p_remote_node ) != IB_NODE_TYPE_SWITCH ) { if ( p_remote_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ) ) { @@ -385,25 +385,6 @@ __osm_drop_mgr_remove_switch( /********************************************************************** **********************************************************************/ -static void -__osm_drop_mgr_remove_router( - IN const osm_drop_mgr_t* const p_mgr, - IN osm_node_t* p_node ) -{ - OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_remove_router ); - - UNUSED_PARAM( p_mgr ); - UNUSED_PARAM( p_node ); - - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_drop_mgr_remove_router: ERR 0106: " - "Routers are not supported\n" ); - - OSM_LOG_EXIT( p_mgr->p_log ); -} - -/********************************************************************** - **********************************************************************/ static boolean_t __osm_drop_mgr_process_node( IN const osm_drop_mgr_t* const p_mgr, @@ -454,16 +435,13 @@ __osm_drop_mgr_process_node( switch( osm_node_get_type( p_node ) ) { case IB_NODE_TYPE_CA: + case IB_NODE_TYPE_ROUTER: break; case IB_NODE_TYPE_SWITCH: __osm_drop_mgr_remove_switch( p_mgr, p_node ); break; - case IB_NODE_TYPE_ROUTER: - __osm_drop_mgr_remove_router( p_mgr, p_node ); - break; - default: osm_log( p_mgr->p_log, OSM_LOG_ERROR, "__osm_drop_mgr_process_node: ERR 0104: " Index: opensm/osm_node_info_rcv.c =================================================================== --- opensm/osm_node_info_rcv.c (revision 9679) +++ opensm/osm_node_info_rcv.c (working copy) @@ -601,6 +601,172 @@ __osm_ni_rcv_process_new_router( __osm_ni_rcv_process_new_node( p_rcv, p_node, p_madw ); + /* + A node guid of 0 is the corner case that indicates + we discovered our own node. Initialize the subnet + object with the SM's own port guid. 
+ */ + if( osm_madw_get_ni_context_ptr( p_madw )->node_guid == 0 ) + { + p_rcv->p_subn->sm_port_guid = p_node->node_info.port_guid; + } + + OSM_LOG_EXIT( p_rcv->p_log ); +} + +/********************************************************************** + The plock must be held before calling this function. +**********************************************************************/ +static void +__osm_ni_rcv_process_existing_router( + IN const osm_ni_rcv_t* const p_rcv, + IN osm_node_t* const p_node, + IN const osm_madw_t* const p_madw ) +{ + ib_node_info_t *p_ni; + ib_smp_t *p_smp; + osm_port_t *p_port; + osm_port_t *p_port_check; + cl_qmap_t *p_guid_tbl; + osm_madw_context_t context; + uint8_t port_num; + osm_physp_t *p_physp; + ib_api_status_t status; + osm_dr_path_t *p_dr_path; + osm_bind_handle_t h_bind; + cl_status_t cl_status; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_router ); + + p_smp = osm_madw_get_smp_ptr( p_madw ); + p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp ); + port_num = ib_node_info_get_local_port_num( p_ni ); + p_guid_tbl = &p_rcv->p_subn->port_guid_tbl; + h_bind = osm_madw_get_bind_handle( p_madw ); + + /* + Determine if we have encountered this node through a + previously undiscovered port. If so, build the new + port object. + */ + p_port = (osm_port_t*)cl_qmap_get( p_guid_tbl, p_ni->port_guid ); + + if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, + "__osm_ni_rcv_process_existing_router: " + "Creating new port object with GUID = 0x%" PRIx64 "\n", + cl_ntoh64( p_ni->port_guid ) ); + + osm_node_init_physp( p_node, p_madw ); + + p_port = osm_port_new( p_ni, p_node ); + if( p_port == NULL ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_ni_rcv_process_existing_router: ERR 0D24: " + "Unable to create new port object\n" ); + goto Exit; + } + + /* + Add the new port object to the database. 
+ */ + p_port_check = (osm_port_t*)cl_qmap_insert( p_guid_tbl, + p_ni->port_guid, &p_port->map_item ); + if( p_port_check != p_port ) + { + /* + We should never be here! + Somehow, this port GUID already exists in the table. + */ + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_ni_rcv_process_existing_router: ERR 0D22: " + "Port 0x%" PRIx64 " already in the database!\n", + cl_ntoh64( p_ni->port_guid ) ); + + osm_port_delete( &p_port ); + + goto Exit; + } + + /* If we are a master, then this means the port is new on the subnet. + Add it to the new_ports_list - need to send trap 64 on these ports. + The condition that we are master is true, since if we are in discovering + state (meaning we woke up from standby or we are just initializing), + then these ports may be new to us, but are not new on the subnet. + If we are master, then the subnet as we know it is the updated one, + and any new ports we encounter should cause trap 64. C14-72.1.1 */ + if ( p_rcv->p_subn->sm_state == IB_SMINFO_STATE_MASTER ) + { + cl_status = cl_list_insert_tail( &p_rcv->p_subn->new_ports_list, p_port ); + if( cl_status != CL_SUCCESS ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_ni_rcv_process_existing_router: ERR 0D28: " + "Error %s adding to list\n", + CL_STATUS_MSG( cl_status ) ); + osm_port_delete( &p_port ); + goto Exit; + } + else + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_ni_rcv_process_existing_router: " + "Adding port GUID:0x%016" PRIx64 " to new_ports_list\n", + cl_ntoh64(osm_node_get_node_guid( p_port->p_node )) ); + } + } + + p_physp = osm_node_get_physp_ptr( p_node, port_num ); + } + else + { + p_physp = osm_node_get_physp_ptr( p_node, port_num ); + + CL_ASSERT( p_physp ); + + if ( !osm_physp_is_valid( p_physp ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_ni_rcv_process_existing_router: ERR 0D29: " + "Invalid physical port. 
Aborting discovery\n"); + goto Exit; + } + + /* + Update the DR Path to the port, + in case the old one is no longer available. + */ + p_dr_path = osm_physp_get_dr_path_ptr( p_physp ); + + osm_dr_path_init( p_dr_path, h_bind, p_smp->hop_count, + p_smp->initial_path ); + } + + context.pi_context.node_guid = p_ni->node_guid; + context.pi_context.port_guid = p_ni->port_guid; + context.pi_context.set_method = FALSE; + context.pi_context.update_master_sm_base_lid = FALSE; + context.pi_context.ignore_errors = FALSE; + context.pi_context.light_sweep = FALSE; + + status = osm_req_get( p_rcv->p_gen_req, + osm_physp_get_dr_path_ptr( p_physp ), + IB_MAD_ATTR_PORT_INFO, + cl_hton32( port_num ), + CL_DISP_MSGID_NONE, + &context ); + + if( status != IB_SUCCESS ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_ni_rcv_process_existing_router: ERR 0D23: " + "Failure initiating PortInfo request (%s)\n", + ib_get_err_str(status)); + } + + Exit: OSM_LOG_EXIT( p_rcv->p_log ); } @@ -937,7 +1103,7 @@ __osm_ni_rcv_process_existing( switch( p_ni->node_type ) { case IB_NODE_TYPE_ROUTER: - /* Not supported yet. 
*/ + __osm_ni_rcv_process_existing_router( p_rcv, p_node, p_madw ); break; case IB_NODE_TYPE_CA: Index: opensm/osm_ucast_updn.c =================================================================== --- opensm/osm_ucast_updn.c (revision 9679) +++ opensm/osm_ucast_updn.c (working copy) @@ -222,7 +222,7 @@ __updn_bfs_by_node( } else { - /* This is an HCA - need to take its remote port */ + /* This is a CA or router - need to take its remote port */ p_remote_physp = p_physp->p_remote_physp; /* make sure that the following occur: @@ -1042,7 +1042,7 @@ osm_updn_find_root_nodes_by_min_hop( cl_list_init( p_ca_list, 10 ); */ - /* Find the Maximum number of Cas for histogram normalization */ + /* Find the Maximum number of CAs (and routers) for histogram normalization */ osm_log (&(osm.log), OSM_LOG_VERBOSE, "osm_updn_find_root_nodes_by_min_hop: " "Find the number of CA and store them in cl_list\n"); @@ -1050,7 +1050,7 @@ osm_updn_find_root_nodes_by_min_hop( while( p_next_port != (osm_port_t*)cl_qmap_end( &osm.subn.port_guid_tbl ) ) { p_port = p_next_port; p_next_port = (osm_port_t*)cl_qmap_next( &p_next_port->map_item ); - if ( osm_node_get_type(p_port->p_node) == IB_NODE_TYPE_CA ) + if ( osm_node_get_type(p_port->p_node) != IB_NODE_TYPE_SWITCH ) { p_physp = osm_port_get_default_phys_ptr(p_port); self_lid_ho = cl_ntoh16( osm_physp_get_base_lid(p_physp) ); Index: opensm/osm_state_mgr.c =================================================================== --- opensm/osm_state_mgr.c (revision 9679) +++ opensm/osm_state_mgr.c (working copy) @@ -941,6 +941,7 @@ __osm_state_mgr_sweep_hop_1( switch ( osm_node_get_type( p_node ) ) { case IB_NODE_TYPE_CA: + case IB_NODE_TYPE_ROUTER: context.ni_context.node_guid = osm_node_get_node_guid( p_node ); context.ni_context.port_num = port_num; @@ -1002,8 +1003,8 @@ __osm_state_mgr_sweep_hop_1( default: osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_state_mgr_sweep_hop_1: ERR 3313: " - "Current supported node types that host SM are CA or SW 
only\n" ); + "__osm_state_mgr_sweep_hop_1: ERR 3313: Node type %d. " + "Current supported node types that host SM are CA, router, or SW\n", osm_node_get_type( p_node ) ); } Exit: From svdavidson at charter.net Fri Oct 6 12:40:09 2006 From: svdavidson at charter.net (Shannon V. Davidson) Date: Fri, 06 Oct 2006 14:40:09 -0500 Subject: [openib-general] Multi-port HCA In-Reply-To: References: <45251A95.9010407@charter.net> Message-ID: <4526B119.2000104@charter.net> john t wrote: > Hi Shannon, > > The bandwidth figures that you quoted below match with my readings for > single port Mellanox DDR HCA (both for unidirection and bidirection). > So it seems dual port SDR HCA performs as good as single port DDR HCA. > It would help if you can also tell the bandwidth that you got using > one port of your dual-port SDR HCA card. Was it half the bandwidth > that you stated below, which means having two SDR ports per HCA helps. For a single PCIe Mellanox SDR HCA port, most architectures we've tested provide 960-970 MB/sec unidirectional and 1800-1900 MB/sec bidirectional bandwidth using MPI. Shannon > In my case it seems having two ports (DDR) per HCA does not increase > BW, since PCI-e x8 limit is 16 Gb/sec per direction and each of the > two HCA ports (DDR) though capable of transferring 16 Gb/sec in each > direction, when used together can not go above 16 Gb/sec. > > Regards, > John T. > > > On 10/5/06, *Shannon V. Davidson* > wrote: > > John, > > In our testing with dual port Mellanox SDR HCAs, we found that not > all PCI-express implementations are equal. Depending on the PCIe > chipset, we measured unidirectional SDR dual-rail bandwidth > ranging from 1100-1500 MB/sec and bidirectional SDR dual-rail > bandwidth ranging from 1570-2600 MB/sec. YMMV, but had good luck > with Intel and Nvidia chipsets, and less success with the Broadcom > Serverworks HT-1000 and HT-2000 chipsets. 
My last report (in June > 2006) was that Broadcom was working to improve their PCI-express > performance. > > Regards, > Shannon > > john t wrote: >> Hi Bernard, >> >> I had a configuration issue. I fixed it and now I get same BW >> (i.e. around 10 Gb/sec) on each port provided I use ports on >> different HCA cards. If I use two ports of the same HCA card then >> BW gets divided between these two ports. I am using Mellanox HCA >> cards and doing simple send/recv using uverbs. >> >> Do you think it could be an issue with Mellanox driver or could >> it be due to system/PCI-E limitation? >> >> Regards, >> John T. >> >> >> On 10/3/06, *Bernard King-Smith* > > wrote: >> >> >> John, >> >> Whose adapter (manufacturer) are you using? It is usually an >> adapter implementation or driver issue that occurs when you >> cannot scale across multiple links. The fact that you don't >> scale up from one link, but it appears they share a fixed >> bandwidth across N links means that there is a driver or >> stack issue. At one time I think that IPoIB and maybe other >> IB drivers used only one event queue across multiple links >> which would be a bottleneck. We added code in the IBM EHCA >> driver to get round this bottleneck. >> >> Are your measurements using MPI or IP? Are you using separate >> tasks/sockets per link and using different subnets if using IP? >> >> Bernie King-Smith >> IBM Corporation >> Server Group >> Cluster System Performance >> wombat2 at us.ibm.com (845)433-8483 >> Tie. 293-8483 or wombat2 on NOTES >> >> "We are not responsible for the world we are born into, only >> for the world we leave when we die. >> So we have to accept what has gone before us and work to >> change the only thing we can, >> -- The Future." William Shatner >> >> john t" < johnt1johnt2 at gmail.com >> > wrote on 10/03/2006 09:42:24 >> AM: >> >> > >> > Hi, >> > >> > I have two HCA cards, each having two ports and each >> connected to a >> > separate PCI-E x8 slot. 
>> > >> > Using one HCA port I get end to end BW of 11.6 Gb/sec >> (uni-direction RDMA). >> > If I use two ports of the same HCA or different HCA, I get >> between 5 >> > to 6.5 Gb/sec point-to-point BW on each port. BW on each port >> > further reduces if I use more ports. I am not able to >> understand >> > this behaviour. Is there any limitation on max. BW that a >> system can >> > provide? Does the available BW get divided among multiple >> HCA ports >> > (which means having multiple ports will not increase the BW)? >> > >> > >> > Regards, >> > John T >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > -- > ____________________________________________ > > Shannon V. Davidson > Senior Software Engineer Raytheon > 636-479-7465 office 443-383-0331 fax > ____________________________________________ > > > > -- ____________________________________________ Shannon V. Davidson Senior Software Engineer Raytheon 636-479-7465 office 443-383-0331 fax ____________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From arthur.jones at qlogic.com Fri Oct 6 12:47:46 2006 From: arthur.jones at qlogic.com (Arthur Jones) Date: Fri, 6 Oct 2006 12:47:46 -0700 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <1160148385.4502.145660.camel@hal.voltaire.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> Message-ID: <20061006194746.GA25553@bauxite.pathscale.com> hi hal, ... On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote: > > [...] 
> > i'm imagining that all the "proprietary" eth > > interfaces + ipoib need to do about the same > > thing when it comes to registering with mcast > > groups. would you (all) be averse to pulling some > > of the mcast group registration code out into > > the core ib driver for all to use? > > Isn't this already done with Sean's multicast work ? after reading the code, iiuc, sean's work provides nice infrastructure for ib_multicast group join/leave. i was thinking about one more level up, i.e. generic _net_ multicast join/leave infrastructure. i'm not sure exactly how it would go -- but i think all the ib net_devices are going to need a way to associate a multicast hw addr w/ a live mgid. if that could be broken out, we could all share it... arthur From halr at voltaire.com Fri Oct 6 13:09:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Oct 2006 16:09:05 -0400 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <20061006194746.GA25553@bauxite.pathscale.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> <20061006194746.GA25553@bauxite.pathscale.com> Message-ID: <1160165345.30096.4928.camel@hal.voltaire.com> On Fri, 2006-10-06 at 15:47, Arthur Jones wrote: > hi hal, ... > > On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote: > > > [...] > > > i'm imagining that all the "proprietary" eth > > > interfaces + ipoib need to do about the same > > > thing when it comes to registering with mcast > > > groups. would you (all) be averse to pulling some > > > of the mcast group registration code out into > > > the core ib driver for all to use? > > > > Isn't this already done with Sean's multicast work ? > > after reading the code, iiuc, sean's work provides > nice infrastructure for ib_multicast group join/leave. > i was thinking about one more level up, i.e. generic > _net_ multicast join/leave infrastructure. 
i'm not > sure exactly how it would go -- but i think all the ib > net_devices are going to need a way to associate a > multicast hw addr w/ a live mgid. Don't IPmc addresses translate to MGIDs per the RFC ? MGIDs are not hardware addresses (MLIDs are). -- Hal > if that could be broken out, we could all share it... > arthur From arthur.jones at qlogic.com Fri Oct 6 13:34:55 2006 From: arthur.jones at qlogic.com (Arthur Jones) Date: Fri, 6 Oct 2006 13:34:55 -0700 Subject: [openib-general] ipoib mcast questions... In-Reply-To: <1160165345.30096.4928.camel@hal.voltaire.com> References: <20061005212327.GI2632@bauxite.pathscale.com> <20061006151721.GJ2632@bauxite.pathscale.com> <1160148385.4502.145660.camel@hal.voltaire.com> <20061006194746.GA25553@bauxite.pathscale.com> <1160165345.30096.4928.camel@hal.voltaire.com> Message-ID: <20061006203455.GB25553@bauxite.pathscale.com> hi hal, ... On Fri, Oct 06, 2006 at 04:09:05PM -0400, Hal Rosenstock wrote: > On Fri, 2006-10-06 at 15:47, Arthur Jones wrote: > > hi hal, ... > > > > On Fri, Oct 06, 2006 at 11:26:26AM -0400, Hal Rosenstock wrote: > > > > [...] > > > > i'm imagining that all the "proprietary" eth > > > > interfaces + ipoib need to do about the same > > > > thing when it comes to registering with mcast > > > > groups. would you (all) be averse to pulling some > > > > of the mcast group registration code out into > > > > the core ib driver for all to use? > > > > > > Isn't this already done with Sean's multicast work ? > > > > after reading the code, iiuc, sean's work provides > > nice infrastructure for ib_multicast group join/leave. > > i was thinking about one more level up, i.e. generic > > _net_ multicast join/leave infrastructure. i'm not > > sure exactly how it would go -- but i think all the ib > > net_devices are going to need a way to associate a > > multicast hw addr w/ a live mgid. > > Don't IPmc addresses translate to MGIDs per the RFC ? that's a different problem than the one i'm trying to address. 
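For reference, the IPv4 case of the RFC 4391 mapping raised above is small enough to sketch. This is an illustration only, not code from IPoIB: the function names are made up for this note, the group address is taken in host byte order, and the scope and P_Key are assumed to be supplied by the caller (IPoIB takes them from the interface's broadcast GID).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the 16-byte MGID for an IPv4 multicast group, following the
 * RFC 4391 layout:
 *   0xFF | flags (0x1) + scope | 0x401B signature | P_Key | zeros |
 *   low 28 bits of the IPv4 group address
 * ipv4_mc_to_mgid() is a hypothetical helper name.
 */
static void ipv4_mc_to_mgid(uint32_t ipv4_mc, uint8_t scope, uint16_t pkey,
                            uint8_t mgid[16])
{
    memset(mgid, 0, 16);
    mgid[0]  = 0xff;                    /* multicast GID prefix */
    mgid[1]  = 0x10 | (scope & 0x0f);   /* flags = 1, scope from caller */
    mgid[2]  = 0x40;                    /* IPv4 signature 0x401B */
    mgid[3]  = 0x1b;
    mgid[4]  = (uint8_t)(pkey >> 8);    /* P_Key, big endian */
    mgid[5]  = (uint8_t)(pkey & 0xff);
    mgid[12] = (ipv4_mc >> 24) & 0x0f;  /* only the low 28 bits are kept */
    mgid[13] = (ipv4_mc >> 16) & 0xff;
    mgid[14] = (ipv4_mc >> 8) & 0xff;
    mgid[15] = ipv4_mc & 0xff;
}

/* Format an MGID as eight colon-separated 16-bit groups (39 chars + NUL). */
static void mgid_to_str(const uint8_t mgid[16], char out[40])
{
    char *p = out;
    int i;

    for (i = 0; i < 16; i += 2)
        p += sprintf(p, i ? ":%02x%02x" : "%02x%02x", mgid[i], mgid[i + 1]);
}
```

With 224.0.0.1 (0xE0000001), scope 2 and P_Key 0xffff this gives ff12:401b:ffff:0000:0000:0000:0000:0001. The mapping is fixed by the RFC, so it covers IPoIB only; a virtual ethernet device would need its own rule for deriving an MGID.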
i think you're talking about mapping ip mcast addresses to "hardware" addresses. rfc4391 tells ipoib how to do that, for the virtual ethernet devices, we'll need to come up w/ something different... > MGIDs are not hardware addresses (MLIDs are). mgids are generated from the mc_list->dmi_addr. this is a "hardware" address to the linux net code. i'm looking for commonality to reduce duplicated code. we all (ipoib + virtual eth) need to associate mgids, however we got them, with the mlids (i think). i'm guessing we'll do it in a very similar way... arthur From or.gerlitz at gmail.com Fri Oct 6 14:11:20 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 6 Oct 2006 23:11:20 +0200 Subject: [openib-general] [PATCH] 2.6.19 rdma_cm: add rdma_establish call to connect if RTU is lost In-Reply-To: <45254399.8090703@ichips.intel.com> References: <000001c6e89f$2ae347a0$8698070a@amr.corp.intel.com> <45254399.8090703@ichips.intel.com> Message-ID: <15ddcffd0610061411q1ea5bf81t7d407639d81ee089@mail.gmail.com> On 10/5/06, Sean Hefty wrote: > > I'm confused -- how does this fix anything? I don't see any callers > > of the new rdma_establish() function ?? > > This was submitted at the request of Michael and Or, so I'll let them comment. > There are no in tree passive side users of the rdma_cm, so passive side calls > are unused (rdma_listen, rdma_accept, rdma_establish). > > I have no issues deferring this patch until a user is added (which I hope will > be 2.6.20). My understanding is that the only in tree usage of the passive side calls in 2.6.20 would be the rdma cm user space support. Generally speaking, I have no problems deferring this to 2.6.20, however, the problem is that this API is exposed by the OFED's CMA. Hence there is a kernel API diff between OFED to the kernel IB code. This puts the developers of ULPs who are CMA consumers (iSER target, RDS, Lustre, NFSoRDMA, SDP) in a problem, since they can't have the same code base for OFED and the kernel. 
My take on that as i expressed it to Sean over the "rdma_cm" thread, is that basically, either the IB kernel maintainers (Roland and Sean) decide to push it into 2.6.19 (eg under the justification that rdma_listen and rdma_accept are already there) or demand taking it out from OFED 1.1 as it violates the "don't put a future kernel IB code in second place relative to OFED" guideline. Or. From swise at opengridcomputing.com Fri Oct 6 15:29:34 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 06 Oct 2006 17:29:34 -0500 Subject: [openib-general] [PATCH] RFC libibverbs - Pass provider data through ibv_cmd_req_notify_cq() Message-ID: <1160173774.4324.7.camel@stevo-desktop> Roland, Here is another change I need to support kernel bypass in the driver I'm working on. For this device, the req_notify_cq() operation cannot be bypassed. Further, the lib needs to pass some info down to the kernel verb to correctly implement re-arm. This patch enables passing provider-specific data down to the kernel verb. A kernel patch is also needed, that I will submit as a separate RFC patch. There are no dependencies build-wise between the user and kernel patches by the way. You just will need both patches to get provider data passed down to your kernel verb. Comments? Steve. ------------------------- Pass provider-specific data down in ibv_cmd_req_notify_cq(). From: Steve Wise The Chelsio iwarp provider library needs to pass information to the kernel verb for re-arming the CQ. 
Signed-off-by: Steve Wise --- libibverbs/include/infiniband/driver.h | 5 ++++- libibverbs/src/cmd.c | 12 ++++++------ 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/libibverbs/include/infiniband/driver.h b/libibverbs/include/infiniband/driver.h index 45bd1b0..279485b 100644 --- a/libibverbs/include/infiniband/driver.h +++ b/libibverbs/include/infiniband/driver.h @@ -95,7 +95,10 @@ int ibv_cmd_create_cq(struct ibv_context struct ibv_create_cq *cmd, size_t cmd_size, struct ibv_create_cq_resp *resp, size_t resp_size); int ibv_cmd_poll_cq(struct ibv_cq *cq, int ne, struct ibv_wc *wc); -int ibv_cmd_req_notify_cq(struct ibv_cq *cq, int solicited_only); +#define IBV_CMD_REQ_NOTIFY_HAS_CMD_DATA +int ibv_cmd_req_notify_cq(struct ibv_cq *cq, int solicited_only, + struct ibv_req_notify_cq *cmd, size_t cmd_size); + #define IBV_CMD_RESIZE_CQ_HAS_RESP_PARAMS int ibv_cmd_resize_cq(struct ibv_cq *cq, int cqe, struct ibv_resize_cq *cmd, size_t cmd_size, diff --git a/libibverbs/src/cmd.c b/libibverbs/src/cmd.c index 8dbdfe8..15eccdc 100644 --- a/libibverbs/src/cmd.c +++ b/libibverbs/src/cmd.c @@ -372,15 +372,15 @@ out: return ret; } -int ibv_cmd_req_notify_cq(struct ibv_cq *ibcq, int solicited_only) +int ibv_cmd_req_notify_cq(struct ibv_cq *ibcq, int solicited_only, + struct ibv_req_notify_cq *cmd, size_t cmd_size) { - struct ibv_req_notify_cq cmd; - IBV_INIT_CMD(&cmd, sizeof cmd, REQ_NOTIFY_CQ); - cmd.cq_handle = ibcq->handle; - cmd.solicited = !!solicited_only; + IBV_INIT_CMD(cmd, cmd_size, REQ_NOTIFY_CQ); + cmd->cq_handle = ibcq->handle; + cmd->solicited = !!solicited_only; - if (write(ibcq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) + if (write(ibcq->context->cmd_fd, cmd, cmd_size) != cmd_size) return errno; return 0; From swise at opengridcomputing.com Fri Oct 6 15:40:14 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 06 Oct 2006 17:40:14 -0500 Subject: [openib-general] [PATCH] RFC libibverbs - Pass provider data through 
ibv_cmd_req_notify_cq() In-Reply-To: <1160173774.4324.7.camel@stevo-desktop> References: <1160173774.4324.7.camel@stevo-desktop> Message-ID: <1160174414.4324.12.camel@stevo-desktop> On Fri, 2006-10-06 at 17:29 -0500, Steve Wise wrote: > Roland, > > Here is another change I need to support kernel bypass in the driver I'm > working on. For this device, the req_notify_cq() operation cannot be > bypassed. Further, the lib needs to pass some info down to the kernel > verb to correctly implement re-arm. > > This patch enables passing provider-specific data down to the kernel > verb. A kernel patch is also needed, that I will submit as a separate > RFC patch. There are no dependencies build-wise between the user and > kernel patches by the way. You just will need both patches to get > provider data passed down to your kernel verb. > Here is the kernel patch for review. Note I fixed all the devices to use the new req_notify_cq() signature... Steve. ----- Support provider-specific data in ib_uverbs_cmd_req_notify_cq(). From: Steve Wise The Chelsio iwarp provider library needs to pass information to the kernel verb for re-arming the CQ. 
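The convention this change relies on is that the generic command comes first and any provider bytes follow it in the same buffer. A minimal stand-alone sketch of that layout is below. Every demo_* name and the rearm_index field are made up for illustration; these are not the real libibverbs or uverbs ABI structures, and the real code adds a command header and talks to the kernel through write().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the generic request (hypothetical, not the real ABI struct). */
struct demo_req_notify_cq {
    uint32_t cq_handle;
    uint32_t solicited;
};

/* Hypothetical provider command: generic part first, private data after it. */
struct demo_prov_req_notify_cq {
    struct demo_req_notify_cq base;
    uint32_t rearm_index;           /* made-up provider-specific field */
};

/* What the receiving side sees as provider data. */
struct demo_udata {
    const uint8_t *inbuf;
    size_t inlen;
};

/*
 * Library side: fill in only the generic fields of a caller-supplied
 * command; the provider has already set its own fields. Returns the
 * number of bytes that would be handed to the kernel.
 */
static size_t demo_fill_req_notify(struct demo_req_notify_cq *cmd,
                                   size_t cmd_size, uint32_t cq_handle,
                                   int solicited_only)
{
    cmd->cq_handle = cq_handle;
    cmd->solicited = !!solicited_only;
    return cmd_size;
}

/*
 * Receiving side: copy out the generic part and expose whatever follows
 * it as provider data, i.e. buf + sizeof cmd with length
 * in_len - sizeof cmd. Returns 0 on success, -1 if the buffer is too short.
 */
static int demo_parse_req_notify(const uint8_t *buf, size_t in_len,
                                 struct demo_req_notify_cq *cmd,
                                 struct demo_udata *udata)
{
    if (in_len < sizeof *cmd)
        return -1;
    memcpy(cmd, buf, sizeof *cmd);
    udata->inbuf = buf + sizeof *cmd;
    udata->inlen = in_len - sizeof *cmd;
    return 0;
}
```

A provider with nothing extra to pass would hand in the bare generic struct, in which case the provider data length comes out as zero.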
Signed-off-by: Steve Wise --- drivers/infiniband/core/uverbs_cmd.c | 9 +++++++-- drivers/infiniband/hw/amso1100/c2.h | 2 +- drivers/infiniband/hw/amso1100/c2_cq.c | 3 ++- drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 ++- drivers/infiniband/hw/ehca/ehca_reqs.c | 3 ++- drivers/infiniband/hw/ipath/ipath_cq.c | 4 +++- drivers/infiniband/hw/ipath/ipath_verbs.h | 3 ++- drivers/infiniband/hw/mthca/mthca_cq.c | 6 ++++-- drivers/infiniband/hw/mthca/mthca_dev.h | 4 ++-- include/rdma/ib_verbs.h | 5 +++-- 10 files changed, 28 insertions(+), 14 deletions(-) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index b72c7f6..06cba8b 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -959,6 +959,7 @@ ssize_t ib_uverbs_req_notify_cq(struct i int out_len) { struct ib_uverbs_req_notify_cq cmd; + struct ib_udata udata; struct ib_cq *cq; if (copy_from_user(&cmd, buf, sizeof cmd)) @@ -968,8 +969,12 @@ ssize_t ib_uverbs_req_notify_cq(struct i if (!cq) return -EINVAL; - ib_req_notify_cq(cq, cmd.solicited_only ? - IB_CQ_SOLICITED : IB_CQ_NEXT_COMP); + INIT_UDATA(&udata, buf + sizeof cmd, 0, + in_len - sizeof cmd, 0); + + cq->device->req_notify_cq(cq, cmd.solicited_only ? 
+ IB_CQ_SOLICITED : IB_CQ_NEXT_COMP, + &udata); put_cq_read(cq); diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h index 1b17dcd..716f9dc 100644 --- a/drivers/infiniband/hw/amso1100/c2.h +++ b/drivers/infiniband/hw/amso1100/c2.h @@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2 extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index); extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index); extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); +extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, struct ib_udata *udata); /* CM */ extern int c2_llp_connect(struct iw_cm_id *cm_id, diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index 9d7bcc5..c99ae20 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -217,7 +217,8 @@ int c2_poll_cq(struct ib_cq *ibcq, int n return npolled; } -int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + struct ib_udata *udata) { struct c2_mq_shared __iomem *shared; struct c2_cq *cq; diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h index 319c39d..8933382 100644 --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h @@ -135,7 +135,8 @@ int ehca_poll_cq(struct ib_cq *cq, int n int ehca_peek_cq(struct ib_cq *cq, int wc_cnt); -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify); +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify, + struct ib_udata *udata); struct ib_qp *ehca_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index b46bda1..3ed6992 100644 --- 
a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -634,7 +634,8 @@ poll_cq_exit0: return ret; } -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify, + struct ib_udata *udata) { struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c index 87462e0..27ba4db 100644 --- a/drivers/infiniband/hw/ipath/ipath_cq.c +++ b/drivers/infiniband/hw/ipath/ipath_cq.c @@ -307,13 +307,15 @@ int ipath_destroy_cq(struct ib_cq *ibcq) * ipath_req_notify_cq - change the notification type for a completion queue * @ibcq: the completion queue * @notify: the type of notification to request + * @udata: user data * * Returns 0 for success. * * This may be called from interrupt context. Also called by * ib_req_notify_cq() in the generic verbs code. */ -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + struct ib_udata *udata) { struct ipath_cq *cq = to_icq(ibcq); unsigned long flags; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 8039f6e..0d39960 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -716,7 +716,8 @@ struct ib_cq *ipath_create_cq(struct ib_ int ipath_destroy_cq(struct ib_cq *ibcq); -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + struct ib_udata *udata); int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata); diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index e393681..cfd69a8 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -716,7 +716,8 @@ 
repoll: return err == 0 || err == -EAGAIN ? npolled : err; } -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify) +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, + struct ib_udata *udata) { __be32 doorbell[2]; @@ -733,7 +734,8 @@ int mthca_tavor_arm_cq(struct ib_cq *cq, return 0; } -int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + struct ib_udata *udata) { struct mthca_cq *cq = to_mcq(ibcq); __be32 doorbell[2]; diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index fe5cecf..6b9ccf6 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -493,8 +493,8 @@ void mthca_unmap_eq_icm(struct mthca_dev int mthca_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify); -int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify); +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata); +int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata); int mthca_init_cq(struct mthca_dev *dev, int nent, struct mthca_ucontext *ctx, u32 pdn, struct mthca_cq *cq); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 8eacc35..e3e1a2c 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -941,7 +941,8 @@ struct ib_device { struct ib_wc *wc); int (*peek_cq)(struct ib_cq *cq, int wc_cnt); int (*req_notify_cq)(struct ib_cq *cq, - enum ib_cq_notify cq_notify); + enum ib_cq_notify cq_notify, + struct ib_udata *udata); int (*req_ncomp_notif)(struct ib_cq *cq, int wc_cnt); struct ib_mr * (*get_dma_mr)(struct ib_pd *pd, @@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_ static inline int ib_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) { - return 
cq->device->req_notify_cq(cq, cq_notify); + return cq->device->req_notify_cq(cq, cq_notify, NULL); } /** From johann.george at qlogic.com Fri Oct 6 21:41:56 2006 From: johann.george at qlogic.com (Johann George) Date: Fri, 6 Oct 2006 21:41:56 -0700 Subject: [openib-general] new server up and running Message-ID: <20061007044156.GA11282@cuprite.pathscale.com> As many of you know, we have been in the process of finding a hosting company that could host the OpenFabrics site along with the SVN/git/? repository, wiki pages, etc. This has been done and the server is up and running. It is a dedicated server hosted by johncompanies.com, has dual-core 3.4GHz Pentium D processors and is running Ubuntu Dapper. The plan is to keep both servers up and running, migrating a portion at a time until we feel comfortable enough with the new server. If anyone is interested in helping to set up the server, please contact me or Matt Leininger who has been copied on this email. Thanks. Johann From adit.262 at gmail.com Sat Oct 7 18:16:37 2006 From: adit.262 at gmail.com (Adit Ranadive) Date: Sat, 7 Oct 2006 21:16:37 -0400 Subject: [openib-general] Xen build for IB Message-ID: Hi, I wanted to know if any existing build of Xen needs to be compiled again if Infiniband support needs to be incorporated in it? Thanks, Adit Ranadive Georgia Institute of Technology, Atlanta, GA From sean.hefty at intel.com Sat Oct 7 23:20:15 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sat, 7 Oct 2006 23:20:15 -0700 Subject: [openib-general] libmthca: build error with svn 9735 Message-ID: <000001c6eaa1$d23b3e50$79d8180a@amr.corp.intel.com> I get a build error about an implicit declaration of VALGRIND_MAKE_MEM_UNDEFINED in cq.c. Adding a wrapper similar to what's done for VALGRIND_MAKE_MEM_DEFINED fixed the warning. - Sean From mst at mellanox.co.il Sun Oct 8 00:23:46 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 8 Oct 2006 09:23:46 +0200 Subject: [openib-general] [PATCH] 2.6.19 rdma_cm: add rdma_establish call to connect if RTU is lost In-Reply-To: References: <000001c6e89f$2ae347a0$8698070a@amr.corp.intel.com> Message-ID: <20061008072345.GA25179@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] 2.6.19 rdma_cm: add rdma_establish call to connect if RTU is lost > > > Please consider for 2.6.19, since it does allow a connection to be made > > when data is received on a QP, but the RTU is lost. This problem has been > > reported on the OFA OFED releases. > > I'm confused -- how does this fix anything? I don't see any callers > of the new rdma_establish() function ?? This is required to complete the passive side support in CMA. Since there are no in-tree passive side users yet, this is strictly to make life easier for out-of-tree users. -- MST From bugzilla-daemon at openib.org Sun Oct 8 02:58:18 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sun, 8 Oct 2006 02:58:18 -0700 (PDT) Subject: [openib-general] [Bug 272] New: IPoIB: kernel Oops as a result of interface Up/Down Message-ID: <20061008095818.245CA2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=272 Summary: IPoIB: kernel Oops as a result of interface Up/Down Product: OpenFabrics Linux Version: 1.1rc7 Platform: X86-64 OS/Version: SLES 10 Status: NEW Severity: normal Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: vlad at mellanox.co.il Setup: Two nodes (node1 and node2) connected to the IB switch with both IB ports. To reproduce: IPoIB High Availability service is available on node2. '/etc/init.d/opensmd restart' executed on node1 in infinite loop. 
Then after ~ 6 hours the following kernel Oops received on node2: kernel: ib0: dev_queue_xmit failed to requeue packet kernel: NMI Watchdog detected LOCKUP on CPU 0 kernel: CPU 0 kernel: Modules linked in: mst_pciconf mst_pci rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core autofs4 ipv6 nfs lockd nfs_acl sunrpc af_packet button battery ac apparmor aamatch_pcre loop lug uhci_hcd ehci_hcd i2c_i801 i2c_core hw_random i8xx_tco tg3 usbcore ext3 jbd edd fan thermal processor mptspi mptscsih mptbase scsi_transport_spi sg sr_mod cdrom ata_piix libata sd_mod scsi_mod kernel: Pid: 7307, comm: ib_mad2 Tainted: GU 2.6.16.21-0.8-smp #1 kernel: RIP: 0010:[] {.text.lock.spinlock+34} kernel: RSP: 0018:ffff81011683dbf0 EFLAGS: 00000086 kernel: RAX: 0000000000000092 RBX: ffff8100c8ede148 RCX: ffffffff883544ee kernel: RDX: ffff8100c8ede0c0 RSI: 0000000000000000 RDI: ffff8100c8ede150 kernel: RBP: ffff81011683dc18 R08: ffff810113ade640 R09: ffff810113d16460 kernel: R10: 000000004523d9d2 R11: 000000000000206d R12: ffff8100c8ede150 kernel: R13: ffff81011683dc78 R14: ffff810121fce500 R15: 0000000000000286 kernel: FS: 0000000000000000(0000) GS:ffffffff80445000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b kernel: CR2: 00000000005e6288 CR3: 00000000ce84a000 CR4: 00000000000006e0 kernel: Process ib_mad2 (pid: 7307, threadinfo ffff81011683c000, task ffff81012217e7d0) kernel: Stack: ffffffff80129a15 ffff810121fce000 0000000000000000 ffff8100c8ede0c0 kernel: ffff81011683dc78 00000000fffffffc ffffffff8835e593 ffff810121fce000 kernel: ffff810121fce000 0000000000000000 kernel: Call Trace: {complete+28} {:ib_ipoib:path_rec_completion+764} kernel: {dev_queue_xmit+545} {:ib_ipoib:path_rec_completion+848} kernel: {:ib_sa:ib_sa_path_rec_callback+64} kernel: {lock_timer_base+27} {try_to_del_timer_sync+81} kernel: {:ib_sa:send_handler+72} {:ib_mad:ib_mad_complete_send_wr+421} kernel: {:ib_mad:ib_mad_completion_handler+947} 
kernel: {:ib_mad:ib_mad_completion_handler+0} kernel: {run_workqueue+153} {worker_thread+0} kernel: {keventd_create_kthread+0} {worker_thread+265} kernel: {__wake_up_common+62} {default_wake_function+0} kernel: {keventd_create_kthread+0} {kthread+236} kernel: {child_rip+8} {keventd_create_kthread+0} kernel: {kthread+0} {child_rip+0} kernel: kernel: Code: 83 3f 00 7e f9 e9 c2 fe ff ff f3 90 83 3f 00 7e f9 e9 c1 fe kernel: console shuts up ... kernel: NMI Watchdog detected LOCKUP on CPU 2 kernel: CPU 2 kernel: Modules linked in: mst_pciconf mst_pci rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core autofs4 ipv6 nfs lockd nfs_acl sunrpc af_packet button battery ac apparmor aamatch_pcre loop lug uhci_hcd ehci_hcd i2c_i801 i2c_core hw_random i8xx_tco tg3 usbcore ext3 jbd edd fan thermal processor mptspi mptscsih mptbase scsi_transport_spi sg sr_mod cdrom ata_piix libata sd_mod scsi_mod kernel: Pid: 7336, comm: ipoib Tainted: G U 2.6.16.21-0.8-smp #1 kernel: RIP: 0010:[] {.text.lock.spinlock+49} kernel: RSP: 0018:ffff810113acfd80 EFLAGS: 00000086 kernel: RAX: 0000000000000000 RBX: ffff810121fce000 RCX: 0000000000000000 kernel: RDX: 0000000000000002 RSI: ffff810121fce5e0 RDI: ffff810121fce500 kernel: RBP: ffff810121fce500 R08: 0000000000000000 R09: ffff810121fce000 kernel: R10: ffff8101235add9f R11: 0000000000000286 R12: ffff8100c8ede0c0 kernel: R13: 0000000000000000 R14: ffff810121fce500 R15: ffff810121fce000 kernel: FS: 0000000000000000(0000) GS:ffff810123e2b3c0(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b kernel: CR2: 0000000000591cf0 CR3: 00000000c0e01000 CR4: 00000000000006e0 kernel: Process ipoib (pid: 7336, threadinfo ffff810113ace000, task ffff810123042040) kernel: Stack: ffffffff88361212 ffff810100e18780 ffff810113acfd60 ffff810113acfd60 kernel: ffffffff88361807 0000000000000000 0000000000000246 0000ffff1b4012ff kernel: 0100000000000000 ffff810113acfdc8 kernel: Call Trace: 
{:ib_ipoib:ipoib_mcast_start_thread+109} kernel: {:ib_ipoib:ipoib_mcast_restart_task+965} kernel: {:ib_ipoib:ipoib_ib_dev_flush+0} {:ib_ipoib:ipoib_ib_dev_flush+158} kernel: {run_workqueue+153} {worker_thread+0} kernel: {keventd_create_kthread+0} {worker_thread+265} kernel: {__wake_up_common+62} {default_wake_function+0} kernel: {keventd_create_kthread+0} {keventd_create_kthread+0} kernel: {kthread+236} {child_rip+8} kernel: {keventd_create_kthread+0} {kthread+0} kernel: {child_rip+0} kernel: kernel: Code: 7e f9 e9 c1 fe ff ff f3 90 83 3f 00 7e f9 e9 d2 fe ff ff e8 kernel: console shuts up ... ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kliteyn at dev.mellanox.co.il Sun Oct 8 08:14:04 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 08 Oct 2006 17:14:04 +0200 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <20061003221638.GZ10617@sashak.voltaire.com> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> <20061003221638.GZ10617@sashak.voltaire.com> Message-ID: <452915BC.6060103@dev.mellanox.co.il> Hi Sasha [snip] >> --- opensm/osm_ucast_file.c (revision 9502) >> +++ opensm/osm_ucast_file.c (working copy) >> @@ -52,18 +52,11 @@ >> >> #include >> #include >> +#include > > Why this? This is where PRIx64 is defined in Windows. [snip] >> #include >> #include >> #include >> >> -#define PARSEERR(log, file_name, lineno, fmt, arg...) \ >> - osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \ >> - file_name, lineno, ##arg ) >> - >> -#define PARSEWARN(log, file_name, lineno, fmt, arg...) \ >> - osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \ >> - file_name, lineno, ##arg ) >> - > > Is it possible to use C99 style var args macros (with __VA_ARGS___)? MS > claims it is supported by VC. And it is supported by gcc too. 
Indeed, the C99 style var arg macros are supported by VC6 and by gcc,
but not by WinDDK (at least by the version that we're using - 4.23).

[snip]
>> @@ -72,10 +65,11 @@ static uint16_t remap_lid(osm_opensm_t *
>>
>> 	p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid);
>> 	if (!p_port ||
>> -	    p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) {
>> +	    p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl))
>> +	{
>
> Please don't break existing code formatting.

I will issue a new patch shortly w/o these cosmetic changes.

--
Yevgeny.

From mst at mellanox.co.il Sun Oct 8 08:25:34 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 8 Oct 2006 17:25:34 +0200
Subject: [openib-general] new server up and running
In-Reply-To: <20061007044156.GA11282@cuprite.pathscale.com>
References: <20061007044156.GA11282@cuprite.pathscale.com>
Message-ID: <20061008152534.GD29668@mellanox.co.il>

Quoting r. Johann George :
> Subject: new server up and running
>
> As many of you know, we have been in the process of finding a hosting
> company that could host the OpenFabrics site along with the SVN/git/?
> repository, wiki pages, etc. This has been done and the server is up and
> running. It is a dedicated server hosted by johncompanies.com, has
> dual-core 3.4GHz Pentium D processors and is running Ubuntu Dapper. The
> plan is to keep both servers up and running, migrating a portion at a time
> until we feel comfortable enough with the new server.
>
> If anyone is interested in helping to set up the server, please contact me
> or Matt Leininger who has been copied on this email. Thanks.
>
> Johann

Not sure what kind of help is needed - I've set up the git repository
at mellanox here, so I might be able to help out if you have problems
with that.

It's actually pretty straight-forward though - the main trick to know
is to have a central tree where you clone Linus' tree from time to time.
Other trees can then use that one as an alternate object repository and
save a lot of disk space.

--
MST

From kliteyn at dev.mellanox.co.il Sun Oct 8 08:42:48 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 08 Oct 2006 17:42:48 +0200
Subject: [openib-general] [PATCHv2 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c
Message-ID:

Hi Hal

This is the re-submission of the patch that was originally submitted
by Eitan - just removing some cosmetic changes from the patch and
re-diffing it with the trunk:

1. Avoid varargs macros not supported by Windows
2. Included additional header for PRIx64 macro

Yevgeny

Signed-off-by: Yevgeny Kliteynik

Index: opensm/osm_ucast_file.c =================================================================== --- opensm/osm_ucast_file.c (revision 9738) +++ opensm/osm_ucast_file.c (working copy) @@ -52,18 +52,11 @@ #include #include +#include #include #include #include -#define PARSEERR(log, file_name, lineno, fmt, arg...) \ - osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \ - file_name, lineno, ##arg ) - -#define PARSEWARN(log, file_name, lineno, fmt, arg...) 
\ - osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \ - file_name, lineno, ##arg ) - static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid) { osm_port_t *p_port; @@ -183,16 +176,17 @@ static int do_ucast_file_load(void *cont } else if (!strncmp(p, "Unicast lids", 12)) { q = strstr(p, " guid 0x"); if (!q) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse switch definition\n"); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + " cannot parse switch definition\n", + file_name, lineno); return -1; } p = q + 6; sw_guid = strtoull(p, &q, 16); if (q && !isspace(*q)) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse switch guid: \'%s\'\n", - p); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + "cannot parse switch guid: \'%s\'\n", + file_name, lineno, p); return -1; } sw_guid = cl_hton64(sw_guid); @@ -211,8 +205,8 @@ static int do_ucast_file_load(void *cont } else if (p_sw && !strncmp(p, "0x", 2)) { lid = (uint16_t)strtoul(p, &q, 16); if (q && !isspace(*q)) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse lid: \'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + "cannot parse lid: \'%s\'\n", file_name, lineno, p); return -1; } p = q; @@ -220,17 +214,17 @@ static int do_ucast_file_load(void *cont p++; port_num = (uint8_t)strtoul(p, &q, 10); if (q && !isspace(*q)) { - PARSEERR(&p_osm->log, file_name, lineno, - "cannot parse port: \'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" + "cannot parse port: \'%s\'\n", file_name, lineno, p); return -1; } p = q; /* additionally try to exract guid */ q = strstr(p, " portguid 0x"); if (!q) { - PARSEWARN(&p_osm->log, file_name, lineno, - "cannot find port guid " - "(maybe broken dump): \'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u:" + "cannot find port guid " + "(maybe broken dump): \'%s\'\n", file_name, lineno, p); port_guid = 0; } else @@ -238,11 +232,10 @@ static 
int do_ucast_file_load(void *cont p = q + 10; port_guid = strtoull(p, &q, 16); if (!q && !isspace(*q) && *q != ':') { - PARSEWARN(&p_osm->log, file_name, - lineno, - "cannot parse port guid " - "(maybe broken dump): " - "\'%s\'\n", p); + osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u:" + "cannot parse port guid " + "(maybe broken dump): " + "\'%s\'\n", file_name, lineno, p); port_guid = 0; } } From sashak at voltaire.com Sun Oct 8 09:03:04 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 8 Oct 2006 18:03:04 +0200 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <452915BC.6060103@dev.mellanox.co.il> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> <20061003221638.GZ10617@sashak.voltaire.com> <452915BC.6060103@dev.mellanox.co.il> Message-ID: <20061008160304.GA6216@sashak.voltaire.com> On 17:14 Sun 08 Oct , Yevgeny Kliteynik wrote: > Hi Sasha > > [snip] > >> --- opensm/osm_ucast_file.c (revision 9502) > >> +++ opensm/osm_ucast_file.c (working copy) > >> @@ -52,18 +52,11 @@ > >> > >> #include > >> #include > >> +#include > > > > Why this? > > This is where PRIx64 is defined in Windows. I see. > > [snip] > >> #include > >> #include > >> #include > >> > >> -#define PARSEERR(log, file_name, lineno, fmt, arg...) \ > >> - osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \ > >> - file_name, lineno, ##arg ) > >> - > >> -#define PARSEWARN(log, file_name, lineno, fmt, arg...) \ > >> - osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \ > >> - file_name, lineno, ##arg ) > >> - > > > > Is it possible to use C99 style var args macros (with __VA_ARGS___)? MS > > claims it is supported by VC. And it is supported by gcc too. > > Indeed, the C99 style var arg macros are supported by VC6 and by gcc, > but not by WinDDK (at least by the version that we're using - 4.23). Any chance to upgrade to VC6? At least in a future? 
Sasha > > [snip] > >> @@ -72,10 +65,11 @@ static uint16_t remap_lid(osm_opensm_t * > >> > >> p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid); > >> if (!p_port || > >> - p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) { > >> + p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) > >> + { > > > > Please don't break existing code formatting. > > I will issue a new patch shortly w/o these cosmetic changes. > > -- > Yevgeny. > From sashak at voltaire.com Sun Oct 8 11:05:16 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 8 Oct 2006 20:05:16 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: make some local functions static Message-ID: <20061008180516.GC18411@sashak.voltaire.com> This makes some local functions static in osm_mcast_mgr.c. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_mcast_mgr.h | 44 ------------------------------------ osm/opensm/osm_mcast_mgr.c | 12 +++++----- 2 files changed, 6 insertions(+), 50 deletions(-) diff --git a/osm/include/opensm/osm_mcast_mgr.h b/osm/include/opensm/osm_mcast_mgr.h index b460949..a78c641 100644 --- a/osm/include/opensm/osm_mcast_mgr.h +++ b/osm/include/opensm/osm_mcast_mgr.h @@ -268,50 +268,6 @@ osm_mcast_mgr_process( * Multicast Manager, Node Info Response Controller *********/ -/****f* OpenSM: Multicast Manager/osm_mcast_mgr_process_mgrp -* NAME -* osm_mcast_mgr_process_mgrp -* -* DESCRIPTION -* Processes a specific multicast group. This function is called -* by the SM to process a multicast group. Note that this function -* returns BEFORE the switch tables have been configured over the wire, -* and AFTER switch table configuration MADs are all placed in the -* VL15 FIFOs. In other words, the switch table configuration is -* imminent but probably not yet complete at the time this call returns. 
-* -* SYNOPSIS -*/ -osm_signal_t -osm_mcast_mgr_process_mgrp( - IN osm_mcast_mgr_t* const p_mgr, - IN osm_mgrp_t* const p_mgrp, - IN osm_mcast_req_type_t req_type, - IN ib_net64_t port_guid ); -/* -* PARAMETERS -* p_mgr -* [in] Pointer to an osm_mcast_mgr_t object. -* -* p_mgrp -* [in] Pointer to the multicast group to process. -* -* req_type -* [in] Type of the multicast request that caused this processing -* (MC create/join/leave). -* -* port_guid -* [in] Port guid of the port that was added/removed due to this call. -* -* RETURN VALUES -* OSM_SIGNAL_DONE -* OSM_SIGNAL_DONE_PENDING -* -* NOTES -* -* SEE ALSO -*********/ - /****f* OpenSM: Multicast Manager/osm_mcast_mgr_process_mgrp_cb * NAME * osm_mcast_mgr_process_mgrp_cb diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index bdd0d61..cb0ffb1 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -110,7 +110,7 @@ __osm_mcast_work_obj_delete( /********************************************************************** Recursively remove nodes from the tree **********************************************************************/ -void +static void __osm_mcast_mgr_purge_tree_node( IN osm_mtree_node_t* p_mtn ) { @@ -148,7 +148,7 @@ __osm_mcast_mgr_purge_tree( /********************************************************************** **********************************************************************/ -float +static float osm_mcast_mgr_compute_avg_hops( osm_mcast_mgr_t* const p_mgr, const osm_mgrp_t* const p_mgrp, @@ -215,7 +215,7 @@ osm_mcast_mgr_compute_avg_hops( Calculate the maximal "min hops" from the given switch to any of the group HCAs **********************************************************************/ -float +static float osm_mcast_mgr_compute_max_hops( osm_mcast_mgr_t* const p_mgr, const osm_mgrp_t* const p_mgrp, @@ -1286,7 +1286,7 @@ #endif /********************************************************************** lock must already be held on entry 
**********************************************************************/ -ib_api_status_t +static ib_api_status_t osm_mcast_mgr_process_tree( IN osm_mcast_mgr_t* const p_mgr, IN osm_mgrp_t* const p_mgrp, @@ -1377,7 +1377,7 @@ osm_mcast_mgr_process_tree( /********************************************************************** **********************************************************************/ -void +static void osm_mcast_mgr_dump_mcast_routes( IN const osm_mcast_mgr_t* const p_mgr, IN const osm_switch_t* const p_sw ) @@ -1490,7 +1490,7 @@ __unlink_mcast_fdb(IN osm_mcast_mgr_t* c NOTE : The lock should be held externally! **********************************************************************/ -osm_signal_t +static osm_signal_t osm_mcast_mgr_process_mgrp( IN osm_mcast_mgr_t* const p_mgr, IN osm_mgrp_t* const p_mgrp, -- 1.4.2.3 From mst at mellanox.co.il Sun Oct 8 11:07:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 8 Oct 2006 20:07:30 +0200 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <20061008160304.GA6216@sashak.voltaire.com> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> <20061003221638.GZ10617@sashak.voltaire.com> <452915BC.6060103@dev.mellanox.co.il> <20061008160304.GA6216@sashak.voltaire.com> Message-ID: <20061008180730.GB30377@mellanox.co.il> Quoting r. Sasha Khapyorsky : > > Indeed, the C99 style var arg macros are supported by VC6 and by gcc, > > but not by WinDDK (at least by the version that we're using - 4.23). > > Any chance to upgrade to VC6? At least in a future? VC makes cross-compiling quite hard. It's probably easier to wait for a new DDK with proper C99 support to come out. 
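
For reference, the macro style asked about in this thread differs from the GNU "arg..." form mainly in how the variable arguments are named. A minimal sketch of the C99 form, assuming a preprocessor with __VA_ARGS__ support; PARSEERR_BUF, report_bad_lid and the buffer arguments are hypothetical stand-ins made up for this illustration, and nothing here calls osm_log():

```c
#include <stddef.h>
#include <stdio.h>

/*
 * C99-style variadic macro (hypothetical; it writes into a caller-supplied
 * buffer instead of calling osm_log()).  The format string is passed as the
 * first variadic argument, so the expansion never ends in a dangling comma
 * and the GNU "##arg" extension is not needed.
 */
#define PARSEERR_BUF(buf, len, file_name, lineno, ...)                    \
	do {                                                               \
		int n_ = snprintf((buf), (len), "PARSE ERROR: %s:%u: ",    \
				  (file_name), (unsigned)(lineno));        \
		if (n_ >= 0 && (size_t)n_ < (size_t)(len))                 \
			snprintf((buf) + n_, (size_t)(len) - (size_t)n_,   \
				 __VA_ARGS__);                             \
	} while (0)

/* Example use: report a LID token that could not be parsed. */
void report_bad_lid(char *buf, size_t len, const char *file_name,
		    unsigned lineno, const char *token)
{
	PARSEERR_BUF(buf, len, file_name, lineno,
		     "cannot parse lid: '%s'\n", token);
}
```

The same macro also accepts a message with no extra arguments, such as "cannot parse switch definition\n". Whether a given DDK preprocessor accepts this form is the question the thread leaves open.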
-- MST From sashak at voltaire.com Sun Oct 8 11:52:03 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 8 Oct 2006 20:52:03 +0200 Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <20061008180730.GB30377@mellanox.co.il> References: <86sliqlc0r.fsf@mtl066.yok.mtl.com> <20061003221638.GZ10617@sashak.voltaire.com> <452915BC.6060103@dev.mellanox.co.il> <20061008160304.GA6216@sashak.voltaire.com> <20061008180730.GB30377@mellanox.co.il> Message-ID: <20061008185203.GD18411@sashak.voltaire.com> On 20:07 Sun 08 Oct , Michael S. Tsirkin wrote: > Quoting r. Sasha Khapyorsky : > > > Indeed, the C99 style var arg macros are supported by VC6 and by gcc, > > > but not by WinDDK (at least by the version that we're using - 4.23). > > > > Any chance to upgrade to VC6? At least in a future? > > VC makes cross-compiling quite hard. I'm not following, which cross-compiling? Actually the only thing is needed is proper C pre-processor, isn't it? Sasha > It's probably easier to wait for a new DDK with proper C99 support to come out. > > -- > MST From rdreier at cisco.com Sun Oct 8 14:37:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 08 Oct 2006 14:37:40 -0700 Subject: [openib-general] libmthca: build error with svn 9735 In-Reply-To: <000001c6eaa1$d23b3e50$79d8180a@amr.corp.intel.com> (Sean Hefty's message of "Sat, 7 Oct 2006 23:20:15 -0700") References: <000001c6eaa1$d23b3e50$79d8180a@amr.corp.intel.com> Message-ID: My bad... I fixed it up. From bunk at stusta.de Sun Oct 8 16:16:35 2006 From: bunk at stusta.de (Adrian Bunk) Date: Mon, 9 Oct 2006 01:16:35 +0200 Subject: [openib-general] [2.6 patch] drivers/infiniband/hw/amso1100/c2_rnic.c: fix a NULL dereference Message-ID: <20061008231635.GR6755@stusta.de> This patch fixes a NULL dereference spotted by the Coverity checker. 
Signed-off-by: Adrian Bunk

--- linux-2.6/drivers/infiniband/hw/amso1100/c2_rnic.c.old	2006-10-09 00:39:32.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/amso1100/c2_rnic.c	2006-10-09 00:40:30.000000000 +0200
@@ -150,8 +150,8 @@
 	    (struct c2wr_rnic_query_rep *) (unsigned long) (vq_req->reply_msg);
 	if (!reply)
 		err = -ENOMEM;
-
-	err = c2_errno(reply);
+	else
+		err = c2_errno(reply);
 	if (err)
 		goto bail2;

From bugzilla-daemon at openib.org Sun Oct 8 20:02:25 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Sun, 8 Oct 2006 20:02:25 -0700 (PDT)
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors
Message-ID: <20061009030225.23E022283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=229

------- Comment #4 from sweitzen at cisco.com 2006-10-08 20:02 -------
RENICE_IB_MAD=yes works well on RHEL4 U4 x86_64 on Dell PE 1950 Woodcrest
system. I haven't tried on a newer kernel yet.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mst at mellanox.co.il Sun Oct 8 21:21:42 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 9 Oct 2006 06:21:42 +0200
Subject: [openib-general] ipoib_ib.c - alignment questions
Message-ID: <20061009042142.GA25964@mellanox.co.il>

Hi, Roland!
ipoib_ib.c has this:

	/*
	 * IB will leave a 40 byte gap for a GRH and IPoIB adds a 4 byte
	 * header. So we need 4 more bytes to get to 48 and align the
	 * IP header to a multiple of 16.
	 */
	skb_reserve(skb, 4);

Some questions on this:
- Why do we try to align the IP header to a multiple of 16?
- This works if skb start is 16 byte aligned. What guarantees that skb data
  is 16 byte aligned?
- Would the following code be better than the comment:

	skb_reserve(skb, ALIGN(IB_GRH_BYTES + IPOIB_ENCAP_LEN, 16) -
			 IB_GRH_BYTES - IPOIB_ENCAP_LEN);

  comments have a bigger tendency to bitrot ...
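
The arithmetic behind the proposed expression can be checked outside the kernel. A minimal sketch, assuming the 40-byte GRH and 4-byte IPoIB header sizes stated in the quoted comment; GRH_BYTES, ENCAP_LEN, ALIGN_UP and rx_reserve_len are local stand-ins invented for this example, not the kernel's definitions:

```c
#include <stddef.h>

#define GRH_BYTES  40	/* gap left for the GRH, per the quoted comment */
#define ENCAP_LEN   4	/* IPoIB encapsulation header, per the same comment */

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a)  (((size_t)(x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/*
 * Bytes to reserve in front of hdr_len bytes of headers so that whatever
 * follows them starts on an align-byte boundary.  This assumes the buffer
 * itself starts on such a boundary.
 */
size_t rx_reserve_len(size_t hdr_len, size_t align)
{
	return ALIGN_UP(hdr_len, align) - hdr_len;
}
```

With these values ALIGN_UP(44, 16) is 48, so rx_reserve_len(GRH_BYTES + ENCAP_LEN, 16) evaluates to 4, the same constant the current code passes to skb_reserve(). Like the hard-coded 4, the computed value only aligns the IP header if the buffer start is itself 16-byte aligned, which is the second question raised above.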
-- MST From ogerlitz at voltaire.com Mon Oct 9 00:35:12 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 09 Oct 2006 09:35:12 +0200 Subject: [openib-general] OFED 1.1 RC7 In-Reply-To: <4525271E.8070000@dev.mellanox.co.il> References: <4525271E.8070000@dev.mellanox.co.il> Message-ID: <4529FBB0.50700@voltaire.com> Aviram Gutman wrote: > OFED-1.1-rc7 is available on > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > Release details: > ================ > BUILD_ID: > OFED-1.1-rc7 > > openib-1.1 (REV=9725) > # User space > https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 > ref: refs/heads/ofed_1_1 > commit fde99a7a22e56d6aa90dae9db3d600755efcedb5 > > # MPI > mpi_osu-0.9.7-mlx2.2.0.tgz > openmpi-1.1.1-1.src.rpm > mpitests-2.0-0.src.rpm Michael, I thought the info includes also on which **tag** in Linus GIT an OFED release is based? Also i understand that sometimes you use Roland tree and on other times Linus tree, so this should be stated as well. Or. From sweitzen at cisco.com Mon Oct 9 00:34:59 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 9 Oct 2006 00:34:59 -0700 Subject: [openib-general] Cisco SQA results for OFED 1.1 rc6 Message-ID: The testing in general went well. All testing was done on Mellanox HCAs, both SDR and DDR. Most testing was done on RHEL4 U3, but we have done some testing on RHEL4 U4 and SLES10, and in the future will test less and less on RHEL4 U3. See attached spreadsheet for more details. The following bugs and enhancement requests were filed. 

  247  OFED IPoIB HA not working on RHEL4 U3
  249  OFED 1.1: Open MPI 1.1.1 won't compile with Intel C 9.[01] on SLES 10
  258  OFED: ppc64 GNU mpif90 missing for MVAPICH
  259  problems with OFED IPoIB HA on SLES10
  260  IPoIB HA does not migrate IPoIB pkey interfaces
  261  can't configure IPoIB pkey interfaces at boot time
  262  can't configure SRP mounts from /etc/fstab
  263  OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
  264  add binary compat to OFED MVAPICH for programs linked with VAPI MVAPICH
  265  OFED: can't install multiple kernel-ib packages for dual-boot system
  266  IPoIB multicast does not work with RHEL4 U4
  267  OFED 1.1 MVAPICH not working on SLES10 x86_64
  268  OFED openibd script references IBG2
  269  OFED 1.1 rc6 IPoIB does not interoperate with Cisco SFS 3001
  270  tvflash does not work with HCA recovery jumper
  271  misleading error message when stopping openibd if SDP in use

Below are images of the Microway MPI Link Checker running on two 32-node
clusters, one has RHEL4 32-bit and PCI-X, one has RHEL4 64-bit and PCI-E.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

[Image captions; the images themselves were scrubbed from the archive]
PCI-X MVAPICH
PCI-X Open MPI
PCI-X Intel MPI
  Has some problems, perhaps rc7 will be better.
PCI-X HP MPI
PCI-E MVAPICH
PCI-E Open MPI
PCI-E Intel MPI
PCI-E HP MPI
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
[Eight non-text attachments were scrubbed, each Name: Outlook.jpg, Type: image/jpeg]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed_sqa_results.xls
Type: application/vnd.ms-excel
Size: 182272 bytes
Desc: ofed_sqa_results.xls
URL:

From halr at voltaire.com Mon Oct 9 02:57:44 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 09 Oct 2006 05:57:44 -0400
Subject: [openib-general] [PATCH TRIVIAL] opensm: make some local functions static
In-Reply-To: <20061008180516.GC18411@sashak.voltaire.com>
References: <20061008180516.GC18411@sashak.voltaire.com>
Message-ID: <1160387851.30096.153216.camel@hal.voltaire.com>

On Sun, 2006-10-08 at 14:05, Sasha Khapyorsky wrote:
> This makes some local functions static in osm_mcast_mgr.c.
>
> Signed-off-by: Sasha Khapyorsky

Thanks. Applied.

-- Hal

From mst at mellanox.co.il Mon Oct 9 03:04:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 9 Oct 2006 12:04:30 +0200
Subject: [openib-general] [PATCH for-2.6.19] IB/mthca: fix off-by-one in create_srq
Message-ID: <20061009100430.GA24406@mellanox.co.il>

Tavor needs a spare entry in SRQ - same as memfree.
Note that in userspace libmthca already handles this correctly.

Signed-off-by: Michael S.
Tsirkin --- diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index 0f316c8..46b2747 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -201,6 +201,8 @@ int mthca_alloc_srq(struct mthca_dev *de if (mthca_is_memfree(dev)) srq->max = roundup_pow_of_two(srq->max + 1); + else + srq->max = srq->max + 1; ds = max(64UL, roundup_pow_of_two(sizeof (struct mthca_next_seg) + -- MST From mst at mellanox.co.il Mon Oct 9 03:39:13 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 9 Oct 2006 12:39:13 +0200 Subject: [openib-general] [PATCHv2] IB/mthca: fix off-by-one in mthca srq creation Message-ID: <20061009103913.GA25395@mellanox.co.il> IB/mthca: fix off-by-one in mthca srq creation note that in userspace libmthca already handles this correctly Noted by Jack Morgenstein Signed-off-by: Michael S. Tsirkin --- Update: the previous version failed to update attr->max_wr correctly. diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index 0f316c8..46b2747 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -201,6 +201,8 @@ int mthca_alloc_srq(struct mthca_dev *de if (mthca_is_memfree(dev)) srq->max = roundup_pow_of_two(srq->max + 1); + else + srq->max = srq->max + 1; ds = max(64UL, roundup_pow_of_two(sizeof (struct mthca_next_seg) + @@ -279,7 +279,7 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; - attr->max_wr = (mthca_is_memfree(dev)) ? srq->max - 1 : srq->max; + attr->max_wr = srq->max - 1; attr->max_sge = srq->max_gs; return 0; -- MST From mlakshmanan at silverstorm.com Mon Oct 9 04:53:49 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Mon, 9 Oct 2006 07:53:49 -0400 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target Message-ID: Quoting r. 
Roland Dreier : > Thanks, queued for 2.6.19 I tested the patches, which are included in OFED 1.1 RC7, against Silverstorm SRP targets. The patch breaks backward compatibility for fabrics that use Silverstorm targets, due to the following: It defaults the new parameter "initiator_ext" to 0. Silverstorm SRP targets, when configured for working with OFED stacks, are usually set to expect an initiator extension of 1, to overcome the earlier limitation of OFED stacks setting initiator extension to the port number. This implies that a user must, without exception, add "initiator_ext=" to the add target echo string. It'd be useful if either or both of the following could be done: 1. Release note the above requirement of adding the "initiator_ext=" string to the add target echo string, for all Silverstorm targets. 2. Maintain the earlier default of the initiator extension being equal to the port number. I have prepared a patch that does step 2 above, which I'll send in a separate e-mail based on the feedback to the above suggestions. Madhu From mst at mellanox.co.il Mon Oct 9 06:35:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 9 Oct 2006 15:35:04 +0200 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: References: Message-ID: <20061009133504.GA26849@mellanox.co.il> Quoting r. Lakshmanan, Madhu : > Subject: FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target > > Quoting r. Roland Dreier : > > Thanks, queued for 2.6.19 > > I tested the patches, which are included in OFED 1.1 RC7, against > Silverstorm SRP targets. The patch breaks backward compatibility for > fabrics that use Silverstorm targets, due to the following: > > It defaults the new parameter "initiator_ext" to 0. 
Silverstorm SRP > targets, when configured for working with OFED stacks, are usually set > to expect an initiator extension of 1, to overcome the earlier > limitation of OFED stacks setting initiator extension to the port > number. Sounds like a target bug - why does it expect *anything* specific in the initiator extension? > This implies that a user must, without exception, add > "initiator_ext=" to the add target echo string. > > It'd be useful if either or both of the following could be done: > > 1. Release note the above requirement of adding the "initiator_ext=" > string to the add target echo string, for all Silverstorm targets. I don't think we'll be touching OFED 1.1 anymore. So maybe this is the best choice for kernel.org, too. > 2. Maintain the earlier default of the initiator extension being equal > to the port number. > > I have prepared a patch that does step 2 above, which I'll send in a > separate e-mail based on the feedback to the above suggestions. Hmm. What, exactly, is the target assumption? -- MST From mst at mellanox.co.il Mon Oct 9 07:42:23 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 9 Oct 2006 16:42:23 +0200 Subject: [openib-general] ipoib: ignores dma mapping errors on TX? Message-ID: <20061009144223.GB26849@mellanox.co.il> It seems that IPoIB ignores the possibility that dma_map_single with DMA_TO_DEVICE direction might return dma_mapping_error. Is there some reason that such mappings can't fail? -- MST From aviram at dev.mellanox.co.il Mon Oct 9 07:54:36 2006 From: aviram at dev.mellanox.co.il (Aviram Gutman) Date: Mon, 09 Oct 2006 16:54:36 +0200 Subject: [openib-general] [openfabrics-ewg] Cisco SQA results for OFED 1.1 rc6 In-Reply-To: References: Message-ID: <452A62AC.8000904@dev.mellanox.co.il> Thanks for the report. Please see below Scott Weitzenkamp (sweitzen) wrote: > The testing in general went well. > > All testing was done on Mellanox HCAs, both SDR and DDR. 
Most testing > was done on RHEL4 U3, but we have done some testing on RHEL4 U4 and > SLES10, and in the future will test less and less on RHEL4 U3. See > attached spreadsheet for more details. > > The following bugs and enhancement requests were filed. > > 247 OFED IPoIB HA not working on RHEL4 U3 > We fixed it in RC7 > > 249 OFED 1.1: Open MPI 1.1.1 won't compile with Intel C 9.[01] > on SLES 10 > I guess this will not be fixed for OFED 1.1. Correct? > > 258 OFED: ppc64 GNU mpif90 missing for MVAPICH > Can you send us the log file? > > 259 problems with OFED IPoIB HA on SLES10 > Fixed in RC7 > > 260 IPoIB HA does not migrate IPoIB pkey interfaces > Will be done for OFED 1.2 > > 261 can't configure IPoIB pkey interfaces at boot time > This is a feature request. It will not be in OFED 1.1 > > 262 can't configure SRP mounts from /etc/fstab > This is a major request. Need to work with Roland and Openib-general list. > > 263 OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop > Working on it > > 264 add binary compat to OFED MVAPICH for programs linked with > VAPI MVAPICH > > 265 OFED: can't install multiple kernel-ib packages for > dual-boot system > Feature request for future OFED releases > > 266 IPoIB multicast does not work with RHEL4 U4 > This is an issue with RHEL4 U4. It requires a kernel patch. We documented the problem in the IPoIB release notes. > > 267 OFED 1.1 MVAPICH not working on SLES10 x86_64 > It is weird. For us it works perfectly. > > 268 OFED openibd script references IBG2 > Fixed for final > > 269 OFED 1.1 rc6 IPoIB does not interoperate with Cisco SFS 3001 > Can Cisco debug it and send patches? > > 270 tvflash does not work with HCA recovery jumper > Can Cisco send a fix? > > 271 misleading error message when stopping openibd if SDP in use > Fixed for Final > Below are images of the Microway MPI Link Checker running on two > 32-node clusters, one has RHEL4 32-bit and PCI-X, one has RHEL4 64-bit > and PCI-E.
> > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > From halr at voltaire.com Mon Oct 9 08:24:41 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 11:24:41 -0400 Subject: [openib-general] ibdiagnet Message-ID: <1160407478.30096.166251.camel@hal.voltaire.com> Hi Eitan, When I run ibdiagnet, I get the following: Loading IBDIAGNET from: /usr/local/lib/ibdiagnet1.0 Loading IBDM from: /usr/local/lib/ibdm1.0 -W- Topology file is not specified. Reports regarding cluster links will use direct routes. -W- A few ports of local device are up. Since port-num was not specified (-p option), port 1 of device 1 will be used as the local port. Node type 4 is not an IB node type -E- Fail to ibcr_bind. Node type 4 is an RNIC. Shouldn't these nodes be ignored ? Also, are IB router nodes supported too by this ? Thanks. -- Hal From mst at mellanox.co.il Mon Oct 9 09:06:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 9 Oct 2006 18:06:32 +0200 Subject: [openib-general] [PATCHv3] IB/mthca: fix off-by-one in mthca srq creation In-Reply-To: <20061009103913.GA25395@mellanox.co.il> References: <20061009103913.GA25395@mellanox.co.il> Message-ID: <20061009160632.GA11007@mellanox.co.il> IB/mthca: fix off-by-one in mthca srq creation note that in userspace libmthca already handles this correctly Noted by Jack Morgenstein Signed-off-by: Michael S. Tsirkin --- Hopefully last update: the previous version failed to update query_srq. 
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index 0f316c8..92a72f5 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -201,6 +201,8 @@ int mthca_alloc_srq(struct mthca_dev *de if (mthca_is_memfree(dev)) srq->max = roundup_pow_of_two(srq->max + 1); + else + srq->max = srq->max + 1; ds = max(64UL, roundup_pow_of_two(sizeof (struct mthca_next_seg) + @@ -277,7 +279,7 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; - attr->max_wr = (mthca_is_memfree(dev)) ? srq->max - 1 : srq->max; + attr->max_wr = srq->max - 1; attr->max_sge = srq->max_gs; return 0; @@ -413,7 +415,7 @@ int mthca_query_srq(struct ib_srq *ibsrq srq_attr->srq_limit = be16_to_cpu(tavor_ctx->limit_watermark); } - srq_attr->max_wr = (mthca_is_memfree(dev)) ? srq->max - 1 : srq->max; + srq_attr->max_wr = srq->max - 1; srq_attr->max_sge = srq->max_gs; out: -- MST From sweitzen at cisco.com Mon Oct 9 09:15:51 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 9 Oct 2006 09:15:51 -0700 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target Message-ID: I am also having new problems configuring SRP with OFED 1.1 rc7, I have asked Roland to take a look on my test networks. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of > Lakshmanan, Madhu > Sent: Monday, October 09, 2006 4:54 AM > To: Ishai Rabinovitz; openib-general at openib.org > Cc: Roland Dreier (rdreier) > Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: > enable multiple connections to the same target > > Quoting r. 
Roland Dreier : > > Thanks, queued for 2.6.19 > > I tested the patches, which are included in OFED 1.1 RC7, against > Silverstorm SRP targets. The patch breaks backward compatibility for > fabrics that use Silverstorm targets, due to the following: > > It defaults the new parameter "initiator_ext" to 0. Silverstorm SRP > targets, when configured for working with OFED stacks, are usually set > to expect an initiator extension of 1, to overcome the earlier > limitation of OFED stacks setting initiator extension to the port > number. This implies that a user must, without exception, add > "initiator_ext=" to the add target echo string. > > It'd be useful if either or both of the following could be done: > > 1. Release note the above requirement of adding the > "initiator_ext=" > string to the add target echo string, for all Silverstorm targets. > 2. Maintain the earlier default of the initiator extension being equal > to the port number. > > I have prepared a patch that does step 2 above, which I'll send in a > separate e-mail based on the feedback to the above suggestions. > > Madhu > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From eitan at mellanox.co.il Mon Oct 9 09:20:34 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 09 Oct 2006 18:20:34 +0200 Subject: [openib-general] ibdiagnet In-Reply-To: <1160407478.30096.166251.camel@hal.voltaire.com> References: <1160407478.30096.166251.camel@hal.voltaire.com> Message-ID: <452A76D2.60700@mellanox.co.il> Hi Hal, Hal Rosenstock wrote: >Hi Eitan, > >When I run ibdiagnet, I get the following: >Loading IBDIAGNET from: /usr/local/lib/ibdiagnet1.0 >Loading IBDM from: /usr/local/lib/ibdm1.0 >-W- Topology file is not specified. > Reports regarding cluster links will use direct routes. 
>-W- A few ports of local device are up. > Since port-num was not specified (-p option), port 1 of device 1 will be > used as the local port. > >Node type 4 is not an IB node type > > This message is coming from: osm/libvendor/osm_vendor_ibumad.c (opensm vendor lib is used by ibis) Why is umad printing this to stdout? >-E- Fail to ibcr_bind. > > Probably the first device first port is an RNIC ? Did you try using -i and -p flags? > >Node type 4 is an RNIC. Shouldn't these nodes be ignored ? > > Probably by osm vendor >Also, are IB router nodes supported too by this ? > >Thanks. > >-- Hal > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From mlakshmanan at silverstorm.com Mon Oct 9 10:04:51 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Mon, 9 Oct 2006 13:04:51 -0400 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: <20061009133504.GA26849@mellanox.co.il> Message-ID: > > I tested the patches, which are included in OFED 1.1 RC7, against > > Silverstorm SRP targets. The patch breaks backward compatibility for > > fabrics that use Silverstorm targets, due to the following: > > > > It defaults the new parameter "initiator_ext" to 0. Silverstorm SRP > > targets, when configured for working with OFED stacks, are usually set > > to expect an initiator extension of 1, to overcome the earlier > > limitation of OFED stacks setting initiator extension to the port > > number. > > Sounds like a target bug - why does it expect *anything* specific > in the initiator extension? The Silverstorm SRP targets can be configured to accept connections from specific hosts (identified by GUID) and / or specific initiator extensions. 
This allows for a scenario where a group of hosts can gain access to the same back-end storage device, by using the same initiator extension across all the hosts, with the host GUID being ignored / wildcarded on the SRP target. It also facilitates a level of access control to back-end storage devices and permits the grouping of hosts into logical groups. In order to interoperate successfully with the OFED 1.0 stack, such SRP targets were configured to expect the initiator extension reported by the OFED 1.0 SRP implementation, i.e. an initiator extension of 1. Note that for this particular configuration the host GUID is wildcarded, and the only unique identifier is the initiator extension. > > 1. Release note the above requirement of adding the "initiator_ext=" > > string to the add target echo string, for all Silverstorm targets. > > I don't think we'll be touching OFED 1.1 anymore. So maybe this is > the best choice for kernel.org, too. > Are the release notes / docs frozen as well? > > 2. Maintain the earlier default of the initiator extension being equal > > to the port number. > > Hmm. What, exactly, is the target assumption? > > -- > MST The target doesn't assume or default anything. It is how it is configured. The current patch breaks existing Silverstorm SRP target configurations, unless either: 1. The user specifies the "initiator_ext=" when adding a target, which is not to be found anywhere in the release notes, or 2. The configuration on the SRP target is modified to reflect the changes in OFED 1.1 where the initiator extension is passed to the target as 0, if the user doesn't specify it, which again is not specified in the release notes. 
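For illustration only, option 1 could be automated along the following lines. This is a minimal userspace sketch, not part of ib_srp or ibsrpdm: the helper name srp_target_with_initiator_ext is made up here, and the 16-hex-digit value format is an assumption taken from the strings quoted elsewhere in this thread. It appends the initiator_ext key to an add_target string only when the caller has not already supplied one.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only, not an ib_srp or ibsrpdm API).
 *
 * Copies the add_target string into out and, unless it already contains an
 * "initiator_ext=" key, appends ",initiator_ext=<16 hex digits>".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int srp_target_with_initiator_ext(const char *target,
                                         unsigned long long ext,
                                         char *out, size_t outlen)
{
    int n;

    if (strstr(target, "initiator_ext=") != NULL)
        n = snprintf(out, outlen, "%s", target);   /* keep the caller's value */
    else
        n = snprintf(out, outlen, "%s,initiator_ext=%016llx", target, ext);

    /* snprintf returns the length it wanted to write; detect truncation */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string is what would then be written to the add_target file; a target that already names an initiator extension is passed through unchanged, so the sketch does not override an explicit user choice.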
Madhu From rdreier at cisco.com Mon Oct 9 10:16:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 09 Oct 2006 10:16:54 -0700 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: (Scott Weitzenkamp's message of "Mon, 9 Oct 2006 09:15:51 -0700") References: Message-ID: > I am also having new problems configuring SRP with OFED 1.1 rc7, I have > asked Roland to take a look on my test networks. The problem is that Cisco SRP targets insist on the initiator ID being 8 bytes of 0 followed by the initiator node GUID. The source says /* * Topspin/Cisco SRP targets will reject our login unless we * zero out the first 8 bytes of our initiator port ID. The * second 8 bytes must be our local node GUID, but we always * use that anyway. */ but with the change to allow userspace-specified initiator IDs, we don't use the node GUID anyway. I added the following on top of the "multiple connections" patch that was queued for 2.6.19. Can this be put into OFED as well? - R. diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 3bf0c5b..4b09147 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -359,15 +359,16 @@ static int srp_send_req(struct srp_targe /* * Topspin/Cisco SRP targets will reject our login unless we - * zero out the first 8 bytes of our initiator port ID. The - * second 8 bytes must be our local node GUID, but we always - * use that anyway. + * zero out the first 8 bytes of our initiator port ID and set + * the second 8 bytes to the local node GUID. 
*/ if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3)) { printk(KERN_DEBUG PFX "Topspin/Cisco initiator port ID workaround " "activated for target GUID %016llx\n", (unsigned long long) be64_to_cpu(target->ioc_guid)); memset(req->priv.initiator_port_id, 0, 8); + memcpy(req->priv.initiator_port_id + 8, + &target->srp_host->dev->dev->node_guid, 8); } status = ib_send_cm_req(target->cm_id, &req->param); From mlakshmanan at silverstorm.com Mon Oct 9 10:18:00 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Mon, 9 Oct 2006 13:18:00 -0400 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: Message-ID: > From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] > > I am also having new problems configuring SRP with OFED 1.1 rc7, I have > asked Roland to take a look on my test networks. > > Scott Weitzenkamp > I had to add the following string to the parameters that are echo'ed to '/sys/class/infiniband_srp/..../add_target', to get it to work with the Silverstorm SRP targets: "....,initiator_ext=0000000000000001" I have absolutely no idea what test setup you are using, but maybe that, or something similar, works for you as well? I haven't tried HA, just been sticking to the basics. Madhu From mst at mellanox.co.il Mon Oct 9 10:24:09 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 9 Oct 2006 19:24:09 +0200 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: References: Message-ID: <20061009172409.GD26849@mellanox.co.il> Quoting r. Lakshmanan, Madhu : > Subject: RE: FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target > > > > I tested the patches, which are included in OFED 1.1 RC7, against > > > Silverstorm SRP targets. 
The patch breaks backward compatibility for > > > fabrics that use Silverstorm targets, due to the following: > > > > > > It defaults the new parameter "initiator_ext" to 0. Silverstorm SRP > > > targets, when configured for working with OFED stacks, are usually > set > > > to expect an initiator extension of 1, to overcome the earlier > > > limitation of OFED stacks setting initiator extension to the port > > > number. > > > > Sounds like a target bug - why does it expect *anything* specific > > in the initiator extension? > > The Silverstorm SRP targets can be configured to accept connections from > > specific hosts (identified by GUID) and / or specific initiator > extensions. > This allows for a scenario where a group of hosts can gain access to the > same back-end storage device, by using the same initiator extension > across all the hosts, with the host GUID being ignored / wildcarded on > the SRP target. It also facilitates a level of access control to > back-end storage devices and permits the grouping of hosts into logical > groups. > > In order to interoperate successfully with the OFED 1.0 stack, such SRP > targets were configured to expect the initiator extension reported by > the OFED 1.0 SRP implementation, i.e. an initiator extension of 1. Note > that > for this particular configuration the host GUID is wildcarded, and the > only > unique identifier is the initiator extension. > > > > 1. Release note the above requirement of adding the > "initiator_ext=" > > > string to the add target echo string, for all Silverstorm targets. > > > > I don't think we'll be touching OFED 1.1 anymore. So maybe this is > > the best choice for kernel.org, too. > > > > Are the release notes / docs frozen as well? Not yet. > > > 2. Maintain the earlier default of the initiator extension being equal > > > to the port number. > > > > Hmm. What, exactly, is the target assumption? > > The target doesn't assume or default anything. It is how it is > configured. 
The current patch breaks existing Silverstorm SRP target > configurations, unless either: > 1. The user specifies the "initiator_ext=" when adding a target, > which is not to be found anywhere in the release notes, or > 2. The configuration on the SRP target is modified to reflect the > changes in OFED 1.1 where the initiator extension is passed to the > target as 0, if the user doesn't specify it, which again is not > specified in the release notes. OK,thanks. So we need to document this. Could you pls post a short text describing the issue, who is affected and what needs to be done by the user? -- MST From halr at voltaire.com Mon Oct 9 10:36:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 13:36:34 -0400 Subject: [openib-general] [PATCH]OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than stdout Message-ID: <1160415392.30096.171632.camel@hal.voltaire.com> OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than stdout Signed-off-by: Hal Rosenstock Index: libvendor/osm_vendor_ibumad.c =================================================================== --- libvendor/osm_vendor_ibumad.c (revision 9751) +++ libvendor/osm_vendor_ibumad.c (working copy) @@ -745,8 +745,8 @@ osm_vendor_open_port( "osm_vendor_open_port: ERR 542D: " "Node type %d is not an IB node type\n", p_vend->umad_ca.node_type ); - printf( "\nNode type %d is not an IB node type\n", - p_vend->umad_ca.node_type ); + fprintf( stderr, "\nNode type %d is not an IB node type\n", + p_vend->umad_ca.node_type ); goto Exit; } @@ -952,7 +952,7 @@ __osm_vendor_recv_dummy_cb( IN void *bind_context, IN osm_madw_t *p_req_madw ) { - printf("Ignoring received MAD after osm_vendor_unbind\n"); + fprintf(stderr, "__osm_vendor_recv_dummy_cb: Ignoring received MAD after osm_vendor_unbind\n"); } /********************************************************************** @@ -962,7 +962,7 @@ __osm_vendor_send_err_dummy_cb( IN void* bind_context, IN osm_madw_t *p_req_madw ) 
{ - printf("Ignoring send error after osm_vendor_unbind\n"); + fprintf(stderr, "__osm_vendor_send_err_dummy_cb: Ignoring send error after osm_vendor_unbind\n"); } /********************************************************************** From halr at voltaire.com Mon Oct 9 10:39:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 13:39:31 -0400 Subject: [openib-general] [PATCH][TRIVIAL]ibutils: Fix some typos Message-ID: <1160415570.30096.171776.camel@hal.voltaire.com> ibutils: Fix some typos Signed-off-by: Hal Rosenstock Index: ibis/README =================================================================== --- ibis/README (revision 9748) +++ ibis/README (working copy) @@ -1,8 +1,8 @@ IBIS stands for IB In-band Services. It provides a TCL API for sending MADs over the local IB ports. -It also provides C API for sending batches of MADs and waiting for the entire batch -completion. +It also provides C API for sending batches of MADs and waiting for the entire +batch completion. The detailed API is described in the doc/ibis_wrap.html directory. 
@@ -13,4 +13,3 @@ ibvs_test.tcl - vendor specific gateways sac_demo.tcl - SA client queries obj.tcl - how to dump out TCL objects fields - Index: ibmgtsim/config/ibdm.m4 =================================================================== --- ibmgtsim/config/ibdm.m4 (revision 9748) +++ ibmgtsim/config/ibdm.m4 (working copy) @@ -30,7 +30,7 @@ if test "x$with_ibdm" = xnone; then elif test -d [`pwd`]/../ibdm; then with_ibdm=[`pwd`]/../ibdm else - AC_MSG_ERROR([--with-ibdm must be provided - failde to find standard IBDM installation]) + AC_MSG_ERROR([--with-ibdm must be provided - failed to find standard IBDM installation]) fi fi Index: ibdm/README =================================================================== --- ibdm/README (revision 9748) +++ ibdm/README (working copy) @@ -62,7 +62,7 @@ ibdmtr: ----------- Traces a direct route through the fabric while printing the path information at both node and system levels. -Usage: ibdmtr [-v][-h] {-c |-t } -s -p -d +Usage: ibdmtr [-v][-h] {-c |-t } -s -p -d Description: This utility parses a cabling list or topology file From mlakshmanan at silverstorm.com Mon Oct 9 10:42:08 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Mon, 9 Oct 2006 13:42:08 -0400 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: <20061009172409.GD26849@mellanox.co.il> Message-ID: > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > > Quoting r. Lakshmanan, Madhu : > > Subject: RE: FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections > to the same target > > > > > > I tested the patches, which are included in OFED 1.1 RC7, against > > > > Silverstorm SRP targets. The patch breaks backward compatibility for > > > > fabrics that use Silverstorm targets, due to the following: > > > > > > > > It defaults the new parameter "initiator_ext" to 0. 
Silverstorm SRP > > > > targets, when configured for working with OFED stacks, are usually > > set > > > > to expect an initiator extension of 1, to overcome the earlier > > > > limitation of OFED stacks setting initiator extension to the port > > > > number. > > > > > > Sounds like a target bug - why does it expect *anything* specific > > > in the initiator extension? > > > > The Silverstorm SRP targets can be configured to accept connections from > > > > specific hosts (identified by GUID) and / or specific initiator > > extensions. > > This allows for a scenario where a group of hosts can gain access to the > > same back-end storage device, by using the same initiator extension > > across all the hosts, with the host GUID being ignored / wildcarded on > > the SRP target. It also facilitates a level of access control to > > back-end storage devices and permits the grouping of hosts into logical > > groups. > > > > In order to interoperate successfully with the OFED 1.0 stack, such SRP > > targets were configured to expect the initiator extension reported by > > the OFED 1.0 SRP implementation, i.e. an initiator extension of 1. Note > > that > > for this particular configuration the host GUID is wildcarded, and the > > only > > unique identifier is the initiator extension. > > > > > > 1. Release note the above requirement of adding the > > "initiator_ext=" > > > > string to the add target echo string, for all Silverstorm targets. > > > > > > I don't think we'll be touching OFED 1.1 anymore. So maybe this is > > > the best choice for kernel.org, too. > > > > > > > Are the release notes / docs frozen as well? > > Not yet. > > > > > 2. Maintain the earlier default of the initiator extension being > equal > > > > to the port number. > > > > > > Hmm. What, exactly, is the target assumption? > > > > The target doesn't assume or default anything. It is how it is > > configured. 
The current patch breaks existing Silverstorm SRP target > > configurations, unless either: > > 1. The user specifies the "initiator_ext=" when adding a target, > > which is not to be found anywhere in the release notes, or > > 2. The configuration on the SRP target is modified to reflect the > > changes in OFED 1.1 where the initiator extension is passed to the > > target as 0, if the user doesn't specify it, which again is not > > specified in the release notes. > > OK,thanks. > So we need to document this. > Could you pls post a short text describing the issue, > who is affected and what needs to be done by the user? > > -- > MST Vendor specific notes: Hosts that are connected to Silverstorm SRP targets must take one of the following steps after they are upgraded to the OFED 1.1 release in order to continue to access their storage successfully: 1. When issuing the "echo" command to add a new SRP target, the host must append the string ",initiator_ext=0000000000000001" to the original echo string. Example: If 'ibsrpdm -c' shows the following, and you want to connect to the first target: id_ext=0000000000000001,ioc_guid=00066a0138000165,dgid=fe8000000000000000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00 id_ext=0000000000000001,ioc_guid=00066a0238000165,dgid=fe8000000000000000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00 the echo command must be: echo -n id_ext=0000000000000001,ioc_guid=00066a0138000165,dgid=fe8000000000000000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00,initiator_ext=0000000000000001 > /sys/class/infiniband_srp/srp-mthca0-1/add_target OR 2. Change the SRP map on the Silverstorm SRP target to set the expected initiator extension to 0. For details on how to change the SRP map on a Silverstorm SRP target, please refer to product documentation. From mst at mellanox.co.il Mon Oct 9 10:47:05 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 9 Oct 2006 19:47:05 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. Message-ID: <20061009174705.GG26849@mellanox.co.il> Hi! I'm trying to build a network device driver supporting a very large MTU (around 64K) on top of an infiniband connection, and I've hit a couple of issues I'd appreciate some feedback on: 1. On the send side, I've set NETIF_F_SG, but hardware does not support checksum offloading, and I see "dropping NETIF_F_SG since no checksum feature" warning, and I seem to be getting large packets all in one chunk. The reason I've set NETIF_F_SG is that I'm concerned that under real life stress Linux won't be able to allocate 64K of contiguous memory. Is this concern of mine valid? I saw in-tree drivers allocating at least 8K. What's the best way to enable S/G on send side? Is checksum offloading really required for S/G? 2. On the receive side, what's the best/right way to create an skb that is larger than PAGE_SIZE? Do I allocate with alloc_page and fill in nr_frags with skb_fill_page_desc? Some drivers seem to fill in frag_list - which is better? I see that even skb_put only works properly on linear skb. What are the helpers legal for fragmented skb? Suggestions would be appreciated.
Thanks, -- MST From halr at voltaire.com Mon Oct 9 10:45:52 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 13:45:52 -0400 Subject: [openib-general] [PATCH][TRIVIAL] ibdiagnet: Fix typo Message-ID: <1160415952.30096.172066.camel@hal.voltaire.com> ibdiagnet: Fix typo priority rather than priorty Signed-off-by: Hal Rosenstock Index: ibdiag/doc/ibdiagnet.pod =================================================================== --- ibdiag/doc/ibdiagnet.pod (revision 9748) +++ ibdiag/doc/ibdiagnet.pod (working copy) @@ -39,7 +39,7 @@ =item F - -List of all the SM (state and priorty) in the fabric +List of all the SM (state and priority) in the fabric =item F - Index: ibdiag/src/ibdebug_if.tcl =================================================================== --- ibdiag/src/ibdebug_if.tcl (revision 9748) +++ ibdiag/src/ibdebug_if.tcl (working copy) @@ -1393,7 +1393,7 @@ append msgText "Checking bad guids" } "-I-ibdiagnet:SM.header" { - append msgText "Summary Fabric SM-state-priorty" + append msgText "Summary Fabric SM-state-priority" } "-E-ibdiagnet:no.lst.file" { set noExiting 1 @@ -1415,8 +1415,8 @@ "-I-ibdiagnet:SM.report.body" { set msgText " " set nodeName [lindex $args 0] - set priorty [lindex $args 1] - append msgText "$nodeName priorty:$priorty" + set priority [lindex $args 1] + append msgText "$nodeName priority:$priority" } "-I-ibdiagnet:check.credit.loops.header" { append msgText "Checking credit loops" @@ -1771,7 +1771,7 @@ switches ibdiagnet.masks - In case of duplicate port/node Guids, these file include the map between masked Guid and real Guids - ibdiagnet.sm - A dump of all the SM (state and priorty) in the fabric + ibdiagnet.sm - A dump of all the SM (state and priority) in the fabric ibdiagnet.pm - In case -pm option was provided, this file contain a dump of all the nodes PM counters In addition to generating the files above, the discovery phase also checks for Index: ibdiag/src/ibdebug.tcl 
=================================================================== --- ibdiag/src/ibdebug.tcl (revision 9748) +++ ibdiag/src/ibdebug.tcl (working copy) @@ -835,7 +835,7 @@ # DUPandZERO(,NodeGUID) : # DUPandZERO(,) : # -# SM( : ,SMpriorty +# SM( : ,SMpriority # # SECOND_PATH - list of second paths # @@ -2564,7 +2564,7 @@ if { $tmpDirectPath == "" } { set nodeName "The Local Device : $nodeName" } - set msg " $nodeName priorty:[lindex $element 1]" + set msg " $nodeName priority:[lindex $element 1]" if {$_fileName == "stdout"} { inform "-I-ibdiagnet:SM.report.body" $nodeName [lindex $element 1] } else { From mst at mellanox.co.il Mon Oct 9 10:47:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 9 Oct 2006 19:47:51 +0200 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target In-Reply-To: References: Message-ID: <20061009174751.GH26849@mellanox.co.il> Quoting r. Lakshmanan, Madhu : > Vendor specific notes: > > Hosts that are connected to Silverstorm SRP targets need to do > either of the following steps after they are upgraded to the OFED 1.1 > release in order to continue to access their storage successfully: > > 1. When issuing the "echo" command to add a new SRP target, the host > must append the string ",initiator_ext=0000000000000001" to the original > echo string. 
> Example: > > If 'ibsrpdm -c' shows, and you want to connect to the first > target: > > > id_ext=0000000000000001,ioc_guid=00066a0138000165,dgid=fe8000000000000 > 000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00 > > id_ext=0000000000000001,ioc_guid=00066a0238000165,dgid=fe8000000000000 > 000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00 > > the echo command must be: > > echo -n > id_ext=0000000000000001,ioc_guid=00066a0138000165,dgid=fe8000000000000 > 000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00, > initiator_ext=0000000000000001 > > /sys/class/inifiniband_srp/srp-mthca0-1/add_target > > OR > > 2. Change the SRP map on the Silverstorm SRP target to set the expected > initiator extension to 0. For details on how to change the SRP map on a > Silverstorm SRP target, please refer to product documentation. > > OK, thanks. -- MST From halr at voltaire.com Mon Oct 9 10:51:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 13:51:43 -0400 Subject: [openib-general] ibdiagnet In-Reply-To: <452A76D2.60700@mellanox.co.il> References: <1160407478.30096.166251.camel@hal.voltaire.com> <452A76D2.60700@mellanox.co.il> Message-ID: <1160416301.30096.172306.camel@hal.voltaire.com> Hi Eitan, On Mon, 2006-10-09 at 12:20, Eitan Zahavi wrote: > Hi Hal, > > Hal Rosenstock wrote: > > >Hi Eitan, > > > >When I run ibdiagnet, I get the following: > >Loading IBDIAGNET from: /usr/local/lib/ibdiagnet1.0 > >Loading IBDM from: /usr/local/lib/ibdm1.0 > >-W- Topology file is not specified. > > Reports regarding cluster links will use direct routes. > >-W- A few ports of local device are up. > > Since port-num was not specified (-p option), port 1 of device 1 will be > > used as the local port. > > > >Node type 4 is not an IB node type > > > > > This message is coming from: osm/libvendor/osm_vendor_ibumad.c (opensm > vendor lib is used by ibis) > Why is umad printing this to stdout? I sent a patch for this to the list. 
> >-E- Fail to ibcr_bind. > > > > > Probably the first device first port is an RNIC ? Yes. > Did you try using -i and -p flags? This works. > >Node type 4 is an RNIC. Shouldn't these nodes be ignored ? > > > > > Probably by osm vendor osm vendor returns an error for these. -- Hal > >Also, are IB router nodes supported too by this ? > > > >Thanks. > > > >-- Hal > > > > > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From shemminger at osdl.org Mon Oct 9 09:50:51 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Mon, 9 Oct 2006 09:50:51 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061009174705.GG26849@mellanox.co.il> References: <20061009174705.GG26849@mellanox.co.il> Message-ID: <20061009095051.38ed9f22@freekitty> On Mon, 9 Oct 2006 19:47:05 +0200 "Michael S. Tsirkin" wrote: > Hi! > I'm trying to build a network device driver supporting a very large MTU (around 64K) > on top of an infiniband connection, and I've hit a couple of issues I'd > appreciate some feedback on: > > 1. On the send side, > I've set NETIF_F_SG, but hardware does not support checksum offloading, > and I see "dropping NETIF_F_SG since no checksum feature" warning, > and I seem to be getting large packets all in one chunk. > The reason I've set NETIF_F_SG, is because I'm concerned that under real life > stress Linux won't be able to allocate 64K of continuous memory. > > Is this concern of mine valid? I saw in-tree drivers allocating at least 8K. > What's the best way to enable S/G on send side? > Is checksum offloading really required for S/G? Yes, in the current implementation, Linux needs checksum offload. But there is no reason, your driver can't compute the checksum in software. > 2. 
On the receive side, what's the best/right way to create an skb that
> is larger than PAGE_SIZE?
> Do I allocate with alloc_page and fill in nr_frags with skb_fill_page_desc?
> Some drivers seem to fill in frag_list - which is better?
> I see that even skb_put only works properly on linear skb.

Allocating large buffers is problematic on busy systems. See the latest
e1000 or sky2, which use frag_list.

> What are the helpers legal for fragmented skb?

Read the source. Setting up fragmented buffers has fewer helper functions,
but isn't that hard.

-- 
Stephen Hemminger

From rdreier at cisco.com Mon Oct 9 11:01:06 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 09 Oct 2006 11:01:06 -0700
Subject: [openib-general] ipoib: ignores dma mapping errors on TX?
In-Reply-To: <20061009144223.GB26849@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 9 Oct 2006 16:42:23 +0200")
References: <20061009144223.GB26849@mellanox.co.il>
Message-ID:

    Michael> It seems that IPoIB ignores the possibility that
    Michael> dma_map_single with DMA_TO_DEVICE direction might return
    Michael> dma_mapping_error.

    Michael> Is there some reason that such mappings can't fail?

No, it's just an oversight. Most network device drivers don't check
for DMA mapping errors but it's probably better to do so anyway.
I added this to my queue: commit 8edaf479946022d67350d6c344952fb65064e51b Author: Roland Dreier Date: Mon Oct 9 10:54:20 2006 -0700 IPoIB: Check for DMA mapping error for TX packets Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f426a69..8bf5e9e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev, tx_req->skb = skb; addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len, DMA_TO_DEVICE); + if (unlikely(dma_mapping_error(addr))) { + ++priv->stats.tx_errors; + dev_kfree_skb_any(skb); + return; + } pci_unmap_addr_set(tx_req, mapping, addr); if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), From rdreier at cisco.com Mon Oct 9 11:06:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 09 Oct 2006 11:06:38 -0700 Subject: [openib-general] ipoib_ib.c - alignment questions In-Reply-To: <20061009042142.GA25964@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 9 Oct 2006 06:21:42 +0200") References: <20061009042142.GA25964@mellanox.co.il> Message-ID: > ipoib_ib.c has this: > /* > * IB will leave a 40 byte gap for a GRH and IPoIB adds a 4 byte > * header. So we need 4 more bytes to get to 48 and align the > * IP header to a multiple of 16. > */ > skb_reserve(skb, 4); > > Some questions on this: > - Why do we try to align the IP header to a multiple of 16? As the comment in linux/skbuff.h near the definition of NET_IP_ALIGN says: * CPUs often take a performance hit when accessing unaligned memory * locations. The actual performance hit varies, it can be small if the * hardware handles it or large if we have to take an exception and fix it * in software. > - This works if skb start is 16 byte aligned. > What guarantees that skb data is 16 byte aligned? The implementation of the slab allocator I guess. 
> - Would the following code be better than the comment:
> skb_reserve(skb, ALIGN(IB_GRH_BYTES + IPOIB_ENCAP_LEN, 16) -
>                  IB_GRH_BYTES - IPOIB_ENCAP_LEN);
>
> comments have a bigger tendency to bitrot ...

I think that's a lot harder to understand when you first read it. I
guess we would still need the comment there.

BTW there is still a possibility of improvement here, because for
example on ppc64, the cost of unaligned DMA is much higher than the
cost of an unaligned IP header -- as linux/skbuff.h goes on to say:

 * The downside to this alignment of the IP header is that the DMA is now
 * unaligned. On some architectures the cost of an unaligned DMA is high
 * and this cost outweighs the gains made by aligning the IP header.

So really we should have something else there if NET_IP_ALIGN is 0 --
the goal would be to DMA received packets starting at a 64-byte
aligned address. But I've never bothered to work out the best way to
do that.

 - R.

From rdreier at cisco.com Mon Oct 9 11:09:19 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 09 Oct 2006 11:09:19 -0700
Subject: [openib-general] [PATCHv3] IB/mthca: fix off-by-one in mthca srq creation
In-Reply-To: <20061009160632.GA11007@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 9 Oct 2006 18:06:32 +0200")
References: <20061009103913.GA25395@mellanox.co.il> <20061009160632.GA11007@mellanox.co.il>
Message-ID:

Thanks, applied for 2.6.19.
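
The skb_reserve() arithmetic from the ipoib_ib.c alignment thread above can be checked with a short sketch. It assumes the sizes stated in the quoted comment (a 40-byte GRH gap and a 4-byte IPoIB header). `align_up` is a hypothetical stand-in for the kernel's ALIGN() macro, and the constant names merely mirror the ones in the proposed expression.

```python
# Sketch only: mirrors the expression proposed in the thread,
#   ALIGN(IB_GRH_BYTES + IPOIB_ENCAP_LEN, 16) - IB_GRH_BYTES - IPOIB_ENCAP_LEN
# align_up() is a hypothetical helper standing in for the kernel's ALIGN().

IB_GRH_BYTES = 40     # gap left for the GRH, per the quoted comment
IPOIB_ENCAP_LEN = 4   # IPoIB header size, per the quoted comment


def align_up(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def reserve_bytes(boundary: int = 16) -> int:
    """Padding to reserve so the IP header starts on a multiple of boundary."""
    header = IB_GRH_BYTES + IPOIB_ENCAP_LEN
    return align_up(header, boundary) - header


if __name__ == "__main__":
    pad = reserve_bytes()
    # Padding, then the offset at which the IP header would start.
    print(pad, pad + IB_GRH_BYTES + IPOIB_ENCAP_LEN)
```

With these values the padding comes out to 4 bytes and the IP header lands at offset 48, which matches the hard-coded skb_reserve(skb, 4), provided the start of the buffer is itself 16-byte aligned.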
From eitan at mellanox.co.il Mon Oct 9 11:32:47 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 09 Oct 2006 20:32:47 +0200 Subject: [openib-general] ibdiagnet In-Reply-To: <1160416301.30096.172306.camel@hal.voltaire.com> References: <1160407478.30096.166251.camel@hal.voltaire.com> <452A76D2.60700@mellanox.co.il> <1160416301.30096.172306.camel@hal.voltaire.com> Message-ID: <452A95CF.3060200@mellanox.co.il> Hal Rosenstock wrote: >Hi Eitan, > >On Mon, 2006-10-09 at 12:20, Eitan Zahavi wrote: > > >>Hi Hal, >> >>Hal Rosenstock wrote: >> >> >> >>>Hi Eitan, >>> >>>When I run ibdiagnet, I get the following: >>>Loading IBDIAGNET from: /usr/local/lib/ibdiagnet1.0 >>>Loading IBDM from: /usr/local/lib/ibdm1.0 >>>-W- Topology file is not specified. >>> Reports regarding cluster links will use direct routes. >>>-W- A few ports of local device are up. >>> Since port-num was not specified (-p option), port 1 of device 1 will be >>> used as the local port. >>> >>>Node type 4 is not an IB node type >>> >>> >>> >>> >>This message is coming from: osm/libvendor/osm_vendor_ibumad.c (opensm >>vendor lib is used by ibis) >>Why is umad printing this to stdout? >> >> > >I sent a patch for this to the list. > > > >>>-E- Fail to ibcr_bind. >>> >>> >>> >>> >>Probably the first device first port is an RNIC ? >> >> > >Yes. > > > >>Did you try using -i and -p flags? >> >> > >This works. > > > >>>Node type 4 is an RNIC. Shouldn't these nodes be ignored ? >>> >>> >>> >>> >>Probably by osm vendor >> >> > >osm vendor returns an error for these. > > Yes but maybe it should filter them out. I do not see how they can be helpful for IB SM or IB Utils ... >-- Hal > > > >>>Also, are IB router nodes supported too by this ? >>> >>>Thanks. 
>>> >>>-- Hal >>> >>> >>> >>>_______________________________________________ >>>openib-general mailing list >>>openib-general at openib.org >>>http://openib.org/mailman/listinfo/openib-general >>> >>>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >>> >>> >>> >>> > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Mon Oct 9 11:36:10 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 09 Oct 2006 20:36:10 +0200 Subject: [openib-general] [PATCH][TRIVIAL]ibutils: Fix some typos In-Reply-To: <1160415570.30096.171776.camel@hal.voltaire.com> References: <1160415570.30096.171776.camel@hal.voltaire.com> Message-ID: <452A969A.3060804@mellanox.co.il> Thanks. Applied. Hal Rosenstock wrote: >ibutils: Fix some typos > >Signed-off-by: Hal Rosenstock > >Index: ibis/README >=================================================================== >--- ibis/README (revision 9748) >+++ ibis/README (working copy) >@@ -1,8 +1,8 @@ > IBIS stands for IB In-band Services. > It provides a TCL API for sending MADs over the local IB ports. > >-It also provides C API for sending batches of MADs and waiting for the entire batch >-completion. >+It also provides C API for sending batches of MADs and waiting for the entire >+batch completion. > > The detailed API is described in the doc/ibis_wrap.html directory. 
> >@@ -13,4 +13,3 @@ ibvs_test.tcl - vendor specific gateways > sac_demo.tcl - SA client queries > obj.tcl - how to dump out TCL objects fields > >- >Index: ibmgtsim/config/ibdm.m4 >=================================================================== >--- ibmgtsim/config/ibdm.m4 (revision 9748) >+++ ibmgtsim/config/ibdm.m4 (working copy) >@@ -30,7 +30,7 @@ if test "x$with_ibdm" = xnone; then > elif test -d [`pwd`]/../ibdm; then > with_ibdm=[`pwd`]/../ibdm > else >- AC_MSG_ERROR([--with-ibdm must be provided - failde to find standard IBDM installation]) >+ AC_MSG_ERROR([--with-ibdm must be provided - failed to find standard IBDM installation]) > fi > fi > >Index: ibdm/README >=================================================================== >--- ibdm/README (revision 9748) >+++ ibdm/README (working copy) >@@ -62,7 +62,7 @@ ibdmtr: > ----------- > Traces a direct route through the fabric while printing the path > information at both node and system levels. >-Usage: ibdmtr [-v][-h] {-c |-t } -s -p -d >+Usage: ibdmtr [-v][-h] {-c |-t } -s -p -d > > Description: > This utility parses a cabling list or topology file > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Mon Oct 9 11:36:29 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 09 Oct 2006 20:36:29 +0200 Subject: [openib-general] [PATCH][TRIVIAL] ibdiagnet: Fix typo In-Reply-To: <1160415952.30096.172066.camel@hal.voltaire.com> References: <1160415952.30096.172066.camel@hal.voltaire.com> Message-ID: <452A96AD.4090201@mellanox.co.il> Thanks. Applied. 
Hal Rosenstock wrote: >ibdiagnet: Fix typo > >priority rather than priorty > >Signed-off-by: Hal Rosenstock > >Index: ibdiag/doc/ibdiagnet.pod >=================================================================== >--- ibdiag/doc/ibdiagnet.pod (revision 9748) >+++ ibdiag/doc/ibdiagnet.pod (working copy) >@@ -39,7 +39,7 @@ > > =item F - > >-List of all the SM (state and priorty) in the fabric >+List of all the SM (state and priority) in the fabric > > =item F - > >Index: ibdiag/src/ibdebug_if.tcl >=================================================================== >--- ibdiag/src/ibdebug_if.tcl (revision 9748) >+++ ibdiag/src/ibdebug_if.tcl (working copy) >@@ -1393,7 +1393,7 @@ > append msgText "Checking bad guids" > } > "-I-ibdiagnet:SM.header" { >- append msgText "Summary Fabric SM-state-priorty" >+ append msgText "Summary Fabric SM-state-priority" > } > "-E-ibdiagnet:no.lst.file" { > set noExiting 1 >@@ -1415,8 +1415,8 @@ > "-I-ibdiagnet:SM.report.body" { > set msgText " " > set nodeName [lindex $args 0] >- set priorty [lindex $args 1] >- append msgText "$nodeName priorty:$priorty" >+ set priority [lindex $args 1] >+ append msgText "$nodeName priority:$priority" > } > "-I-ibdiagnet:check.credit.loops.header" { > append msgText "Checking credit loops" >@@ -1771,7 +1771,7 @@ > switches > ibdiagnet.masks - In case of duplicate port/node Guids, these file include > the map between masked Guid and real Guids >- ibdiagnet.sm - A dump of all the SM (state and priorty) in the fabric >+ ibdiagnet.sm - A dump of all the SM (state and priority) in the fabric > ibdiagnet.pm - In case -pm option was provided, this file contain a dump > of all the nodes PM counters > In addition to generating the files above, the discovery phase also checks for >Index: ibdiag/src/ibdebug.tcl >=================================================================== >--- ibdiag/src/ibdebug.tcl (revision 9748) >+++ ibdiag/src/ibdebug.tcl (working copy) >@@ -835,7 +835,7 @@ > # DUPandZERO(,NodeGUID) : > 
# DUPandZERO(,) : > # >-# SM( : ,SMpriorty >+# SM( : ,SMpriority > # > # SECOND_PATH - list of second paths > # >@@ -2564,7 +2564,7 @@ > if { $tmpDirectPath == "" } { > set nodeName "The Local Device : $nodeName" > } >- set msg " $nodeName priorty:[lindex $element 1]" >+ set msg " $nodeName priority:[lindex $element 1]" > if {$_fileName == "stdout"} { > inform "-I-ibdiagnet:SM.report.body" $nodeName [lindex $element 1] > } else { > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From eitan at mellanox.co.il Mon Oct 9 11:37:47 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 09 Oct 2006 20:37:47 +0200 Subject: [openib-general] [PATCH]OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than stdout In-Reply-To: <1160415392.30096.171632.camel@hal.voltaire.com> References: <1160415392.30096.171632.camel@hal.voltaire.com> Message-ID: <452A96FB.1050403@mellanox.co.il> Hi Hal, I would rather remove all of these. At least for the non debug build. I do not see how they help the user . What do you say? 
EZ Hal Rosenstock wrote: >OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than >stdout > >Signed-off-by: Hal Rosenstock > >Index: libvendor/osm_vendor_ibumad.c >=================================================================== >--- libvendor/osm_vendor_ibumad.c (revision 9751) >+++ libvendor/osm_vendor_ibumad.c (working copy) >@@ -745,8 +745,8 @@ osm_vendor_open_port( > "osm_vendor_open_port: ERR 542D: " > "Node type %d is not an IB node type\n", > p_vend->umad_ca.node_type ); >- printf( "\nNode type %d is not an IB node type\n", >- p_vend->umad_ca.node_type ); >+ fprintf( stderr, "\nNode type %d is not an IB node type\n", >+ p_vend->umad_ca.node_type ); > goto Exit; > } > >@@ -952,7 +952,7 @@ __osm_vendor_recv_dummy_cb( > IN void *bind_context, > IN osm_madw_t *p_req_madw ) > { >- printf("Ignoring received MAD after osm_vendor_unbind\n"); >+ fprintf(stderr, "__osm_vendor_recv_dummy_cb: Ignoring received MAD after osm_vendor_unbind\n"); > } > > /********************************************************************** >@@ -962,7 +962,7 @@ __osm_vendor_send_err_dummy_cb( > IN void* bind_context, > IN osm_madw_t *p_req_madw ) > { >- printf("Ignoring send error after osm_vendor_unbind\n"); >+ fprintf(stderr, "__osm_vendor_send_err_dummy_cb: Ignoring send error after osm_vendor_unbind\n"); > } > > /********************************************************************** > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From halr at voltaire.com Mon Oct 9 12:05:52 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 15:05:52 -0400 Subject: [openib-general] [PATCH]{TRIVIAL] ibdiagnet: Fix another typo Message-ID: <1160420750.30096.175491.camel@hal.voltaire.com> ibdiagnet: Fix another typo performance rather than performence Signed-off-by: 
Hal Rosenstock Index: ibdiag/src/ibdebug_if.tcl =================================================================== --- ibdiag/src/ibdebug_if.tcl (revision 9752) +++ ibdiag/src/ibdebug_if.tcl (working copy) @@ -15,17 +15,17 @@ # -deafult "" means that the parameter does have a default value, but it will set later (after ibis is ran, in porc startIBDebug). ## TODO: sm_key is a 64-bit integer - will it be correctly cheked in parseArgv ? array set InfoArgv { - -P,name "query.performence.monitors" + -P,name "query.performance.monitors" -P,desc "If any of the provided pm is greater then its provided value, print it to screen" -P,param "=" -P,regexp "pm.name.>=1" -P,error "-E-argv:not.legal.PM" - -pc,name "reset.performence.monitors" + -pc,name "reset.performance.monitors" -pc,arglen 0 -pc,desc "reset all the fabric links pmCounters" - -pm,name "performence.monitors" + -pm,name "performance.monitors" -pm,arglen 0 -pm,desc "Dumps all pmCounters values into .pm file" Index: ibdiag/src/ibdebug.tcl =================================================================== --- ibdiag/src/ibdebug.tcl (revision 9752) +++ ibdiag/src/ibdebug.tcl (working copy) @@ -1745,7 +1745,7 @@ proc PMCounterQuery {} { # preparing database for reading PMs if {![catch {set tmpLID [GetParamValue LID $directPath -port $entryPort]}]} { if { $tmpLID != 0 } { - if {[info exists G(argv,reset.performence.monitors)]} { + if {[info exists G(argv,reset.performance.monitors)]} { catch {pmClrAllCounters $tmpLID $entryPort} } set tmpLidPort "$tmpLID:$entryPort" @@ -1767,7 +1767,7 @@ proc PMCounterQuery {} { unset tmpLidPort if {![catch {set tmpLID [GetParamValue LID $directPath -port $entryPort]}]} { if { $tmpLID != 0 } { - if {[info exists G(argv,reset.performence.monitors)]} { + if {[info exists G(argv,reset.performance.monitors)]} { catch {pmClrAllCounters $tmpLID $entryPort} } set tmpLidPort "$tmpLID:$entryPort" @@ -1839,7 +1839,7 @@ proc PMCounterQuery {} { inform "-W-ibdiagnet:bad.pm.counter.report" 
-deviceName $name -listOfErrors $badValues } - if {[info exists G(argv,performence.monitors)]} { + if {[info exists G(argv,performance.monitors)]} { lappend PM_DUMP(nodeNames) $name set PM_DUMP($name,pmCounterList) $pmCounterList set PM_DUMP($name,pmCounterValue) $newValues($tmpLidPort) @@ -1848,7 +1848,7 @@ proc PMCounterQuery {} { if {$firstPMcounter == 0} { inform "-I-ibdiagnet:no.pm.counter.report" } - if {[info exists G(argv,performence.monitors)]} { + if {[info exists G(argv,performance.monitors)]} { writePMFile } return 1 @@ -2377,8 +2377,8 @@ proc ComparePMCounters { oldValues newVa set errList "" set pmRequestList "" - if {[info exists G(argv,query.performence.monitors)]} { - set pmRequestList [split $G(argv,query.performence.monitors) {, =}] + if {[info exists G(argv,query.performance.monitors)]} { + set pmRequestList [split $G(argv,query.performance.monitors) {, =}] } foreach parameter [array names InfoPm] { ParseOptionsList $InfoPm($parameter) @@ -2394,7 +2394,7 @@ proc ComparePMCounters { oldValues newVa lappend errList "$parameter valueChange $oldValue->$newValue" } elseif { ( $oldValue == $overflow ) || ( $newValue == $overflow ) } { lappend errList "$parameter overflow $overflow" - } elseif {[info exists G(argv,query.performence.monitors)]} { + } elseif {[info exists G(argv,query.performance.monitors)]} { if {[lsearch $pmRequestList $parameter] != -1} { set pmTrash [WordAfterFlag $pmRequestList $parameter] if {$newValue >= $pmTrash} { From halr at voltaire.com Mon Oct 9 13:30:29 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Oct 2006 16:30:29 -0400 Subject: [openib-general] [PATCHv2 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: References: Message-ID: <1160425828.4524.1657.camel@hal.voltaire.com> On Sun, 2006-10-08 at 11:42, Yevgeny Kliteynik wrote: > Hi Hal > > This is the re-submission of the patch that was > originally sibmitted by Eitan - just removing some > cosmetic changes from the patch and re-diffing it > 
with the trunk: > > 1. Avoid varargs macros not supported by Windows > 2. Included additional header for PRIx64 macro > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From sweitzen at cisco.com Mon Oct 9 15:18:27 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 9 Oct 2006 15:18:27 -0700 Subject: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: enable multiple connections to the same target Message-ID: Vlad, I'd like either an rc8 with this patch, or a pre1 build a day before the final build so I can test this fix. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Roland Dreier (rdreier) > Sent: Monday, October 09, 2006 10:17 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Lakshmanan, Madhu; Ishai Rabinovitz; openib-general at openib.org > Subject: Re: [openib-general] FW: [PATCH fixed] [RFC] IB/srp: > enable multiple connections to the same target > > > I am also having new problems configuring SRP with OFED > 1.1 rc7, I have > > asked Roland to take a look on my test networks. > > The problem is that Cisco SRP targets insist on the initiator ID being > 8 bytes of 0 followed by the initiator node GUID. The source says > > /* > * Topspin/Cisco SRP targets will reject our login unless we > * zero out the first 8 bytes of our initiator port ID. The > * second 8 bytes must be our local node GUID, but we always > * use that anyway. > */ > > but with the change to allow userspace-specified initiator IDs, we > don't use the node GUID anyway. > > I added the following on top of the "multiple connections" patch that > was queued for 2.6.19. Can this be put into OFED as well? > > - R. 
> > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c > b/drivers/infiniband/ulp/srp/ib_srp.c > index 3bf0c5b..4b09147 100644 > --- a/drivers/infiniband/ulp/srp/ib_srp.c > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > @@ -359,15 +359,16 @@ static int srp_send_req(struct srp_targe > > /* > * Topspin/Cisco SRP targets will reject our login unless we > - * zero out the first 8 bytes of our initiator port ID. The > - * second 8 bytes must be our local node GUID, but we always > - * use that anyway. > + * zero out the first 8 bytes of our initiator port ID and set > + * the second 8 bytes to the local node GUID. > */ > if (topspin_workarounds && !memcmp(&target->ioc_guid, > topspin_oui, 3)) { > printk(KERN_DEBUG PFX "Topspin/Cisco initiator > port ID workaround " > "activated for target GUID %016llx\n", > (unsigned long long) > be64_to_cpu(target->ioc_guid)); > memset(req->priv.initiator_port_id, 0, 8); > + memcpy(req->priv.initiator_port_id + 8, > + &target->srp_host->dev->dev->node_guid, 8); > } > > status = ib_send_cm_req(target->cm_id, &req->param); > From bugzilla-daemon at openib.org Mon Oct 9 15:35:45 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 9 Oct 2006 15:35:45 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20061009223545.7D52F2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 ------- Comment #5 from sweitzen at cisco.com 2006-10-09 15:35 ------- A customer reported this problem only happens with Cisco SM, I have asked our SM team to investigate. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
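
The Topspin/Cisco workaround in the ib_srp patch quoted in the SRP thread above amounts to a fixed 16-byte layout: 8 zero bytes followed by the 8-byte node GUID. A minimal sketch of that layout outside the kernel, assuming the GUID is held as a 64-bit integer and written big-endian (the kernel stores node_guid in network byte order). The function name and the sample GUID are hypothetical.

```python
# Sketch only: models the memset()/memcpy() pair from the quoted patch.
# topspin_initiator_port_id() and SAMPLE_NODE_GUID are illustrative names.

SAMPLE_NODE_GUID = 0x0002C90200001234  # made-up value, not from the thread


def topspin_initiator_port_id(node_guid: int) -> bytes:
    """Return 8 zero bytes followed by the node GUID in big-endian order."""
    return bytes(8) + node_guid.to_bytes(8, "big")


if __name__ == "__main__":
    print(topspin_initiator_port_id(SAMPLE_NODE_GUID).hex())
```

Any userspace-supplied initiator_ext would be overwritten by this layout when the workaround is active, which is the behaviour the patch restores for these targets.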
From sweitzen at cisco.com Mon Oct 9 16:04:43 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 9 Oct 2006 16:04:43 -0700
Subject: [openib-general] Cisco SQA results for OFED 1.1 rc6
Message-ID:

I forgot to summarize some IPoIB and SDP performance numbers. We saw
SDP latency as low as 9.5 usec, SDP throughput as high as 8.33 Gb/sec on
one port, IPoIB latency as low as 16.0 usec, and IPoIB throughput as
high as 3.4 Gb/sec. This is all on RHEL4; the numbers varied quite a
bit depending on the system and HCA used.

Roland has seen IPoIB throughput of 5.5 Gb/sec using a newer kernel than
what RHEL4 uses.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

________________________________

From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen)
Sent: Monday, October 09, 2006 12:35 AM
To: Open Fabrics
Cc: openib-General
Subject: [openib-general] Cisco SQA results for OFED 1.1 rc6

The testing in general went well. All testing was done on Mellanox
HCAs, both SDR and DDR. Most testing was done on RHEL4 U3, but we
have done some testing on RHEL4 U4 and SLES10, and in the future will
test less and less on RHEL4 U3.

See attached spreadsheet for more details.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sweitzen at cisco.com Mon Oct 9 22:47:27 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 9 Oct 2006 22:47:27 -0700
Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
Message-ID:

Aha, I found something in /etc/hosts, thanks for the hint.

127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3

If I comment this line out, MVAPICH works fine. Does Mellanox have this
entry in /etc/hosts?
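
One way to look for this kind of entry on other nodes is to scan the hosts file for lines that map the machine's own name to a loopback address. A minimal sketch, assuming the usual /etc/hosts layout (an address followed by names, with `#` starting a comment). The function name is hypothetical, and the sample text reuses the entry above plus an assumed localhost line.

```python
import ipaddress

# Illustrative input: the second line is the entry reported above,
# the first line is an assumed standard localhost mapping.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3
"""


def loopback_entries(hosts_text: str, hostname: str):
    """Return (address, names) pairs where hostname maps to a loopback address."""
    matches = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # not a parseable address, skip the line
        if is_loopback and hostname in names:
            matches.append((address, names))
    return matches


if __name__ == "__main__":
    for address, names in loopback_entries(SAMPLE_HOSTS, "svbu-qa1850-3"):
        print(address, " ".join(names))
```

To check a real node, the contents of /etc/hosts and the output of `hostname` could be passed in instead of the sample values.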
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > Sent: Thursday, October 05, 2006 5:59 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Aviram Gutman; OpenFabricsEWG; openib > Subject: Re: [openfabrics-ewg] problems running MVAPICH on > OFED 1.1 rc6 with SLES10 x86_64 > > > I see it for all MVAPICH tests, it's 100% consistent. > > MVAPICH tests are osu_benchmarks (bw/lt/etc..) or all test > over mvapich > on SUSE10 platform ? > Please check /etc/hosts file on your machines, it should be > exactly the > same on all nodes. > > Regards, > Pasha > > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > >> -----Original Message----- > >> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > >> Sent: Tuesday, October 03, 2006 3:37 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: Aviram Gutman; OpenFabricsEWG; openib > >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >> OFED 1.1 rc6 with SLES10 x86_64 > >> > >> Hi Scott, > >> Unfortunately was not able to reproduce the failure on our > platforms. > >> Do you see the problem with all tests or with the specific only ? > >> Is it consistent problem ? > >> > >> Regards, > >> Pasha > >> > >> Scott Weitzenkamp (sweitzen) wrote: > >>> $ uname -a > >>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 > >> 18:25:39 UTC 2006 > >>> x86_64 > >>> x86_64 x86_64 GNU/Linux > >>> $ > >> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > >>> 192.168.2.46 192.168.2.49 hostname > >>> svbu-qa1850-4 > >>> svbu-qa1850-3 > >>> $ > >> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > >>> 192.168.2.46 192.168.2.49 > >>> > >> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_bench > > marks-2.2/ > >>> osu_latency > >>> > >>> The last command just hangs. Can I try your binary RPMs? 
> >>> > >>> Scott Weitzenkamp > >>> SQA and Release Manager > >>> Server Virtualization Business Unit > >>> Cisco Systems > >>> > >>> > >>>> -----Original Message----- > >>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] > >>>> Sent: Sunday, October 01, 2006 2:29 AM > >>>> To: Scott Weitzenkamp (sweitzen) > >>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il > >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>> > >>>> Can you please elaborate on MVAPICH issues, can you send > >>>> command line? > >>>> We ran it here on 32 Opteron nodes each quad core and also > >> rigorous > >>>> tests on the many other nodes? > >>>> > >>>> > >>>> > >>>> Scott Weitzenkamp (sweitzen) wrote: > >>>>> We are just getting started with OFED testing on SLES10, first > >>>>> platform is x86_64. > >>>>> > >>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are > >>>> working so far. > >>>>> MVAPICH with OSU benchmarks just hang. This same > >> hardware works > >>>>> fine with OFED and RHEL4 U3. > >>>>> > >>>>> Has anyone else seen this? > >>>>> > >>>>> Scott Weitzenkamp > >>>>> SQA and Release Manager > >>>>> Server Virtualization Business Unit > >>>>> Cisco Systems > >>>>> > >>>>> > >>>> -------------------------------------------------------------- > >>>> ---------- > >>>>> _______________________________________________ > >>>>> openfabrics-ewg mailing list > >>>>> openfabrics-ewg at openib.org > >>>>> http://openib.org/mailman/listinfo/openfabrics-ewg > >>>>> > >> > >> -- > >> Pavel Shamis (Pasha) > >> Software Engineer > >> Mellanox Technologies LTD. > >> pasha at mellanox.co.il > >> > > > > > -- > Pavel Shamis (Pasha) > Software Engineer > Mellanox Technologies LTD. 
> pasha at mellanox.co.il > From bugzilla-daemon at openib.org Mon Oct 9 23:14:03 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 9 Oct 2006 23:14:03 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20061010061403.BD5642283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 ------- Comment #6 from mst at mellanox.co.il 2006-10-09 23:14 ------- I think Cisco SM is dropping ports which do not answer its queries, opensm does not, which is why things still work with opensm even if the mad thread goes inactive for a while. BTW, the renice is currently done upon openibd start - could you test this works correctly for you? ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kliteyn at dev.mellanox.co.il Mon Oct 9 23:16:48 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 10 Oct 2006 08:16:48 +0200 Subject: [openib-general] [PATCHv2 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <1160425828.4524.1657.camel@hal.voltaire.com> References: <1160425828.4524.1657.camel@hal.voltaire.com> Message-ID: <452B3AD0.9070400@dev.mellanox.co.il> Great, thanks. Now only one patch left - it also deals with this varargs macro issue, so it was applied only partially. I'll resubmit the patch shortly to save you the boredom of looking for it and extracting the changes that weren't applied. -- Yevgeny Hal Rosenstock wrote: > On Sun, 2006-10-08 at 11:42, Yevgeny Kliteynik wrote: >> Hi Hal >> >> This is the re-submission of the patch that was >> originally sibmitted by Eitan - just removing some >> cosmetic changes from the patch and re-diffing it >> with the trunk: >> >> 1. Avoid varargs macros not supported by Windows >> 2. Included additional header for PRIx64 macro >> >> Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik > > Thanks. Applied. 
> > -- Hal > From kliteyn at dev.mellanox.co.il Mon Oct 9 23:25:29 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 10 Oct 2006 08:25:29 +0200 Subject: [openib-general] [PATCHv2 2/13] osm: port to WinIB stack : opensm/osm_prtn_config.c Message-ID: Hi Hal This is the re-submission of the patch that was originally sibmitted by Eitan - avoiding varargs macros not supported by Windows. Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osm_prtn_config.c =================================================================== --- osm_prtn_config.c (revision 9759) +++ osm_prtn_config.c (working copy) @@ -66,17 +66,6 @@ #define STRTO_IB_NET64(str, end, base) strtoull(str, end, base) #endif -#define PARSERR(log, lnum, fmt, arg...) { \ - osm_log(log, OSM_LOG_ERROR, \ - "PARSE ERROR: line %d: " fmt , (lnum), ##arg ); \ - fprintf(stderr, \ - "\nPARSE ERROR: line %d: " fmt "\n", (lnum), ##arg ); \ -} - -#define PARSEWARN(log, lnum, fmt, arg...) \ - osm_log(log, OSM_LOG_VERBOSE, \ - "PARSE WARN: line %d: " fmt , (lnum), ##arg ) - /* */ struct part_conf { @@ -150,29 +139,33 @@ static int partition_add_flag(unsigned l conf->is_ipoib = 1; } else if (!strncmp(flag, "mtu", len)) { if (!val || (conf->mtu = strtoul(val, NULL, 0)) == 0) - PARSEWARN(conf->p_log, lineno, - "flag \'mtu\' requires valid value" - " - skipped.\n"); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "flag \'mtu\' requires valid value" + " - skipped.\n", lineno); } else if (!strncmp(flag, "rate", len)) { if (!val || (conf->rate = strtoul(val, NULL, 0)) == 0) - PARSEWARN(conf->p_log, lineno, - "flag \'rate\' requires valid value" - " - skipped.\n"); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "flag \'rate\' requires valid value" + " - skipped.\n", lineno); } else if (!strncmp(flag, "sl", len)) { unsigned sl; char *end; if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 || (*end && !isspace(*end))) - PARSEWARN(conf->p_log, lineno, - "flag \'sl\' 
requires valid value" - " - skipped.\n"); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "flag \'sl\' requires valid value" + " - skipped.\n", lineno); else conf->sl = sl; } else { - PARSEWARN(conf->p_log, lineno, - "unrecognized partition flag \'%s\'" - " - ignored.\n", flag); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "unrecognized partition flag \'%s\'" + " - ignored.\n", lineno, flag); } return 0; } @@ -191,9 +184,10 @@ static int partition_add_port(unsigned l if (!strncmp(flag, "full", strlen(flag))) full = TRUE; else if (strncmp(flag, "limited", strlen(flag))) { - PARSEWARN(conf->p_log, lineno, - "unrecognized port flag \'%s\'." - " Assume \'limited\'\n", flag); + osm_log(conf->p_log, OSM_LOG_VERBOSE, + "PARSE WARN: line %d: " + "unrecognized port flag \'%s\'." + " Assume \'limited\'\n", lineno, flag); } } @@ -307,8 +301,9 @@ static int parse_part_conf(struct part_c q = strchr(p, ':'); if (!q) { - PARSERR(conf->p_log, lineno, - "no partition definition found\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "no partition definition found\n", lineno); return -1; } @@ -332,8 +327,9 @@ static int parse_part_conf(struct part_c *q++ = '\0'; ret = parse_name_token(p, &flag, &flval); if (!flag) { - PARSERR(conf->p_log, lineno, - "bad partition flags\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "bad partition flags\n",lineno); return -1; } p += ret; @@ -343,8 +339,9 @@ static int parse_part_conf(struct part_c if (p != str || (partition_create(lineno, conf, name, id, flag, flval) < 0)) { - PARSERR(conf->p_log, lineno, - "bad partition definition\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "bad partition definition\n", lineno); return -1; } @@ -356,8 +353,9 @@ static int parse_part_conf(struct part_c *q++ = '\0'; ret = parse_name_token(p, &name, &flag); if (partition_add_port(lineno, conf, name, flag) < 0) { - PARSERR(conf->p_log, lineno, - "bad 
PortGUID\n"); + osm_log(conf->p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "bad PortGUID\n", lineno); return -1; } p += ret; @@ -406,8 +404,9 @@ int osm_prtn_config_parse_file(osm_log_t if (!conf && !(conf = new_part_conf(p_log, p_subn))) { - PARSERR(p_log, lineno, - "internal: cannot create config.\n"); + osm_log(p_log, OSM_LOG_ERROR, + "PARSE ERROR: line %d: " + "internal: cannot create config.\n", lineno); break; } From mst at mellanox.co.il Mon Oct 9 23:53:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 10 Oct 2006 08:53:56 +0200 Subject: [openib-general] ipoib: ignores dma mapping errors on TX? In-Reply-To: References: <20061009144223.GB26849@mellanox.co.il> Message-ID: <20061010065356.GB25665@mellanox.co.il> Quoting r. Roland Dreier : > + if (unlikely(dma_mapping_error(addr))) { > + ++priv->stats.tx_errors; > + dev_kfree_skb_any(skb); > + return; > + } Do we want a warning there? -- MST From eitan at mellanox.co.il Tue Oct 10 00:42:57 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 10 Oct 2006 09:42:57 +0200 Subject: [openib-general] [PATCH]{TRIVIAL] ibdiagnet: Fix another typo In-Reply-To: <1160420750.30096.175491.camel@hal.voltaire.com> References: <1160420750.30096.175491.camel@hal.voltaire.com> Message-ID: <452B4F01.5020707@mellanox.co.il> Thanks. Applied Hal Rosenstock wrote: >ibdiagnet: Fix another typo > >performance rather than performence > >Signed-off-by: Hal Rosenstock > >Index: ibdiag/src/ibdebug_if.tcl >=================================================================== >--- ibdiag/src/ibdebug_if.tcl (revision 9752) >+++ ibdiag/src/ibdebug_if.tcl (working copy) >@@ -15,17 +15,17 @@ > # -deafult "" means that the parameter does have a default value, but it will set later (after ibis is ran, in porc startIBDebug). > ## TODO: sm_key is a 64-bit integer - will it be correctly cheked in parseArgv ? 
> array set InfoArgv { >- -P,name "query.performence.monitors" >+ -P,name "query.performance.monitors" > -P,desc "If any of the provided pm is greater then its provided value, print it to screen" > -P,param "=" > -P,regexp "pm.name.>=1" > -P,error "-E-argv:not.legal.PM" > >- -pc,name "reset.performence.monitors" >+ -pc,name "reset.performance.monitors" > -pc,arglen 0 > -pc,desc "reset all the fabric links pmCounters" > >- -pm,name "performence.monitors" >+ -pm,name "performance.monitors" > -pm,arglen 0 > -pm,desc "Dumps all pmCounters values into .pm file" > >Index: ibdiag/src/ibdebug.tcl >=================================================================== >--- ibdiag/src/ibdebug.tcl (revision 9752) >+++ ibdiag/src/ibdebug.tcl (working copy) >@@ -1745,7 +1745,7 @@ proc PMCounterQuery {} { > # preparing database for reading PMs > if {![catch {set tmpLID [GetParamValue LID $directPath -port $entryPort]}]} { > if { $tmpLID != 0 } { >- if {[info exists G(argv,reset.performence.monitors)]} { >+ if {[info exists G(argv,reset.performance.monitors)]} { > catch {pmClrAllCounters $tmpLID $entryPort} > } > set tmpLidPort "$tmpLID:$entryPort" >@@ -1767,7 +1767,7 @@ proc PMCounterQuery {} { > unset tmpLidPort > if {![catch {set tmpLID [GetParamValue LID $directPath -port $entryPort]}]} { > if { $tmpLID != 0 } { >- if {[info exists G(argv,reset.performence.monitors)]} { >+ if {[info exists G(argv,reset.performance.monitors)]} { > catch {pmClrAllCounters $tmpLID $entryPort} > } > set tmpLidPort "$tmpLID:$entryPort" >@@ -1839,7 +1839,7 @@ proc PMCounterQuery {} { > inform "-W-ibdiagnet:bad.pm.counter.report" -deviceName $name -listOfErrors $badValues > } > >- if {[info exists G(argv,performence.monitors)]} { >+ if {[info exists G(argv,performance.monitors)]} { > lappend PM_DUMP(nodeNames) $name > set PM_DUMP($name,pmCounterList) $pmCounterList > set PM_DUMP($name,pmCounterValue) $newValues($tmpLidPort) >@@ -1848,7 +1848,7 @@ proc PMCounterQuery {} { > if {$firstPMcounter == 0} { > 
inform "-I-ibdiagnet:no.pm.counter.report" > } >- if {[info exists G(argv,performence.monitors)]} { >+ if {[info exists G(argv,performance.monitors)]} { > writePMFile > } > return 1 >@@ -2377,8 +2377,8 @@ proc ComparePMCounters { oldValues newVa > > set errList "" > set pmRequestList "" >- if {[info exists G(argv,query.performence.monitors)]} { >- set pmRequestList [split $G(argv,query.performence.monitors) {, =}] >+ if {[info exists G(argv,query.performance.monitors)]} { >+ set pmRequestList [split $G(argv,query.performance.monitors) {, =}] > } > foreach parameter [array names InfoPm] { > ParseOptionsList $InfoPm($parameter) >@@ -2394,7 +2394,7 @@ proc ComparePMCounters { oldValues newVa > lappend errList "$parameter valueChange $oldValue->$newValue" > } elseif { ( $oldValue == $overflow ) || ( $newValue == $overflow ) } { > lappend errList "$parameter overflow $overflow" >- } elseif {[info exists G(argv,query.performence.monitors)]} { >+ } elseif {[info exists G(argv,query.performance.monitors)]} { > if {[lsearch $pmRequestList $parameter] != -1} { > set pmTrash [WordAfterFlag $pmRequestList $parameter] > if {$newValue >= $pmTrash} { > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From bugzilla-daemon at openib.org Tue Oct 10 03:31:24 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 10 Oct 2006 03:31:24 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20061010103124.DE9CA2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 ------- Comment #7 from halr at voltaire.com 2006-10-10 03:31 ------- What timeout does Cisco SM use for transactions ? 
------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From halr at voltaire.com Tue Oct 10 03:47:12 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Oct 2006 06:47:12 -0400 Subject: [openib-general] [PATCHv2 2/13] osm: port to WinIB stack : opensm/osm_prtn_config.c In-Reply-To: References: Message-ID: <1160477232.4524.39028.camel@hal.voltaire.com> On Tue, 2006-10-10 at 02:25, Yevgeny Kliteynik wrote: > Hi Hal > > This is the re-submission of the patch that was > originally submitted by Eitan - avoiding varargs > macros not supported by Windows. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied with some cosmetic changes; I also re-added the fprintf calls to stderr on PARSE ERRORs. Please make sure there is no trailing whitespace in your patches. -- Hal From halr at voltaire.com Tue Oct 10 03:47:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Oct 2006 06:47:24 -0400 Subject: [openib-general] [PATCHv2 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <452B3AD0.9070400@dev.mellanox.co.il> References: <1160425828.4524.1657.camel@hal.voltaire.com> <452B3AD0.9070400@dev.mellanox.co.il> Message-ID: <1160477239.4524.39030.camel@hal.voltaire.com> On Tue, 2006-10-10 at 02:16, Yevgeny Kliteynik wrote: > Great, thanks. > > Now only one patch left I think there was one more in addition that was agreed to: Add Windows defines to config.h and remove them from various opensm files. Will you be doing this once the other one is accepted? > - it also deals with this varargs > macro issue, so it was applied only partially. > I'll resubmit the patch shortly to save you the boredom > of looking for it and extracting the changes that weren't > applied. Thanks. 
-- Hal > > -- > Yevgeny > > Hal Rosenstock wrote: > > On Sun, 2006-10-08 at 11:42, Yevgeny Kliteynik wrote: > >> Hi Hal > >> > >> This is the re-submission of the patch that was > >> originally submitted by Eitan - just removing some > >> cosmetic changes from the patch and re-diffing it > >> with the trunk: > >> > >> 1. Avoid varargs macros not supported by Windows > >> 2. Included additional header for PRIx64 macro > >> > >> Yevgeny > >> > >> Signed-off-by: Yevgeny Kliteynik > > > > Thanks. Applied. > > > > -- Hal > > From halr at voltaire.com Tue Oct 10 04:17:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Oct 2006 07:17:24 -0400 Subject: [openib-general] [PATCH]OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than stdout In-Reply-To: <452A96FB.1050403@mellanox.co.il> References: <1160415392.30096.171632.camel@hal.voltaire.com> <452A96FB.1050403@mellanox.co.il> Message-ID: <1160479043.4524.40370.camel@hal.voltaire.com> Hi Eitan, On Mon, 2006-10-09 at 14:37, Eitan Zahavi wrote: > Hi Hal, > > I would rather remove all of these. At least for the non-debug build. Fine for the stubs. > I do not see how they help the user. The one in osm_vendor_open_port is useful for the OpenIB diags when the open fails due to an RNIC being selected. This was discussed on the list a while ago when RNICs were first being integrated. -- Hal > What do you say? 
> > EZ > Hal Rosenstock wrote: > > >OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than > >stdout > > > >Signed-off-by: Hal Rosenstock > > > >Index: libvendor/osm_vendor_ibumad.c > >=================================================================== > >--- libvendor/osm_vendor_ibumad.c (revision 9751) > >+++ libvendor/osm_vendor_ibumad.c (working copy) > >@@ -745,8 +745,8 @@ osm_vendor_open_port( > > "osm_vendor_open_port: ERR 542D: " > > "Node type %d is not an IB node type\n", > > p_vend->umad_ca.node_type ); > >- printf( "\nNode type %d is not an IB node type\n", > >- p_vend->umad_ca.node_type ); > >+ fprintf( stderr, "\nNode type %d is not an IB node type\n", > >+ p_vend->umad_ca.node_type ); > > goto Exit; > > } > > > >@@ -952,7 +952,7 @@ __osm_vendor_recv_dummy_cb( > > IN void *bind_context, > > IN osm_madw_t *p_req_madw ) > > { > >- printf("Ignoring received MAD after osm_vendor_unbind\n"); > >+ fprintf(stderr, "__osm_vendor_recv_dummy_cb: Ignoring received MAD after osm_vendor_unbind\n"); > > } > > > > /********************************************************************** > >@@ -962,7 +962,7 @@ __osm_vendor_send_err_dummy_cb( > > IN void* bind_context, > > IN osm_madw_t *p_req_madw ) > > { > >- printf("Ignoring send error after osm_vendor_unbind\n"); > >+ fprintf(stderr, "__osm_vendor_send_err_dummy_cb: Ignoring send error after osm_vendor_unbind\n"); > > } > > > > /********************************************************************** > > > > > > > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > From halr at voltaire.com Tue Oct 10 04:17:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Oct 2006 07:17:39 -0400 Subject: [openib-general] [PATCHv2] OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr 
rather than stdout Message-ID: <1160479047.4524.40372.camel@hal.voltaire.com> OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than stdout Signed-off-by: Hal Rosenstock Index: libvendor/osm_vendor_ibumad.c =================================================================== --- libvendor/osm_vendor_ibumad.c (revision 9765) +++ libvendor/osm_vendor_ibumad.c (working copy) @@ -745,8 +745,8 @@ osm_vendor_open_port( "osm_vendor_open_port: ERR 542D: " "Node type %d is not an IB node type\n", p_vend->umad_ca.node_type ); - printf( "\nNode type %d is not an IB node type\n", - p_vend->umad_ca.node_type ); + fprintf( stderr, "Node type %d is not an IB node type\n", + p_vend->umad_ca.node_type ); goto Exit; } @@ -952,7 +952,9 @@ __osm_vendor_recv_dummy_cb( IN void *bind_context, IN osm_madw_t *p_req_madw ) { - printf("Ignoring received MAD after osm_vendor_unbind\n"); +#ifdef _DEBUG_ + fprintf(stderr, "__osm_vendor_recv_dummy_cb: Ignoring received MAD after osm_vendor_unbind\n"); +#endif } /********************************************************************** @@ -962,7 +964,9 @@ __osm_vendor_send_err_dummy_cb( IN void* bind_context, IN osm_madw_t *p_req_madw ) { - printf("Ignoring send error after osm_vendor_unbind\n"); +#ifdef _DEBUG_ + fprintf(stderr, "__osm_vendor_send_err_dummy_cb: Ignoring send error after osm_vendor_unbind\n"); +#endif } /********************************************************************** From eitan at mellanox.co.il Tue Oct 10 05:11:40 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 10 Oct 2006 14:11:40 +0200 Subject: [openib-general] [PATCHv2] OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than stdout In-Reply-To: <1160479047.4524.40372.camel@hal.voltaire.com> References: <1160479047.4524.40372.camel@hal.voltaire.com> Message-ID: <452B8DFC.4090103@mellanox.co.il> Thanks. Please apply. 
Eitan Hal Rosenstock wrote: >OpenSM/libvendor/osm_vendor_ibumad.c: Print errors to stderr rather than >stdout > >Signed-off-by: Hal Rosenstock > >Index: libvendor/osm_vendor_ibumad.c >=================================================================== >--- libvendor/osm_vendor_ibumad.c (revision 9765) >+++ libvendor/osm_vendor_ibumad.c (working copy) >@@ -745,8 +745,8 @@ osm_vendor_open_port( > "osm_vendor_open_port: ERR 542D: " > "Node type %d is not an IB node type\n", > p_vend->umad_ca.node_type ); >- printf( "\nNode type %d is not an IB node type\n", >- p_vend->umad_ca.node_type ); >+ fprintf( stderr, "Node type %d is not an IB node type\n", >+ p_vend->umad_ca.node_type ); > goto Exit; > } > >@@ -952,7 +952,9 @@ __osm_vendor_recv_dummy_cb( > IN void *bind_context, > IN osm_madw_t *p_req_madw ) > { >- printf("Ignoring received MAD after osm_vendor_unbind\n"); >+#ifdef _DEBUG_ >+ fprintf(stderr, "__osm_vendor_recv_dummy_cb: Ignoring received MAD after osm_vendor_unbind\n"); >+#endif > } > > /********************************************************************** >@@ -962,7 +964,9 @@ __osm_vendor_send_err_dummy_cb( > IN void* bind_context, > IN osm_madw_t *p_req_madw ) > { >- printf("Ignoring send error after osm_vendor_unbind\n"); >+#ifdef _DEBUG_ >+ fprintf(stderr, "__osm_vendor_send_err_dummy_cb: Ignoring send error after osm_vendor_unbind\n"); >+#endif > } > > /********************************************************************** > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From halr at voltaire.com Tue Oct 10 05:37:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Oct 2006 08:37:07 -0400 Subject: [openib-general] [PATCH] Diags: Add initial support for IB routers Message-ID: <1160483827.4524.43749.camel@hal.voltaire.com> Diags: Add initial 
support for IB routers Signed-off-by: Hal Rosenstock Index: src/ibtracert.c =================================================================== --- src/ibtracert.c (revision 9767) +++ src/ibtracert.c (working copy) @@ -51,6 +51,14 @@ #define MAXHOPS 63 +static char *node_type_str[] = { + "???", + "ca", + "switch", + "router", + "iwarp rnic" +}; + static int timeout = 0; /* ms */ static int verbose; static int force; @@ -219,13 +227,15 @@ dump_endnode(int dump, char *prompt, Nod } #if __WORDSIZE == 64 fprintf(f, "%s %s {%016lx} portnum %d lid 0x%x-0x%x \"%s\"\n", - prompt, node->type == IB_NODE_SWITCH ? "switch" : "ca", + prompt, + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->type == IB_NODE_SWITCH ? 0 : port->portnum, port->lid, port->lid + (1 << port->lmc) - 1, node->nodedesc); #else fprintf(f, "%s %s {%016Lx} portnum %d lid 0x%x-0x%x \"%s\"\n", - prompt, node->type == IB_NODE_SWITCH ? "switch" : "ca", + prompt, + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->type == IB_NODE_SWITCH ? 0 : port->portnum, port->lid, port->lid + (1 << port->lmc) - 1, node->nodedesc); @@ -243,7 +253,8 @@ dump_route(int dump, Node *node, int out outport, port->portguid, port->portnum); else fprintf(f, "[%d] -> %s port {%016lx}[%d] lid 0x%x-0x%x \"%s\"\n", - outport, node->type == IB_NODE_SWITCH ? "switch" : "ca", + outport, + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), port->portguid, port->portnum, port->lid, port->lid + (1 << port->lmc) - 1, node->nodedesc); @@ -253,7 +264,8 @@ dump_route(int dump, Node *node, int out outport, port->portguid, port->portnum); else fprintf(f, "[%d] -> %s port {%016Lx}[%d] lid 0x%x-0x%x \"%s\"\n", - outport, node->type == IB_NODE_SWITCH ? "switch" : "ca", + outport, + (node->type <= IB_NODE_MAX ? 
node_type_str[node->type] : "???"), port->portguid, port->portnum, port->lid, port->lid + (1 << port->lmc) - 1, node->nodedesc); @@ -314,17 +326,19 @@ find_route(ib_portid_t *from, ib_portid_ else break; /* found SMA port */ } - } else if (node->type == IB_NODE_CA) { + } else if ((node->type == IB_NODE_CA) || + (node->type == IB_NODE_ROUTER)) { int ca_src = 0; - DEBUG("ca node"); + + DEBUG("ca or router node"); if (!sameport(port, &fromport)) { - IBWARN("can't continue: reached CA port %Lx, lid %d", + IBWARN("can't continue: reached CA or router port %Lx, lid %d", port->portguid, port->lid); return -1; } - /* we are at CA "from" - go one hop back to (hopefully) a switch */ + /* we are at CA or router "from" - go one hop back to (hopefully) a switch */ if (from->drpath.cnt > 0) { - DEBUG("ca node - return back 1 hop"); + DEBUG("ca or router node - return back 1 hop"); from->drpath.cnt--; } else { ca_src = 1; @@ -332,15 +346,14 @@ find_route(ib_portid_t *from, ib_portid_ goto badpath; } /* - * else - we are running on a CA! that is impressive - - * when this code was written CAs were not supported... + * else - we are running on a CA or router! that is impressive - + * when this code was written, CAs and routers were not supported... */ - if (get_node(&nextnode, &nextport, from) < 0) { IBWARN("can't reach port at %s", portid2str(from)); return -1; } - /* fix port num to be seen from the CA side */ + /* fix port num to be seen from the CA or router side */ if (!ca_src) nextport.portnum = from->drpath.p[from->drpath.cnt+1]; } @@ -660,13 +673,13 @@ dump_mcpath(Node *node, int dumplevel) if (!node->dist) { #if __WORDSIZE == 64 printf("From %s 0x%lx port %d lid 0x%x-0x%x \"%s\"\n", - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? 
node_type_str[node->type] : "???"), node->nodeguid, node->ports->portnum, node->ports->lid, node->ports->lid + (1 << node->ports->lmc) - 1, node->nodedesc); #else printf("From %s 0x%Lx port %d lid 0x%x-0x%x \"%s\"\n", - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->ports->portnum, node->ports->lid, node->ports->lid + (1 << node->ports->lmc) - 1, node->nodedesc); @@ -679,24 +692,24 @@ dump_mcpath(Node *node, int dumplevel) if (dumplevel == 1) printf("[%d] -> %s {%016lx}[%d]\n", node->ports->remoteport->portnum, - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->upport); else printf("[%d] -> %s 0x%lx[%d] lid 0x%x \"%s\"\n", node->ports->remoteport->portnum, - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->upport, node->ports->lid, node->nodedesc); #else if (dumplevel == 1) printf("[%d] -> %s {%016Lx}[%d]\n", node->ports->remoteport->portnum, - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->upport); else printf("[%d] -> %s 0x%Lx[%d] lid 0x%x \"%s\"\n", node->ports->remoteport->portnum, - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->upport, node->ports->lid, node->nodedesc); #endif @@ -706,13 +719,13 @@ dump_mcpath(Node *node, int dumplevel) /* target node */ #if __WORDSIZE == 64 printf("To %s 0x%lx port %d lid 0x%x-0x%x \"%s\"\n", - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? 
node_type_str[node->type] : "???"), node->nodeguid, node->ports->portnum, node->ports->lid, node->ports->lid + (1 << node->ports->lmc) - 1, node->nodedesc); #else printf("To %s 0x%Lx port %d lid 0x%x-0x%x \"%s\"\n", - node->type == IB_NODE_SWITCH ? "switch" : "ca", + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->ports->portnum, node->ports->lid, node->ports->lid + (1 << node->ports->lmc) - 1, node->nodedesc); Index: src/ibnetdiscover.c =================================================================== --- src/ibnetdiscover.c (revision 9748) +++ src/ibnetdiscover.c (working copy) @@ -52,6 +52,14 @@ #include "ibnetdiscover.h" +static char *node_type_str[] = { + "???", + "ca", + "switch", + "router", + "iwarp rnic" +}; + static int timeout = 2000; /* ms */ static int dumplevel = 0; static int chassisnum = 0; @@ -207,13 +215,15 @@ dump_endnode(ib_portid_t *path, char *pr #if __WORDSIZE == 64 fprintf(f, "%s -> %s %s {%016lx} portnum %d lid %d-%d\"%s\"\n", - portid2str(path), prompt, node->type == SWITCH_NODE ? "switch" : "ca", + portid2str(path), prompt, + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->type == SWITCH_NODE ? 0 : port->portnum, port->lid, port->lid + (1 << port->lmc) - 1, clean_nodedesc(node->nodedesc)); #else fprintf(f, "%s -> %s %s {%016Lx} portnum %d lid %d-%d\"%s\"\n", - portid2str(path), prompt, node->type == SWITCH_NODE ? "switch" : "ca", + portid2str(path), prompt, + (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), node->nodeguid, node->type == SWITCH_NODE ? 
0 : port->portnum, port->lid, port->lid + (1 << port->lmc) - 1, clean_nodedesc(node->nodedesc)); From halr at voltaire.com Tue Oct 10 06:00:46 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 Oct 2006 09:00:46 -0400 Subject: [openib-general] [PATCH] OpenSM: Another change for IB router support Message-ID: <1160485246.4524.44706.camel@hal.voltaire.com> OpenSM: Another change for IB router support Handle HOQLife and VLStallCount for IB router ports like CA ports Signed-off-by: Hal Rosenstock Index: include/opensm/osm_subnet.h =================================================================== --- include/opensm/osm_subnet.h (revision 9769) +++ include/opensm/osm_subnet.h (working copy) @@ -358,21 +358,21 @@ typedef struct _osm_subn_opt * leaf_vl_stall_count * The number of sequential packets dropped that cause the port * to enter the VLStalled state. This is for switch ports driving -* a CA port. +* a CA or router port. * * head_of_queue_lifetime * The maximal time a packet can live at the head of a VL queue -* on any port not driving a CA port +* on any port not driving a CA or router port. * * leaf_head_of_queue_lifetime * The maximal time a packet can live at the head of a VL queue -* on switch ports driving a CA +* on switch ports driving a CA or router. * * local_phy_errors_threshold * Threshold of local phy errors for sending Trap 129 * * overrun_errors_threshold -* Threshold of credits over-run errors for sending Trap 129 +* Threshold of credits overrun errors for sending Trap 129 * * sminfo_polling_timeout * Specifies the polling timeout (in milliseconds) - the timeout Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 9769) +++ opensm/osm_subnet.c (working copy) @@ -1040,8 +1040,8 @@ osm_subn_write_conf_file( "vl_stall_count 0x%02x\n\n" "# The number of sequential packets dropped that cause the port\n" "# to enter the VLStalled state. 
This value is for switch ports\n" - "# driving a CA port. The result of setting this value to zero\n" - "# is undefined.\n" + "# driving a CA or router port. The result of setting this value\n" + "# to zero is undefined.\n" "leaf_vl_stall_count 0x%02x\n\n" "# The code of maximal time a packet can wait at the head of\n" "# transmission queue. \n" @@ -1049,7 +1049,7 @@ osm_subn_write_conf_file( "# The value 0x14 disables this mechanism\n" "head_of_queue_lifetime 0x%02x\n\n" "# The maximal time a packet can wait at the head of queue on \n" - "# switch port connected to a CA port\n" + "# switch port connected to a CA or router port\n" "leaf_head_of_queue_lifetime 0x%02x\n\n" "# Limit the maximal operational VLs\n" "max_op_vls %u\n\n" Index: opensm/osm_link_mgr.c =================================================================== --- opensm/osm_link_mgr.c (revision 9769) +++ opensm/osm_link_mgr.c (working copy) @@ -269,8 +269,9 @@ __osm_link_mgr_set_physp_pi( else if (osm_node_get_type(osm_physp_get_node_ptr(p_physp)) == IB_NODE_TYPE_SWITCH) { - if (osm_node_get_type(osm_physp_get_node_ptr(p_remote_physp)) == - IB_NODE_TYPE_CA) + /* Is remote end CA or router ? */ + if (osm_node_get_type(osm_physp_get_node_ptr(p_remote_physp)) != + IB_NODE_TYPE_SWITCH) { ib_port_info_set_hoq_lifetime( p_pi, p_mgr->p_subn->opt.leaf_head_of_queue_lifetime); From eitan at mellanox.co.il Tue Oct 10 06:42:40 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 10 Oct 2006 15:42:40 +0200 Subject: [openib-general] [PATCH] OpenSM: Another change for IB router support In-Reply-To: <1160485246.4524.44706.camel@hal.voltaire.com> References: <1160485246.4524.44706.camel@hal.voltaire.com> Message-ID: <452BA350.4010403@mellanox.co.il> Looks good. 
Hal Rosenstock wrote: >OpenSM: Another change for IB router support > >Handle HOQLife and VLStallCount for IB router ports like CA ports > >Signed-off-by: Hal Rosenstock > >Index: include/opensm/osm_subnet.h >=================================================================== >--- include/opensm/osm_subnet.h (revision 9769) >+++ include/opensm/osm_subnet.h (working copy) >@@ -358,21 +358,21 @@ typedef struct _osm_subn_opt > * leaf_vl_stall_count > * The number of sequential packets dropped that cause the port > * to enter the VLStalled state. This is for switch ports driving >-* a CA port. >+* a CA or router port. > * > * head_of_queue_lifetime > * The maximal time a packet can live at the head of a VL queue >-* on any port not driving a CA port >+* on any port not driving a CA or router port. > * > * leaf_head_of_queue_lifetime > * The maximal time a packet can live at the head of a VL queue >-* on switch ports driving a CA >+* on switch ports driving a CA or router. > * > * local_phy_errors_threshold > * Threshold of local phy errors for sending Trap 129 > * > * overrun_errors_threshold >-* Threshold of credits over-run errors for sending Trap 129 >+* Threshold of credits overrun errors for sending Trap 129 > * > * sminfo_polling_timeout > * Specifies the polling timeout (in milliseconds) - the timeout >Index: opensm/osm_subnet.c >=================================================================== >--- opensm/osm_subnet.c (revision 9769) >+++ opensm/osm_subnet.c (working copy) >@@ -1040,8 +1040,8 @@ osm_subn_write_conf_file( > "vl_stall_count 0x%02x\n\n" > "# The number of sequential packets dropped that cause the port\n" > "# to enter the VLStalled state. This value is for switch ports\n" >- "# driving a CA port. The result of setting this value to zero\n" >- "# is undefined.\n" >+ "# driving a CA or router port. 
The result of setting this value\n" >+ "# to zero is undefined.\n" > "leaf_vl_stall_count 0x%02x\n\n" > "# The code of maximal time a packet can wait at the head of\n" > "# transmission queue. \n" >@@ -1049,7 +1049,7 @@ osm_subn_write_conf_file( > "# The value 0x14 disables this mechanism\n" > "head_of_queue_lifetime 0x%02x\n\n" > "# The maximal time a packet can wait at the head of queue on \n" >- "# switch port connected to a CA port\n" >+ "# switch port connected to a CA or router port\n" > "leaf_head_of_queue_lifetime 0x%02x\n\n" > "# Limit the maximal operational VLs\n" > "max_op_vls %u\n\n" >Index: opensm/osm_link_mgr.c >=================================================================== >--- opensm/osm_link_mgr.c (revision 9769) >+++ opensm/osm_link_mgr.c (working copy) >@@ -269,8 +269,9 @@ __osm_link_mgr_set_physp_pi( > else if (osm_node_get_type(osm_physp_get_node_ptr(p_physp)) == > IB_NODE_TYPE_SWITCH) > { >- if (osm_node_get_type(osm_physp_get_node_ptr(p_remote_physp)) == >- IB_NODE_TYPE_CA) >+ /* Is remote end CA or router ? */ >+ if (osm_node_get_type(osm_physp_get_node_ptr(p_remote_physp)) != >+ IB_NODE_TYPE_SWITCH) > { > ib_port_info_set_hoq_lifetime( > p_pi, p_mgr->p_subn->opt.leaf_head_of_queue_lifetime); > > > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Tue Oct 10 07:31:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 10 Oct 2006 16:31:27 +0200 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: References: <20060927083003.GB22263@mellanox.co.il> Message-ID: <20061010143126.GC27845@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: enable GSO over IPoIB > > "Michael S. Tsirkin" wrote on 09/27/2006 01:30:03 AM: > >Any idea what does ethtool do that IPoIB can't support? 
> > > ethtool is an ethernet device tool. It's OK to partially implement ethtool > operations in IPoIB. I just looked at this - this is just a matter of calling SET_ETHTOOL_OPS, isn't it? > We also need to patch the userlevel utility to support > ibX interface. Now it only supports ethX. Probably not a big deal either, right? -- MST From mst at mellanox.co.il Tue Oct 10 07:43:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 10 Oct 2006 16:43:30 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061009095051.38ed9f22@freekitty> References: <20061009095051.38ed9f22@freekitty> Message-ID: <20061010144330.GA28175@mellanox.co.il> Quoting r. Stephen Hemminger : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > On Mon, 9 Oct 2006 19:47:05 +0200 > "Michael S. Tsirkin" wrote: > > > Hi! > > I'm trying to build a network device driver supporting a very large MTU (around 64K) > > on top of an infiniband connection, and I've hit a couple of issues I'd > > appreciate some feedback on: > > > > 1. On the send side, > > I've set NETIF_F_SG, but hardware does not support checksum offloading, > > and I see "dropping NETIF_F_SG since no checksum feature" warning, > > and I seem to be getting large packets all in one chunk. > > The reason I've set NETIF_F_SG is that I'm concerned that under real life > > stress Linux won't be able to allocate 64K of contiguous memory. > > > > Is this concern of mine valid? I saw in-tree drivers allocating at least 8K. > > What's the best way to enable S/G on send side? > > Is checksum offloading really required for S/G? > > Yes, in the current implementation, Linux needs checksum offload. But there > is no reason your driver can't compute the checksum in software. Are there drivers that do this already? Couldn't find any such beast ... I'm worried that an extra pass over the data will eat up all of the performance gains I get from the large MTU ... 
> > Which helpers are legal for a fragmented skb? BTW, I found skb_put_frags in sky2 which seems generic enough - I even wonder why this isn't in net/core. Thanks! -- MST From rdreier at cisco.com Tue Oct 10 08:22:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 10 Oct 2006 08:22:57 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: <20061010143126.GC27845@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 10 Oct 2006 16:31:27 +0200") References: <20060927083003.GB22263@mellanox.co.il> <20061010143126.GC27845@mellanox.co.il> Message-ID: >> We also need to patch the userlevel utility to support ibX >> interface. Now it only supports ethX. Is this really even true? Surely ethtool supports people who rename their ethX interfaces to something else, even ibX? For example on my system I can do the following: # ip link set eth1 name ib5 # ethtool ib5 Settings for ib5: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised auto-negotiation: Yes Speed: Unknown! (65535) Duplex: Unknown! (255) Port: Twisted Pair PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: umbg Wake-on: g Current message level: 0x00000007 (7) Link detected: no So what's the problem exactly? - R. From tom at opengridcomputing.com Tue Oct 10 10:24:18 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 10 Oct 2006 12:24:18 -0500 Subject: [openib-general] ipoib: ignores dma mapping errors on TX? In-Reply-To: Message-ID: Does anyone know what might happen if a device tries to bus master bad_dma_address? Does it get a pci-abort, an NMI, a bus error interrupt, or all of the above?
On 10/9/06 1:01 PM, "Roland Dreier" wrote: > Michael> It seems that IPoIB ignores the possibility that > Michael> dma_map_single with DMA_TO_DEVICE direction might return > Michael> dma_mapping_error. > > Michael> Is there some reason that such mappings can't fail? > > No, it's just an oversight. Most network device drivers don't check > for DMA mapping errors but it's probably better to do so anyway. I > added this to my queue: > > commit 8edaf479946022d67350d6c344952fb65064e51b > Author: Roland Dreier > Date: Mon Oct 9 10:54:20 2006 -0700 > > IPoIB: Check for DMA mapping error for TX packets > > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index f426a69..8bf5e9e 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev, > tx_req->skb = skb; > addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len, > DMA_TO_DEVICE); > + if (unlikely(dma_mapping_error(addr))) { > + ++priv->stats.tx_errors; > + dev_kfree_skb_any(skb); > + return; > + } > pci_unmap_addr_set(tx_req, mapping, addr); > > if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), > From shemminger at osdl.org Tue Oct 10 10:43:15 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Tue, 10 Oct 2006 10:43:15 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061010144330.GA28175@mellanox.co.il> References: <20061009095051.38ed9f22@freekitty> <20061010144330.GA28175@mellanox.co.il> Message-ID: <20061010104315.61540986@freekitty> On Tue, 10 Oct 2006 16:43:30 +0200 "Michael S. Tsirkin" wrote: > Quoting r.
Stephen Hemminger : > > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > > > On Mon, 9 Oct 2006 19:47:05 +0200 > > "Michael S. Tsirkin" wrote: > > > > > Hi! > > > I'm trying to build a network device driver supporting a very large MTU (around 64K) > > > on top of an InfiniBand connection, and I've hit a couple of issues I'd > > > appreciate some feedback on: > > > > > > 1. On the send side, > > > I've set NETIF_F_SG, but hardware does not support checksum offloading, > > > and I see "dropping NETIF_F_SG since no checksum feature" warning, > > > and I seem to be getting large packets all in one chunk. > > > The reason I've set NETIF_F_SG is that I'm concerned that under real life > > > stress Linux won't be able to allocate 64K of contiguous memory. > > > > > > Is this concern of mine valid? I saw in-tree drivers allocating at least 8K. > > > What's the best way to enable S/G on send side? > > > Is checksum offloading really required for S/G? > > > > Yes, in the current implementation, Linux needs checksum offload. But there > > is no reason your driver can't compute the checksum in software. > > Are there drivers that do this already? Couldn't find any such beast ... dev_queue_xmit() does it; all you need to do in your driver is: /* If packet is not checksummed and device does not support * checksumming for this protocol, complete checksumming here. */ if (skb->ip_summed == CHECKSUM_PARTIAL) { if (skb_checksum_help(skb)) goto error_recovery; } > I'm worried whether an extra pass over data won't eat up all of > the performance gains I get from the large MTU ... Yup, the cost is in touching the data, not in the copy. > > > Which helpers are legal for a fragmented skb? > > BTW, I found skb_put_frags in sky2 which seems generic enough - I even wonder > why this isn't in net/core. > Only because I just wrote it for my needs.
If you need it, then it can be moved to skbuff.c -- Stephen Hemminger From krause at cup.hp.com Tue Oct 10 11:26:21 2006 From: krause at cup.hp.com (Michael Krause) Date: Tue, 10 Oct 2006 11:26:21 -0700 Subject: [openib-general] ipoib: ignores dma mapping errors on TX? In-Reply-To: References: Message-ID: <6.2.0.14.2.20061010112325.02f1e068@esmail.cup.hp.com> At 10:24 AM 10/10/2006, Tom Tucker wrote: >Does anyone know what might happen if a device tries to bus master >bad_dma_address? Does it get a pci-abort, an NMI, a bus error interrupt, or all >of the above? It depends upon the platform. Some will enter a containment mode and, for example, shut down the PCI Bus or the PCIe Root Port. Others may trigger a system error and shut down the system. These responses are, in part, a matter of platform policy and of how the system is implemented. In future chipsets that contain IOMMU / Address Translation Protection Tables (ATPT) / pick your favorite name, the error can be contained to a single device and the appropriate error recovery triggered without requiring the system to go down. Again, which action is triggered is ultimately a matter of policy. For most platforms, the potential for silent data corruption is too high to let that bus or Root Port continue operating without a reset / flush, so containment is used at a minimum. Mike >On 10/9/06 1:01 PM, "Roland Dreier" wrote: > > > Michael> It seems that IPoIB ignores the possibility that > > Michael> dma_map_single with DMA_TO_DEVICE direction might return > > Michael> dma_mapping_error. > > > > Michael> Is there some reason that such mappings can't fail? > > > > No, it's just an oversight. Most network device drivers don't check > > for DMA mapping errors but it's probably better to do so anyway.
I > > added this to my queue: > > > > commit 8edaf479946022d67350d6c344952fb65064e51b > > Author: Roland Dreier > > Date: Mon Oct 9 10:54:20 2006 -0700 > > > > IPoIB: Check for DMA mapping error for TX packets > > > > Signed-off-by: Roland Dreier > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > > b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > > index f426a69..8bf5e9e 100644 > > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > > @@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev, > > tx_req->skb = skb; > > addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len, > > DMA_TO_DEVICE); > > + if (unlikely(dma_mapping_error(addr))) { > > + ++priv->stats.tx_errors; > > + dev_kfree_skb_any(skb); > > + return; > > + } > > pci_unmap_addr_set(tx_req, mapping, addr); > > > > if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), > > From rdreier at cisco.com Tue Oct 10 14:03:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 10 Oct 2006 14:03:02 -0700 Subject: [openib-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This includes various fixes found since 2.6.19-rc1: Ishai Rabinovitz: IB/srp: Remove redundant memset() IB/srp: Enable
multiple connections to the same target Jack Morgenstein: IB/mthca: Query port fix Michael S. Tsirkin: IB/mthca: Fix off-by-one in mthca SRQ creation Roland Dreier: RDMA/amso1100: Fix build with debugging off IPoIB: Check for DMA mapping error for TX packets Sean Hefty: IB/cm: Fix timewait crash after module unload IB/cm: Send DREP in response to unmatched DREQ Tom Tucker: RDMA/amso1100: Add spinlocks to serialize ib_post_send/ib_post_recv drivers/infiniband/core/cm.c | 84 ++++++++++++++++++++------ drivers/infiniband/hw/amso1100/c2_ae.c | 2 - drivers/infiniband/hw/amso1100/c2_qp.c | 16 ++++- drivers/infiniband/hw/mthca/mthca_provider.c | 2 + drivers/infiniband/hw/mthca/mthca_srq.c | 6 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 5 ++ drivers/infiniband/ulp/srp/ib_srp.c | 27 +++++--- drivers/infiniband/ulp/srp/ib_srp.h | 2 - 8 files changed, 106 insertions(+), 38 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index f35fcc4..25b1018 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -75,6 +75,7 @@ static struct ib_cm { struct rb_root remote_sidr_table; struct idr local_id_table; __be32 random_id_operand; + struct list_head timewait_list; struct workqueue_struct *wq; } cm; @@ -112,6 +113,7 @@ struct cm_work { struct cm_timewait_info { struct cm_work work; /* Must be first. 
*/ + struct list_head list; struct rb_node remote_qp_node; struct rb_node remote_id_node; __be64 remote_ca_guid; @@ -647,13 +649,6 @@ static inline int cm_convert_to_ms(int i static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info) { - unsigned long flags; - - if (!timewait_info->inserted_remote_id && - !timewait_info->inserted_remote_qp) - return; - - spin_lock_irqsave(&cm.lock, flags); if (timewait_info->inserted_remote_id) { rb_erase(&timewait_info->remote_id_node, &cm.remote_id_table); timewait_info->inserted_remote_id = 0; @@ -663,7 +658,6 @@ static void cm_cleanup_timewait(struct c rb_erase(&timewait_info->remote_qp_node, &cm.remote_qp_table); timewait_info->inserted_remote_qp = 0; } - spin_unlock_irqrestore(&cm.lock, flags); } static struct cm_timewait_info * cm_create_timewait_info(__be32 local_id) @@ -684,8 +678,12 @@ static struct cm_timewait_info * cm_crea static void cm_enter_timewait(struct cm_id_private *cm_id_priv) { int wait_time; + unsigned long flags; + spin_lock_irqsave(&cm.lock, flags); cm_cleanup_timewait(cm_id_priv->timewait_info); + list_add_tail(&cm_id_priv->timewait_info->list, &cm.timewait_list); + spin_unlock_irqrestore(&cm.lock, flags); /* * The cm_id could be destroyed by the user before we exit timewait. 
@@ -701,9 +699,13 @@ static void cm_enter_timewait(struct cm_ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv) { + unsigned long flags; + cm_id_priv->id.state = IB_CM_IDLE; if (cm_id_priv->timewait_info) { + spin_lock_irqsave(&cm.lock, flags); cm_cleanup_timewait(cm_id_priv->timewait_info); + spin_unlock_irqrestore(&cm.lock, flags); kfree(cm_id_priv->timewait_info); cm_id_priv->timewait_info = NULL; } @@ -1307,6 +1309,7 @@ static struct cm_id_private * cm_match_r if (timewait_info) { cur_cm_id_priv = cm_get_id(timewait_info->work.local_id, timewait_info->work.remote_id); + cm_cleanup_timewait(cm_id_priv->timewait_info); spin_unlock_irqrestore(&cm.lock, flags); if (cur_cm_id_priv) { cm_dup_req_handler(work, cur_cm_id_priv); @@ -1315,7 +1318,8 @@ static struct cm_id_private * cm_match_r cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ, NULL, 0); - goto error; + listen_cm_id_priv = NULL; + goto out; } /* Find matching listen request. */ @@ -1323,21 +1327,20 @@ static struct cm_id_private * cm_match_r req_msg->service_id, req_msg->private_data); if (!listen_cm_id_priv) { + cm_cleanup_timewait(cm_id_priv->timewait_info); spin_unlock_irqrestore(&cm.lock, flags); cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ, NULL, 0); - goto error; + goto out; } atomic_inc(&listen_cm_id_priv->refcount); atomic_inc(&cm_id_priv->refcount); cm_id_priv->id.state = IB_CM_REQ_RCVD; atomic_inc(&cm_id_priv->work_count); spin_unlock_irqrestore(&cm.lock, flags); +out: return listen_cm_id_priv; - -error: cm_cleanup_timewait(cm_id_priv->timewait_info); - return NULL; } static int cm_req_handler(struct cm_work *work) @@ -1899,6 +1902,32 @@ out: spin_unlock_irqrestore(&cm_id_priv- } EXPORT_SYMBOL(ib_send_cm_drep); +static int cm_issue_drep(struct cm_port *port, + struct ib_mad_recv_wc *mad_recv_wc) +{ + struct ib_mad_send_buf *msg = NULL; + struct cm_dreq_msg *dreq_msg; + struct cm_drep_msg *drep_msg; + 
int ret; + + ret = cm_alloc_response_msg(port, mad_recv_wc, &msg); + if (ret) + return ret; + + dreq_msg = (struct cm_dreq_msg *) mad_recv_wc->recv_buf.mad; + drep_msg = (struct cm_drep_msg *) msg->mad; + + cm_format_mad_hdr(&drep_msg->hdr, CM_DREP_ATTR_ID, dreq_msg->hdr.tid); + drep_msg->remote_comm_id = dreq_msg->local_comm_id; + drep_msg->local_comm_id = dreq_msg->remote_comm_id; + + ret = ib_post_send_mad(msg, NULL); + if (ret) + cm_free_msg(msg); + + return ret; +} + static int cm_dreq_handler(struct cm_work *work) { struct cm_id_private *cm_id_priv; @@ -1910,8 +1939,10 @@ static int cm_dreq_handler(struct cm_wor dreq_msg = (struct cm_dreq_msg *)work->mad_recv_wc->recv_buf.mad; cm_id_priv = cm_acquire_id(dreq_msg->remote_comm_id, dreq_msg->local_comm_id); - if (!cm_id_priv) + if (!cm_id_priv) { + cm_issue_drep(work->port, work->mad_recv_wc); return -EINVAL; + } work->cm_event.private_data = &dreq_msg->private_data; @@ -2601,28 +2632,29 @@ static int cm_timewait_handler(struct cm { struct cm_timewait_info *timewait_info; struct cm_id_private *cm_id_priv; - unsigned long flags; int ret; timewait_info = (struct cm_timewait_info *)work; - cm_cleanup_timewait(timewait_info); + spin_lock_irq(&cm.lock); + list_del(&timewait_info->list); + spin_unlock_irq(&cm.lock); cm_id_priv = cm_acquire_id(timewait_info->work.local_id, timewait_info->work.remote_id); if (!cm_id_priv) return -EINVAL; - spin_lock_irqsave(&cm_id_priv->lock, flags); + spin_lock_irq(&cm_id_priv->lock); if (cm_id_priv->id.state != IB_CM_TIMEWAIT || cm_id_priv->remote_qpn != timewait_info->remote_qpn) { - spin_unlock_irqrestore(&cm_id_priv->lock, flags); + spin_unlock_irq(&cm_id_priv->lock); goto out; } cm_id_priv->id.state = IB_CM_IDLE; ret = atomic_inc_and_test(&cm_id_priv->work_count); if (!ret) list_add_tail(&work->list, &cm_id_priv->work_list); - spin_unlock_irqrestore(&cm_id_priv->lock, flags); + spin_unlock_irq(&cm_id_priv->lock); if (ret) cm_process_work(cm_id_priv, work); @@ -3374,6 +3406,7 @@ 
static int __init ib_cm_init(void) idr_init(&cm.local_id_table); get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand); idr_pre_get(&cm.local_id_table, GFP_KERNEL); + INIT_LIST_HEAD(&cm.timewait_list); cm.wq = create_workqueue("ib_cm"); if (!cm.wq) @@ -3391,7 +3424,20 @@ error: static void __exit ib_cm_cleanup(void) { + struct cm_timewait_info *timewait_info, *tmp; + + spin_lock_irq(&cm.lock); + list_for_each_entry(timewait_info, &cm.timewait_list, list) + cancel_delayed_work(&timewait_info->work.work); + spin_unlock_irq(&cm.lock); + destroy_workqueue(cm.wq); + + list_for_each_entry_safe(timewait_info, tmp, &cm.timewait_list, list) { + list_del(&timewait_info->list); + kfree(timewait_info); + } + ib_unregister_client(&cm_client); idr_destroy(&cm.local_id_table); } diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c b/drivers/infiniband/hw/amso1100/c2_ae.c index 3aae497..a31439b 100644 --- a/drivers/infiniband/hw/amso1100/c2_ae.c +++ b/drivers/infiniband/hw/amso1100/c2_ae.c @@ -66,7 +66,6 @@ static int c2_convert_cm_status(u32 c2_s } } -#ifdef DEBUG static const char* to_event_str(int event) { static const char* event_str[] = { @@ -144,7 +143,6 @@ static const char *to_qp_state_str(int s return ""; }; } -#endif void c2_ae_event(struct c2_dev *c2dev, u32 mq_index) { diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c index 1226113..5bcf697 100644 --- a/drivers/infiniband/hw/amso1100/c2_qp.c +++ b/drivers/infiniband/hw/amso1100/c2_qp.c @@ -35,6 +35,8 @@ * */ +#include + #include "c2.h" #include "c2_vq.h" #include "c2_status.h" @@ -705,10 +707,8 @@ static inline void c2_activity(struct c2 * cannot get on the bus and the card and system hang in a * deadlock -- thus the need for this code. 
[TOT] */ - while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x80000000) { - set_current_state(TASK_UNINTERRUPTIBLE); - schedule_timeout(0); - } + while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x80000000) + udelay(10); __raw_writel(C2_HINT_MAKE(mq_index, shared), c2dev->regs + PCI_BAR0_ADAPTER_HINT); @@ -766,6 +766,7 @@ int c2_post_send(struct ib_qp *ibqp, str struct c2_dev *c2dev = to_c2dev(ibqp->device); struct c2_qp *qp = to_c2qp(ibqp); union c2wr wr; + unsigned long lock_flags; int err = 0; u32 flags; @@ -881,8 +882,10 @@ int c2_post_send(struct ib_qp *ibqp, str /* * Post the puppy! */ + spin_lock_irqsave(&qp->lock, lock_flags); err = qp_wr_post(&qp->sq_mq, &wr, qp, msg_size); if (err) { + spin_unlock_irqrestore(&qp->lock, lock_flags); break; } @@ -890,6 +893,7 @@ int c2_post_send(struct ib_qp *ibqp, str * Enqueue mq index to activity FIFO. */ c2_activity(c2dev, qp->sq_mq.index, qp->sq_mq.hint_count); + spin_unlock_irqrestore(&qp->lock, lock_flags); ib_wr = ib_wr->next; } @@ -905,6 +909,7 @@ int c2_post_receive(struct ib_qp *ibqp, struct c2_dev *c2dev = to_c2dev(ibqp->device); struct c2_qp *qp = to_c2qp(ibqp); union c2wr wr; + unsigned long lock_flags; int err = 0; if (qp->state > IB_QPS_RTS) @@ -945,8 +950,10 @@ int c2_post_receive(struct ib_qp *ibqp, break; } + spin_lock_irqsave(&qp->lock, lock_flags); err = qp_wr_post(&qp->rq_mq, &wr, qp, qp->rq_mq.msg_size); if (err) { + spin_unlock_irqrestore(&qp->lock, lock_flags); break; } @@ -954,6 +961,7 @@ int c2_post_receive(struct ib_qp *ibqp, * Enqueue mq index to activity FIFO */ c2_activity(c2dev, qp->rq_mq.index, qp->rq_mq.hint_count); + spin_unlock_irqrestore(&qp->lock, lock_flags); ib_wr = ib_wr->next; } diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 981fe2e..fc67f78 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -179,6 +179,8 @@ static int mthca_query_port(struct ib_de 
props->max_mtu = out_mad->data[41] & 0xf; props->active_mtu = out_mad->data[36] >> 4; props->subnet_timeout = out_mad->data[51] & 0x1f; + props->max_vl_num = out_mad->data[37] >> 4; + props->init_type_reply = out_mad->data[41] >> 4; out: kfree(in_mad); diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index 0f316c8..92a72f5 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -201,6 +201,8 @@ int mthca_alloc_srq(struct mthca_dev *de if (mthca_is_memfree(dev)) srq->max = roundup_pow_of_two(srq->max + 1); + else + srq->max = srq->max + 1; ds = max(64UL, roundup_pow_of_two(sizeof (struct mthca_next_seg) + @@ -277,7 +279,7 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; - attr->max_wr = (mthca_is_memfree(dev)) ? srq->max - 1 : srq->max; + attr->max_wr = srq->max - 1; attr->max_sge = srq->max_gs; return 0; @@ -413,7 +415,7 @@ int mthca_query_srq(struct ib_srq *ibsrq srq_attr->srq_limit = be16_to_cpu(tavor_ctx->limit_watermark); } - srq_attr->max_wr = (mthca_is_memfree(dev)) ? 
srq->max - 1 : srq->max; + srq_attr->max_wr = srq->max - 1; srq_attr->max_sge = srq->max_gs; out: diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index f426a69..8bf5e9e 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -355,6 +355,11 @@ void ipoib_send(struct net_device *dev, tx_req->skb = skb; addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len, DMA_TO_DEVICE); + if (unlikely(dma_mapping_error(addr))) { + ++priv->stats.tx_errors; + dev_kfree_skb_any(skb); + return; + } pci_unmap_addr_set(tx_req, mapping, addr); if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 44b9e5b..4b09147 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -343,29 +343,32 @@ static int srp_send_req(struct srp_targe */ if (target->io_class == SRP_REV10_IB_IO_CLASS) { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id + 8, 8); + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.initiator_port_id + 8, - target->srp_host->initiator_port_id, 8); + &target->initiator_ext, 8); memcpy(req->priv.target_port_id, &target->ioc_guid, 8); memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); } else { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id, 16); + &target->initiator_ext, 8); + memcpy(req->priv.initiator_port_id + 8, + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.target_port_id, &target->id_ext, 8); memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); } /* * Topspin/Cisco SRP targets will reject our login unless we - * zero out the first 8 bytes of our initiator port ID. The - * second 8 bytes must be our local node GUID, but we always - * use that anyway. 
+ * zero out the first 8 bytes of our initiator port ID and set + * the second 8 bytes to the local node GUID. */ if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3)) { printk(KERN_DEBUG PFX "Topspin/Cisco initiator port ID workaround " "activated for target GUID %016llx\n", (unsigned long long) be64_to_cpu(target->ioc_guid)); memset(req->priv.initiator_port_id, 0, 8); + memcpy(req->priv.initiator_port_id + 8, + &target->srp_host->dev->dev->node_guid, 8); } status = ib_send_cm_req(target->cm_id, &req->param); @@ -1553,6 +1556,7 @@ enum { SRP_OPT_MAX_SECT = 1 << 5, SRP_OPT_MAX_CMD_PER_LUN = 1 << 6, SRP_OPT_IO_CLASS = 1 << 7, + SRP_OPT_INITIATOR_EXT = 1 << 8, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -1569,6 +1573,7 @@ static match_table_t srp_opt_tokens = { { SRP_OPT_MAX_SECT, "max_sect=%d" }, { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" }, { SRP_OPT_IO_CLASS, "io_class=%x" }, + { SRP_OPT_INITIATOR_EXT, "initiator_ext=%s" }, { SRP_OPT_ERR, NULL } }; @@ -1668,6 +1673,12 @@ static int srp_parse_options(const char target->io_class = token; break; + case SRP_OPT_INITIATOR_EXT: + p = match_strdup(args); + target->initiator_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + default: printk(KERN_WARNING PFX "unknown parameter or missing value " "'%s' in target creation request\n", p); @@ -1708,7 +1719,6 @@ static ssize_t srp_create_target(struct target_host->max_lun = SRP_MAX_LUN; target = host_to_target(target_host); - memset(target, 0, sizeof *target); target->io_class = SRP_REV16A_IB_IO_CLASS; target->scsi_host = target_host; @@ -1815,9 +1825,6 @@ static struct srp_host *srp_add_port(str host->dev = device; host->port = port; - host->initiator_port_id[7] = port; - memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); - host->class_dev.class = &srp_class; host->class_dev.dev = device->dev->dma_device; snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", diff --git 
a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index 5b581fb..d4e35ef 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -91,7 +91,6 @@ struct srp_device { }; struct srp_host { - u8 initiator_port_id[16]; struct srp_device *dev; u8 port; struct class_device class_dev; @@ -122,6 +121,7 @@ struct srp_target_port { __be64 id_ext; __be64 ioc_guid; __be64 service_id; + __be64 initiator_ext; u16 io_class; struct srp_host *srp_host; struct Scsi_Host *scsi_host; From rjwalsh at pathscale.com Tue Oct 10 14:55:45 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 10 Oct 2006 14:55:45 -0700 Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only Message-ID: <452C16E1.60005@pathscale.com> Hi all, This is a new version of the patch to fix a problem on modprobe of the ib_ipath module when no InfiniPath device is actually present. This version should be multithread-probe safe. Roland: can you queue this up for 2.6.19, please? OFED folks: it's not critical that this get into OFED-1.1: there's already a version of this in 1.1-RC7 that should work just fine. This is just a slightly safer variant. If you think there's time to get it in and you're OK with that, then by all means do. Regards, Robert. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: diagpkt-init-fixup.patch URL: From sean.hefty at intel.com Tue Oct 10 15:44:41 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 15:44:41 -0700 Subject: [openib-general] [RFC] [PATCH 0/7] rdma_cm 2.6.20: add userspace support Message-ID: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com> The following patch series migrates rdma_cm features currently only in svn to the mainline kernel. I'd like to consider adding these features into a branch targeting 2.6.20. The patches were created against 2.6.19, and borrow heavily from code checked into svn.
However, there are some notable differences. Major differences between svn and the patches being submitted will be noted in individual patches. Once this core set of changes is accepted into the kernel, I'd like to avoid maintaining the following modules outside of the mainline tree: ib_addr, ib_cm, ib_ucm, ib_multicast, rdma_cm, and rdma_ucm. (And I vote for including ib_core, ib_mad, ib_sa, ib_uverbs, ib_ipoib, and ib_umad in that list.) Signed-off-by: Sean Hefty From sean.hefty at intel.com Tue Oct 10 15:53:24 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 15:53:24 -0700 Subject: [openib-general] [RFC] [PATCH 1/7] ib_sa 2.6.20: expose MAD retry parameter through SA In-Reply-To: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com> Message-ID: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> Currently, the SA query interface does not permit retrying requests automatically. Expose this capability to take advantage of the underlying MAD layer API, which provides it basically for free because of RMPP. Without automatic retries pushed down into the SA query module, retries are assigned new TIDs, and appear as separate requests. This means that a delayed response will be dropped, and the remote side will not detect that the request is a duplicate, so it will recalculate the response. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9ae4f3a..b9ba68d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1334,7 +1334,7 @@ static int cma_query_ib_route(struct rdm id_priv->id.port_num, &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH, - timeout_ms, GFP_KERNEL, + timeout_ms, 0, GFP_KERNEL, cma_query_handler, work, &id_priv->query); return (id_priv->query_id < 0) ?
id_priv->query_id : 0; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 1706d3c..82d4736 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -516,7 +516,8 @@ static void init_mad(struct ib_sa_mad *m spin_unlock_irqrestore(&tid_lock, flags); } -static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask) +static int send_mad(struct ib_sa_query *query, int timeout_ms, int retries, + gfp_t gfp_mask) { unsigned long flags; int ret, id; @@ -533,6 +534,7 @@ retry: return ret; query->mad_buf->timeout_ms = timeout_ms; + query->mad_buf->retries = retries; query->mad_buf->context[0] = query; query->id = id; @@ -590,6 +592,7 @@ static void ib_sa_path_rec_release(struc * @rec:Path Record to send in query * @comp_mask:component mask to send in query * @timeout_ms:time to wait for response + * @retries:number of times to retry request * @gfp_mask:GFP mask to use for internal allocations * @callback:function called when query completes, times out or is * canceled @@ -611,7 +614,7 @@ int ib_sa_path_rec_get(struct ib_sa_clie struct ib_device *device, u8 port_num, struct ib_sa_path_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_path_rec *resp, void *context), @@ -662,7 +665,7 @@ int ib_sa_path_rec_get(struct ib_sa_clie *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); + ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask); if (ret < 0) goto err2; @@ -710,6 +713,7 @@ static void ib_sa_service_rec_release(st * @rec:Service Record to send in request * @comp_mask:component mask to send in request * @timeout_ms:time to wait for response + * @retries:number of times to retry request * @gfp_mask:GFP mask to use for internal allocations * @callback:function called when request completes, times out or is * canceled @@ -732,7 +736,7 @@ 
int ib_sa_service_rec_query(struct ib_sa struct ib_device *device, u8 port_num, u8 method, struct ib_sa_service_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_service_rec *resp, void *context), @@ -789,7 +793,7 @@ int ib_sa_service_rec_query(struct ib_sa *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); + ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask); if (ret < 0) goto err2; @@ -833,7 +837,7 @@ int ib_sa_mcmember_rec_query(struct ib_s u8 method, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_mcmember_rec *resp, void *context), @@ -885,7 +889,7 @@ int ib_sa_mcmember_rec_query(struct ib_s *sa_query = &query->sa_query; - ret = send_mad(&query->sa_query, timeout_ms, gfp_mask); + ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask); if (ret < 0) goto err2; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1eaf00e..dafeb3d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -470,7 +470,7 @@ static int path_rec_start(struct net_dev IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_PKEY, - 1000, GFP_ATOMIC, + 1000, 0, GFP_ATOMIC, path_rec_completion, path, &path->query); if (path->query_id < 0) { diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 44b9e5b..bcf151c 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -277,7 +277,7 @@ static int srp_lookup_path(struct srp_ta IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_PKEY, - SRP_PATH_REC_TIMEOUT_MS, + SRP_PATH_REC_TIMEOUT_MS, 0, GFP_KERNEL, srp_path_rec_completion, target, &target->path_query); 
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h index 97715b0..e94656a 100644 --- a/include/rdma/ib_sa.h +++ b/include/rdma/ib_sa.h @@ -278,7 +278,7 @@ int ib_sa_path_rec_get(struct ib_sa_clie struct ib_device *device, u8 port_num, struct ib_sa_path_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_path_rec *resp, void *context), @@ -290,7 +290,7 @@ int ib_sa_mcmember_rec_query(struct ib_s u8 method, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_mcmember_rec *resp, void *context), @@ -302,7 +302,7 @@ int ib_sa_service_rec_query(struct ib_sa u8 method, struct ib_sa_service_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_service_rec *resp, void *context), @@ -317,6 +317,7 @@ int ib_sa_service_rec_query(struct ib_sa * @rec:MCMember Record to send in query * @comp_mask:component mask to send in query * @timeout_ms:time to wait for response + * @retries:number of times to retry request * @gfp_mask:GFP mask to use for internal allocations * @callback:function called when query completes, times out or is * canceled @@ -339,7 +340,7 @@ ib_sa_mcmember_rec_set(struct ib_sa_clie struct ib_device *device, u8 port_num, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_mcmember_rec *resp, void *context), @@ -349,7 +350,7 @@ ib_sa_mcmember_rec_set(struct ib_sa_clie return ib_sa_mcmember_rec_query(client, device, port_num, IB_MGMT_METHOD_SET, rec, comp_mask, - timeout_ms, gfp_mask, callback, + timeout_ms, retries, gfp_mask, callback, context, query); } @@ -361,6 +362,7 @@ 
ib_sa_mcmember_rec_set(struct ib_sa_clie * @rec:MCMember Record to send in query * @comp_mask:component mask to send in query * @timeout_ms:time to wait for response + * @retries:number of times to retry request * @gfp_mask:GFP mask to use for internal allocations * @callback:function called when query completes, times out or is * canceled @@ -383,7 +385,7 @@ ib_sa_mcmember_rec_delete(struct ib_sa_c struct ib_device *device, u8 port_num, struct ib_sa_mcmember_rec *rec, ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, + int timeout_ms, int retries, gfp_t gfp_mask, void (*callback)(int status, struct ib_sa_mcmember_rec *resp, void *context), @@ -393,7 +395,7 @@ ib_sa_mcmember_rec_delete(struct ib_sa_c return ib_sa_mcmember_rec_query(client, device, port_num, IB_SA_METHOD_DELETE, rec, comp_mask, - timeout_ms, gfp_mask, callback, + timeout_ms, retries, gfp_mask, callback, context, query); } From sean.hefty at intel.com Tue Oct 10 15:58:31 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 15:58:31 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> Message-ID: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> The IB SA tracks multicast join / leave requests on a per port basis. In order to support multiple users of the same multicast group from the same port, we need to perform local reference counting on each of the nodes. Add an ib_multicast module to perform reference counting of multicast join / leave requests. Modify ib_ipoib to use the multicast module. 
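As a rough illustration of the local reference counting described above,
here is a minimal user-space sketch in C. It is modelled on the
adjust_membership() and get_leave_state() helpers in this patch, but
struct group_counts, local_join() and local_leave() are hypothetical
names used only for this example. It also assumes SA requests complete
immediately, which the real module does not (joins and leaves there are
asynchronous and serialized per group).

```c
#include <stdint.h>

/* Hypothetical model, not the kernel code: one counter per join-state bit. */
enum { JOIN_STATE_BITS = 3 };	/* full member, non member, send only member */

struct group_counts {
	int members[JOIN_STATE_BITS];	/* local users per join-state bit */
	uint8_t sa_join_state;		/* bits currently joined with the SA */
};

/* Add (inc = 1) or drop (inc = -1) one local user for each bit set. */
static void adjust_membership(struct group_counts *g, uint8_t join_state,
			      int inc)
{
	for (int i = 0; i < JOIN_STATE_BITS; i++, join_state >>= 1)
		if (join_state & 0x1)
			g->members[i] += inc;
}

/* Bits still joined with the SA that no local user needs any more. */
static uint8_t get_leave_state(const struct group_counts *g)
{
	uint8_t leave_state = 0;

	for (int i = 0; i < JOIN_STATE_BITS; i++)
		if (!g->members[i])
			leave_state |= (uint8_t)(0x1 << i);

	return leave_state & g->sa_join_state;
}

/* Returns 1 if a join must go to the SA, 0 if the port already belongs. */
static int local_join(struct group_counts *g, uint8_t join_state)
{
	int need_sa = (g->sa_join_state & join_state) != join_state;

	adjust_membership(g, join_state, 1);
	g->sa_join_state |= join_state;
	return need_sa;
}

/* Returns the join-state bits that must now be left with the SA. */
static uint8_t local_leave(struct group_counts *g, uint8_t join_state)
{
	uint8_t leave;

	adjust_membership(g, join_state, -1);
	leave = get_leave_state(g);
	g->sa_join_state &= (uint8_t)~leave;
	return leave;
}
```

With two local users joining the same group as full members, only the
first local_join() needs to reach the SA, and only the second
local_leave() produces a leave request.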
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 163d991..76cc988 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -1,6 +1,7 @@ infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \ + ib_multicast.o \ ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o @@ -12,6 +13,8 @@ ib_mad-y := mad.o smi.o agent.o mad_rm ib_sa-y := sa_query.o +ib_multicast-y := multicast.o + ib_cm-y := cm.o iw_cm-y := iwcm.o diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c new file mode 100755 index 0000000..e7204b4 --- /dev/null +++ b/drivers/infiniband/core/multicast.c @@ -0,0 +1,795 @@ +/* + * Copyright (c) 2006 Intel Corporation.  All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("InfiniBand multicast membership handling"); +MODULE_LICENSE("Dual BSD/GPL"); + +static int retry_timer = 5000; /* 5 sec */ +module_param(retry_timer, int, 0444); +MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests."); + +static int retries = 3; +module_param(retries, int, 0444); +MODULE_PARM_DESC(retries, "Number of times to retry a request."); + +static void mcast_add_one(struct ib_device *device); +static void mcast_remove_one(struct ib_device *device); + +static struct ib_client mcast_client = { + .name = "ib_multicast", + .add = mcast_add_one, + .remove = mcast_remove_one +}; + +static struct ib_sa_client sa_client; +static struct ib_event_handler event_handler; +static struct workqueue_struct *mcast_wq; +static union ib_gid mgid0; + +struct mcast_device; + +struct mcast_port { + struct mcast_device *dev; + spinlock_t lock; + struct rb_root table; + atomic_t refcount; + struct completion comp; + u8 port_num; +}; + +struct mcast_device { + struct ib_device *device; + int start_port; + int end_port; + struct mcast_port port[0]; +}; + +enum mcast_state { + MCAST_IDLE, + MCAST_JOINING, + MCAST_MEMBER, + MCAST_BUSY, + MCAST_ERROR +}; + +struct mcast_member; + +struct mcast_group { + struct ib_sa_mcmember_rec rec; + struct rb_node node; + struct mcast_port *port; + spinlock_t lock; + struct work_struct work; + struct list_head pending_list; + struct list_head active_list; + struct mcast_member *last_join; + int members[3]; + atomic_t refcount; + enum mcast_state state; + struct ib_sa_query *query; + int query_id; +}; + +struct 
mcast_member { + struct ib_multicast multicast; + struct mcast_group *group; + struct list_head list; + enum mcast_state state; + atomic_t refcount; + struct completion comp; +}; + +static void join_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context); +static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context); + +static struct mcast_group *mcast_find(struct mcast_port *port, + union ib_gid *mgid) +{ + struct rb_node *node = port->table.rb_node; + struct mcast_group *group; + int ret; + + while (node) { + group = rb_entry(node, struct mcast_group, node); + ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid); + if (!ret) + return group; + + if (ret < 0) + node = node->rb_left; + else + node = node->rb_right; + } + return NULL; +} + +static struct mcast_group *mcast_insert(struct mcast_port *port, + struct mcast_group *group, + int allow_duplicates) +{ + struct rb_node **link = &port->table.rb_node; + struct rb_node *parent = NULL; + struct mcast_group *cur_group; + int ret; + + while (*link) { + parent = *link; + cur_group = rb_entry(parent, struct mcast_group, node); + + ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw, + sizeof group->rec.mgid); + if (ret < 0) + link = &(*link)->rb_left; + else if (ret > 0) + link = &(*link)->rb_right; + else if (allow_duplicates) + link = &(*link)->rb_left; + else + return cur_group; + } + rb_link_node(&group->node, parent, link); + rb_insert_color(&group->node, &port->table); + return NULL; +} + +static void deref_port(struct mcast_port *port) +{ + if (atomic_dec_and_test(&port->refcount)) + complete(&port->comp); +} + +static void release_group(struct mcast_group *group) +{ + struct mcast_port *port = group->port; + unsigned long flags; + + spin_lock_irqsave(&port->lock, flags); + if (atomic_dec_and_test(&group->refcount)) { + rb_erase(&group->node, &port->table); + spin_unlock_irqrestore(&port->lock, flags); + kfree(group); + deref_port(port); + } else + 
spin_unlock_irqrestore(&port->lock, flags); +} + +static void deref_member(struct mcast_member *member) +{ + if (atomic_dec_and_test(&member->refcount)) + complete(&member->comp); +} + +static void queue_join(struct mcast_member *member) +{ + struct mcast_group *group = member->group; + unsigned long flags; + + spin_lock_irqsave(&group->lock, flags); + list_add(&member->list, &group->pending_list); + if (group->state == MCAST_IDLE) { + group->state = MCAST_BUSY; + atomic_inc(&group->refcount); + queue_work(mcast_wq, &group->work); + } + spin_unlock_irqrestore(&group->lock, flags); +} + +/* + * A multicast group has three types of members: full member, non member, and + * send only member. We need to keep track of the number of members of each + * type based on their join state. Adjust the number of members that belong to + * the specified join states. + */ +static void adjust_membership(struct mcast_group *group, u8 join_state, int inc) +{ + int i; + + for (i = 0; i < 3; i++, join_state >>= 1) + if (join_state & 0x1) + group->members[i] += inc; +} + +/* + * If a multicast group has zero members left for a particular join state, but + * the group is still a member with the SA, we need to leave that join state. + * Determine which join states we still belong to, but that do not have any + * active members.
+ */ +static u8 get_leave_state(struct mcast_group *group) +{ + u8 leave_state = 0; + int i; + + for (i = 0; i < 3; i++) + if (!group->members[i]) + leave_state |= (0x1 << i); + + return leave_state & group->rec.join_state; +} + +static int cmp_rec(struct ib_sa_mcmember_rec *src, + struct ib_sa_mcmember_rec *dst, ib_sa_comp_mask comp_mask) +{ + /* MGID must already match */ + + if (comp_mask & IB_SA_MCMEMBER_REC_PORT_GID && + memcmp(&src->port_gid, &dst->port_gid, sizeof src->port_gid)) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_QKEY && src->qkey != dst->qkey) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_MLID && src->mlid != dst->mlid) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_MTU_SELECTOR && + src->mtu_selector != dst->mtu_selector) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_MTU && src->mtu != dst->mtu) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_TRAFFIC_CLASS && + src->traffic_class != dst->traffic_class) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_PKEY && src->pkey != dst->pkey) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_RATE_SELECTOR && + src->rate_selector != dst->rate_selector) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_RATE && src->rate != dst->rate) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME_SELECTOR && + src->packet_life_time_selector != dst->packet_life_time_selector) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME && + src->packet_life_time != dst->packet_life_time) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_SL && src->sl != dst->sl) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_FLOW_LABEL && + src->flow_label != dst->flow_label) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_HOP_LIMIT && + src->hop_limit != dst->hop_limit) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_SCOPE && src->scope != dst->scope) + return -EINVAL; + + /* join_state checked separately, proxy_join 
ignored */ + + return 0; +} + +static int send_join(struct mcast_group *group, struct mcast_member *member) +{ + struct mcast_port *port = group->port; + int ret; + + group->last_join = member; + ret = ib_sa_mcmember_rec_set(&sa_client, port->dev->device, + port->port_num, &member->multicast.rec, + member->multicast.comp_mask, + retry_timer, retries, GFP_KERNEL, + join_handler, group, &group->query); + if (ret >= 0) { + group->query_id = ret; + ret = 0; + } + return ret; +} + +static int send_leave(struct mcast_group *group, u8 leave_state) +{ + struct mcast_port *port = group->port; + struct ib_sa_mcmember_rec rec; + int ret; + + rec = group->rec; + rec.join_state = leave_state; + + ret = ib_sa_mcmember_rec_delete(&sa_client, port->dev->device, + port->port_num, &rec, + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_JOIN_STATE, + retry_timer, retries, GFP_KERNEL, + leave_handler, group, &group->query); + if (ret >= 0) { + group->query_id = ret; + ret = 0; + } + return ret; +} + +static void join_group(struct mcast_group *group, struct mcast_member *member, + u8 join_state) +{ + member->state = MCAST_MEMBER; + adjust_membership(group, join_state, 1); + group->rec.join_state |= join_state; + member->multicast.rec = group->rec; + member->multicast.rec.join_state = join_state; + list_del(&member->list); + list_add(&member->list, &group->active_list); +} + +static int fail_join(struct mcast_group *group, struct mcast_member *member, + int status) +{ + spin_lock_irq(&group->lock); + list_del_init(&member->list); + spin_unlock_irq(&group->lock); + return member->multicast.callback(status, &member->multicast); +} + +static void process_group_error(struct mcast_group *group) +{ + struct mcast_member *member; + int ret; + + spin_lock_irq(&group->lock); + while (!list_empty(&group->active_list)) { + member = list_entry(group->active_list.next, + struct mcast_member, list); + atomic_inc(&member->refcount); + list_del_init(&member->list); + 
adjust_membership(group, member->multicast.rec.join_state, -1); + member->state = MCAST_ERROR; + spin_unlock_irq(&group->lock); + + ret = member->multicast.callback(-ENETRESET, + &member->multicast); + deref_member(member); + if (ret) + ib_free_multicast(&member->multicast); + spin_lock_irq(&group->lock); + } + + group->rec.join_state = 0; + group->state = MCAST_BUSY; + spin_unlock_irq(&group->lock); +} + +static void mcast_work_handler(void *data) +{ + struct mcast_group *group = data; + struct mcast_member *member; + struct ib_multicast *multicast; + int status, ret; + u8 join_state; + +retest: + spin_lock_irq(&group->lock); + if (group->state == MCAST_ERROR) { + spin_unlock_irq(&group->lock); + process_group_error(group); + goto retest; + } + + while (!list_empty(&group->pending_list)) { + member = list_entry(group->pending_list.next, + struct mcast_member, list); + multicast = &member->multicast; + join_state = multicast->rec.join_state; + atomic_inc(&member->refcount); + + if (join_state == (group->rec.join_state & join_state)) { + status = cmp_rec(&group->rec, &multicast->rec, + multicast->comp_mask); + if (!status) + join_group(group, member, join_state); + else + list_del_init(&member->list); + spin_unlock_irq(&group->lock); + ret = multicast->callback(status, multicast); + } else { + spin_unlock_irq(&group->lock); + status = send_join(group, member); + if (!status) { + deref_member(member); + return; + } + ret = fail_join(group, member, status); + } + + deref_member(member); + if (ret) + ib_free_multicast(&member->multicast); + spin_lock_irq(&group->lock); + } + + join_state = get_leave_state(group); + if (join_state) { + group->rec.join_state &= ~join_state; + spin_unlock_irq(&group->lock); + if (send_leave(group, join_state)) + goto retest; + } else { + group->state = MCAST_IDLE; + spin_unlock_irq(&group->lock); + release_group(group); + } +} + +/* + * Fail a join request if it is still active - at the head of the pending queue. 
+ */ +static void process_join_error(struct mcast_group *group, int status) +{ + struct mcast_member *member; + int ret; + + spin_lock_irq(&group->lock); + member = list_entry(group->pending_list.next, + struct mcast_member, list); + if (group->last_join == member) { + atomic_inc(&member->refcount); + list_del_init(&member->list); + spin_unlock_irq(&group->lock); + ret = member->multicast.callback(status, &member->multicast); + deref_member(member); + if (ret) + ib_free_multicast(&member->multicast); + } else + spin_unlock_irq(&group->lock); +} + +static void join_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context) +{ + struct mcast_group *group = context; + + if (status) + process_join_error(group, status); + else { + spin_lock_irq(&group->port->lock); + group->rec = *rec; + if (!memcmp(&mgid0, &group->rec.mgid, sizeof mgid0)) { + rb_erase(&group->node, &group->port->table); + mcast_insert(group->port, group, 1); + } + spin_unlock_irq(&group->port->lock); + } + mcast_work_handler(group); +} + +static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context) +{ + mcast_work_handler(context); +} + +static struct mcast_group *acquire_group(struct mcast_port *port, + union ib_gid *mgid, gfp_t gfp_mask) +{ + struct mcast_group *group, *cur_group; + unsigned long flags; + int is_mgid0; + + is_mgid0 = !memcmp(&mgid0, mgid, sizeof mgid0); + if (!is_mgid0) { + spin_lock_irqsave(&port->lock, flags); + group = mcast_find(port, mgid); + if (group) + goto found; + spin_unlock_irqrestore(&port->lock, flags); + } + + group = kzalloc(sizeof *group, gfp_mask); + if (!group) + return NULL; + + group->port = port; + group->rec.mgid = *mgid; + INIT_LIST_HEAD(&group->pending_list); + INIT_LIST_HEAD(&group->active_list); + INIT_WORK(&group->work, mcast_work_handler, group); + spin_lock_init(&group->lock); + + spin_lock_irqsave(&port->lock, flags); + cur_group = mcast_insert(port, group, is_mgid0); + if (cur_group) { + kfree(group); + group = 
cur_group; + } else + atomic_inc(&port->refcount); +found: + atomic_inc(&group->refcount); + spin_unlock_irqrestore(&port->lock, flags); + return group; +} + +/* + * We serialize all join requests to a single group to make our lives much + * easier. Otherwise, two users could try to join the same group + * simultaneously, with different configurations, one could leave while the + * join is in progress, etc., which makes locking around error recovery + * difficult. + */ +struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, + int (*callback)(int status, + struct ib_multicast + *multicast), + void *context) +{ + struct mcast_device *dev; + struct mcast_member *member; + struct ib_multicast *multicast; + int ret; + + dev = ib_get_client_data(device, &mcast_client); + if (!dev) + return ERR_PTR(-ENODEV); + + member = kzalloc(sizeof *member, gfp_mask); + if (!member) + return ERR_PTR(-ENOMEM); + + member->multicast.rec = *rec; + member->multicast.comp_mask = comp_mask; + member->multicast.callback = callback; + member->multicast.context = context; + init_completion(&member->comp); + atomic_set(&member->refcount, 1); + member->state = MCAST_JOINING; + + member->group = acquire_group(&dev->port[port_num - dev->start_port], + &rec->mgid, gfp_mask); + if (!member->group) { + ret = -ENOMEM; + goto err; + } + + /* + * The user will get the multicast structure in their callback. They + * could then free the multicast structure before we can return from + * this routine. So we save the pointer to return before queuing + * any callback. 
+ */ + multicast = &member->multicast; + queue_join(member); + return multicast; + +err: + kfree(member); + return ERR_PTR(ret); +} +EXPORT_SYMBOL(ib_join_multicast); + +void ib_free_multicast(struct ib_multicast *multicast) +{ + struct mcast_member *member; + struct mcast_group *group; + + member = container_of(multicast, struct mcast_member, multicast); + group = member->group; + + spin_lock_irq(&group->lock); + if (member->state == MCAST_MEMBER) + adjust_membership(group, multicast->rec.join_state, -1); + + list_del_init(&member->list); + + if (group->state == MCAST_IDLE) { + group->state = MCAST_BUSY; + spin_unlock_irq(&group->lock); + /* Continue to hold reference on group until callback */ + queue_work(mcast_wq, &group->work); + } else { + spin_unlock_irq(&group->lock); + release_group(group); + } + + deref_member(member); + wait_for_completion(&member->comp); + kfree(member); +} +EXPORT_SYMBOL(ib_free_multicast); + +int ib_get_mcmember_rec(struct ib_device *device, u8 port_num, + union ib_gid *mgid, struct ib_sa_mcmember_rec *rec) +{ + struct mcast_device *dev; + struct mcast_port *port; + struct mcast_group *group; + unsigned long flags; + int ret = 0; + + dev = ib_get_client_data(device, &mcast_client); + if (!dev) + return -ENODEV; + + port = &dev->port[port_num - dev->start_port]; + if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) { + spin_lock_irqsave(&port->lock, flags); + group = mcast_find(port, mgid); + if (group) + *rec = group->rec; + else + ret = -EADDRNOTAVAIL; + spin_unlock_irqrestore(&port->lock, flags); + } else { + memset(rec, 0, sizeof *rec); + ib_get_cached_gid(device, port_num, 0, &rec->port_gid); + rec->pkey = 0xFFFF; + get_random_bytes(&rec->qkey, sizeof rec->qkey); + rec->join_state = 1; + } + + return ret; +} +EXPORT_SYMBOL(ib_get_mcmember_rec); + +static void mcast_groups_lost(struct mcast_port *port) +{ + struct mcast_group *group; + struct rb_node *node; + unsigned long flags; + + spin_lock_irqsave(&port->lock, flags); + for (node = 
rb_first(&port->table); node; node = rb_next(node)) { + group = rb_entry(node, struct mcast_group, node); + spin_lock(&group->lock); + if (group->state == MCAST_IDLE) { + atomic_inc(&group->refcount); + queue_work(mcast_wq, &group->work); + } + group->state = MCAST_ERROR; + spin_unlock(&group->lock); + } + spin_unlock_irqrestore(&port->lock, flags); +} + +static void mcast_event_handler(struct ib_event_handler *handler, + struct ib_event *event) +{ + struct mcast_device *dev; + + dev = ib_get_client_data(event->device, &mcast_client); + if (!dev) + return; + + switch (event->event) { + case IB_EVENT_PORT_ERR: + case IB_EVENT_LID_CHANGE: + case IB_EVENT_SM_CHANGE: + case IB_EVENT_CLIENT_REREGISTER: + mcast_groups_lost(&dev->port[event->element.port_num - + dev->start_port]); + break; + default: + break; + } +} + +static void mcast_add_one(struct ib_device *device) +{ + struct mcast_device *dev; + struct mcast_port *port; + int i; + + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port, + GFP_KERNEL); + if (!dev) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) + dev->start_port = dev->end_port = 0; + else { + dev->start_port = 1; + dev->end_port = device->phys_port_cnt; + } + + for (i = 0; i <= dev->end_port - dev->start_port; i++) { + port = &dev->port[i]; + port->dev = dev; + port->port_num = dev->start_port + i; + spin_lock_init(&port->lock); + port->table = RB_ROOT; + init_completion(&port->comp); + atomic_set(&port->refcount, 1); + } + + dev->device = device; + ib_set_client_data(device, &mcast_client, dev); + + INIT_IB_EVENT_HANDLER(&event_handler, device, mcast_event_handler); + ib_register_event_handler(&event_handler); +} + +static void mcast_remove_one(struct ib_device *device) +{ + struct mcast_device *dev; + struct mcast_port *port; + int i; + + dev = ib_get_client_data(device, &mcast_client); + if (!dev) + return; + + 
ib_unregister_event_handler(&event_handler); + flush_workqueue(mcast_wq); + + for (i = 0; i <= dev->end_port - dev->start_port; i++) { + port = &dev->port[i]; + deref_port(port); + wait_for_completion(&port->comp); + } + + kfree(dev); +} + +static int __init mcast_init(void) +{ + int ret; + + mcast_wq = create_singlethread_workqueue("ib_mcast_wq"); + if (!mcast_wq) + return -ENOMEM; + + ib_sa_register_client(&sa_client); + + ret = ib_register_client(&mcast_client); + if (ret) + goto err; + return 0; + +err: + ib_sa_unregister_client(&sa_client); + destroy_workqueue(mcast_wq); + return ret; +} + +static void __exit mcast_cleanup(void) +{ + ib_unregister_client(&mcast_client); + ib_sa_unregister_client(&sa_client); + destroy_workqueue(mcast_wq); +} + +module_init(mcast_init); +module_exit(mcast_cleanup); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 3faa182..b993cb1 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -45,6 +45,8 @@ #include #include +#include + #include "ipoib.h" #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG @@ -60,14 +62,11 @@ static DEFINE_MUTEX(mcast_mutex); /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */ struct ipoib_mcast { struct ib_sa_mcmember_rec mcmember; + struct ib_multicast *mc; struct ipoib_ah *ah; struct rb_node rb_node; struct list_head list; - struct completion done; - - int query_id; - struct ib_sa_query *query; unsigned long created; unsigned long backoff; @@ -299,18 +298,22 @@ static int ipoib_mcast_join_finish(struc return 0; } -static void +static int ipoib_mcast_sendonly_join_complete(int status, - struct ib_sa_mcmember_rec *mcmember, - void *mcast_ptr) + struct ib_multicast *multicast) { - struct ipoib_mcast *mcast = mcast_ptr; + struct ipoib_mcast *mcast = multicast->context; struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); + /* We trap for port
events ourselves. */ + if (status == -ENETRESET) + return 0; + if (!status) - ipoib_mcast_join_finish(mcast, mcmember); - else { + status = ipoib_mcast_join_finish(mcast, &multicast->rec); + + if (status) { if (mcast->logcount++ < 20) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", @@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s spin_unlock_irq(&priv->tx_lock); /* Clear the busy flag so we try again */ - clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); - mcast->query = NULL; + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, + &mcast->flags); } - - complete(&mcast->done); + return status; } static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast) @@ -359,35 +361,32 @@ #endif rec.port_gid = priv->local_gid; rec.pkey = cpu_to_be16(priv->pkey); - init_completion(&mcast->done); - - ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, &rec, - IB_SA_MCMEMBER_REC_MGID | - IB_SA_MCMEMBER_REC_PORT_GID | - IB_SA_MCMEMBER_REC_PKEY | - IB_SA_MCMEMBER_REC_JOIN_STATE, - 1000, GFP_ATOMIC, - ipoib_mcast_sendonly_join_complete, - mcast, &mcast->query); - if (ret < 0) { - ipoib_warn(priv, "ib_sa_mcmember_rec_set failed (ret = %d)\n", + mcast->mc = ib_join_multicast(priv->ca, priv->port, &rec, + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | + IB_SA_MCMEMBER_REC_JOIN_STATE, + GFP_ATOMIC, + ipoib_mcast_sendonly_join_complete, + mcast); + if (IS_ERR(mcast->mc)) { + ret = PTR_ERR(mcast->mc); + clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + ipoib_warn(priv, "ib_join_multicast failed (ret = %d)\n", ret); } else { ipoib_dbg_mcast(priv, "no multicast record for " IPOIB_GID_FMT ", starting join\n", IPOIB_GID_ARG(mcast->mcmember.mgid)); - - mcast->query_id = ret; } return ret; } -static void ipoib_mcast_join_complete(int status, - struct ib_sa_mcmember_rec *mcmember, - void *mcast_ptr) +static int ipoib_mcast_join_complete(int status, + struct ib_multicast 
*multicast) { - struct ipoib_mcast *mcast = mcast_ptr; + struct ipoib_mcast *mcast = multicast->context; struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -395,23 +394,24 @@ static void ipoib_mcast_join_complete(in " (status %d)\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); - if (!status && !ipoib_mcast_join_finish(mcast, mcmember)) { + /* We trap for port events ourselves. */ + if (status == -ENETRESET) + return 0; + + if (!status) + status = ipoib_mcast_join_finish(mcast, &multicast->rec); + + if (!status) { mcast->backoff = 1; mutex_lock(&mcast_mutex); if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_work(ipoib_workqueue, &priv->mcast_task); mutex_unlock(&mcast_mutex); - complete(&mcast->done); - return; - } - - if (status == -EINTR) { - complete(&mcast->done); - return; + return 0; } - if (status && mcast->logcount++ < 20) { - if (status == -ETIMEDOUT || status == -EINTR) { + if (mcast->logcount++ < 20) { + if (status == -ETIMEDOUT) { ipoib_dbg_mcast(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), @@ -428,23 +428,18 @@ static void ipoib_mcast_join_complete(in if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS; - mutex_lock(&mcast_mutex); + /* Clear the busy flag so we try again */ + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + mutex_lock(&mcast_mutex); spin_lock_irq(&priv->lock); - mcast->query = NULL; - - if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { - if (status == -ETIMEDOUT) - queue_work(ipoib_workqueue, &priv->mcast_task); - else - queue_delayed_work(ipoib_workqueue, &priv->mcast_task, - mcast->backoff * HZ); - } else - complete(&mcast->done); + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) + queue_delayed_work(ipoib_workqueue, &priv->mcast_task, + mcast->backoff * HZ); spin_unlock_irq(&priv->lock); mutex_unlock(&mcast_mutex); - return; + return status; } static void ipoib_mcast_join(struct 
net_device *dev, struct ipoib_mcast *mcast, @@ -493,15 +488,14 @@ static void ipoib_mcast_join(struct net_ rec.hop_limit = priv->broadcast->mcmember.hop_limit; } - init_completion(&mcast->done); - - ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, - &rec, comp_mask, mcast->backoff * 1000, - GFP_ATOMIC, ipoib_mcast_join_complete, - mcast, &mcast->query); - - if (ret < 0) { - ipoib_warn(priv, "ib_sa_mcmember_rec_set failed, status %d\n", ret); + set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + mcast->mc = ib_join_multicast(priv->ca, priv->port, &rec, comp_mask, + GFP_KERNEL, ipoib_mcast_join_complete, + mcast); + if (IS_ERR(mcast->mc)) { + clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + ret = PTR_ERR(mcast->mc); + ipoib_warn(priv, "ib_join_multicast failed, status %d\n", ret); mcast->backoff *= 2; if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) @@ -513,8 +507,7 @@ static void ipoib_mcast_join(struct net_ &priv->mcast_task, mcast->backoff * HZ); mutex_unlock(&mcast_mutex); - } else - mcast->query_id = ret; + } } void ipoib_mcast_join_task(void *dev_ptr) @@ -538,7 +531,7 @@ void ipoib_mcast_join_task(void *dev_ptr priv->local_rate = attr.active_speed * ib_width_enum_to_int(attr.active_width); } else - ipoib_warn(priv, "ib_query_port failed\n"); + ipoib_warn(priv, "ib_query_port failed\n"); } if (!priv->broadcast) { @@ -565,7 +558,8 @@ void ipoib_mcast_join_task(void *dev_ptr } if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { - ipoib_mcast_join(dev, priv->broadcast, 0); + if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) + ipoib_mcast_join(dev, priv->broadcast, 0); return; } @@ -620,26 +614,9 @@ int ipoib_mcast_start_thread(struct net_ return 0; } -static void wait_for_mcast_join(struct ipoib_dev_priv *priv, - struct ipoib_mcast *mcast) -{ - spin_lock_irq(&priv->lock); - if (mcast && mcast->query) { - ib_sa_cancel_query(mcast->query_id, mcast->query); - mcast->query = NULL; - spin_unlock_irq(&priv->lock); - 
ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); - wait_for_completion(&mcast->done); - } - else - spin_unlock_irq(&priv->lock); -} - int ipoib_mcast_stop_thread(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_mcast *mcast; ipoib_dbg_mcast(priv, "stopping multicast thread\n"); @@ -655,52 +632,27 @@ int ipoib_mcast_stop_thread(struct net_d if (flush) flush_workqueue(ipoib_workqueue); - wait_for_mcast_join(priv, priv->broadcast); - - list_for_each_entry(mcast, &priv->multicast_list, list) - wait_for_mcast_join(priv, mcast); - return 0; } static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ib_sa_mcmember_rec rec = { - .join_state = 1 - }; int ret = 0; - if (!test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) - return 0; - - ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); - - rec.mgid = mcast->mcmember.mgid; - rec.port_gid = priv->local_gid; - rec.pkey = cpu_to_be16(priv->pkey); + if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { + ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); - /* Remove ourselves from the multicast group */ - ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), - &mcast->mcmember.mgid); - if (ret) - ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); + /* Remove ourselves from the multicast group */ + ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), + &mcast->mcmember.mgid); + if (ret) + ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); + } - /* - * Just make one shot at leaving and don't wait for a reply; - * if we fail, too bad. 
- */ - ret = ib_sa_mcmember_rec_delete(&ipoib_sa_client, priv->ca, priv->port, &rec, - IB_SA_MCMEMBER_REC_MGID | - IB_SA_MCMEMBER_REC_PORT_GID | - IB_SA_MCMEMBER_REC_PKEY | - IB_SA_MCMEMBER_REC_JOIN_STATE, - 0, GFP_ATOMIC, NULL, - mcast, &mcast->query); - if (ret < 0) - ipoib_warn(priv, "ib_sa_mcmember_rec_delete failed " - "for leave (result = %d)\n", ret); + if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) + ib_free_multicast(mcast->mc); return 0; } @@ -753,7 +705,7 @@ void ipoib_mcast_send(struct net_device dev_kfree_skb_any(skb); } - if (mcast->query) + if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) ipoib_dbg_mcast(priv, "no address vector, " "but multicast join already started\n"); else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) @@ -910,7 +862,6 @@ void ipoib_mcast_restart_task(void *dev_ /* We have to cancel outside of the spinlock */ list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { - wait_for_mcast_join(priv, mcast); ipoib_mcast_leave(mcast->dev, mcast); ipoib_mcast_free(mcast); } diff --git a/include/rdma/ib_multicast.h b/include/rdma/ib_multicast.h new file mode 100755 index 0000000..423b754 --- /dev/null +++ b/include/rdma/ib_multicast.h @@ -0,0 +1,102 @@ +/* + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef IB_MULTICAST_H +#define IB_MULTICAST_H + +#include + +struct ib_multicast { + struct ib_sa_mcmember_rec rec; + ib_sa_comp_mask comp_mask; + int (*callback)(int status, + struct ib_multicast *multicast); + void *context; +}; + +/** + * ib_join_multicast - Initiates a join request to the specified multicast + * group. + * @device: Device associated with the multicast group. + * @port_num: Port on the specified device to associate with the multicast + * group. + * @rec: SA multicast member record specifying group attributes. + * @comp_mask: Component mask indicating which group attributes of %rec are + * valid. + * @gfp_mask: GFP mask for memory allocations. + * @callback: User callback invoked once the join operation completes. + * @context: User specified context stored with the ib_multicast structure. + * + * This call initiates a multicast join request with the SA for the specified + * multicast group. If the join operation is started successfully, it returns + * an ib_multicast structure that is used to track the multicast operation. + * Users must free this structure by calling ib_free_multicast, even if the + * join operation later fails. (The callback status is non-zero.) 
+ */ +struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, + int (*callback)(int status, + struct ib_multicast + *multicast), + void *context); + +/** + * ib_free_multicast - Frees the multicast tracking structure, and releases + * any reference on the multicast group. + * @multicast: Multicast tracking structure allocated by ib_join_multicast. + * + * This call blocks until the connection identifier is destroyed. It may + * not be called from within the multicast callback; however, returning a non- + * zero value from the callback will result in destroying the multicast + * tracking structure. + */ +void ib_free_multicast(struct ib_multicast *multicast); + +/** + * ib_get_mcmember_rec - Looks up a multicast member record by its MGID and + * returns it if found. + * @device: Device associated with the multicast group. + * @port_num: Port on the specified device to associate with the multicast + * group. + * @mgid: optional MGID of multicast group. + * @rec: Location to copy SA multicast member record. + * + * If an MGID is specified, returns an existing multicast member record if + * one is found for the local port. If no MGID is specified, or the specified + * MGID is 0, returns a multicast member record filled in with default values + * that may be used to create a new multicast group. 
+ */ +int ib_get_mcmember_rec(struct ib_device *device, u8 port_num, + union ib_gid *mgid, struct ib_sa_mcmember_rec *rec); + +#endif /* IB_MULTICAST_H */ From sean.hefty at intel.com Tue Oct 10 16:04:32 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 16:04:32 -0700 Subject: [openib-general] [RFC] [PATCH 3/7] rdma_cm 2.6.20: remove specifying qp_type when connecting In-Reply-To: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com> Message-ID: <000401c6ecc0$72de11f0$c0d4180a@amr.corp.intel.com> Ideally, there should be a 1:1 correspondence between the qp_type for a connection and the port space associated with an rdma_cm_id. Remove the qp_type from the rdma_cm interface. Signed-off-by: Sean Hefty --- This is a new patch, and is not part of svn. The qp_type isn't needed given the port space type, and removing it makes the userspace interface a little cleaner. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index b9ba68d..1028548 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -132,7 +132,6 @@ struct rdma_id_private { u32 seq_num; u32 qp_num; - enum ib_qp_type qp_type; u8 srq; }; @@ -391,7 +390,6 @@ int rdma_create_qp(struct rdma_cm_id *id id->qp = qp; id_priv->qp_num = qp->qp_num; - id_priv->qp_type = qp->qp_type; id_priv->srq = (qp->srq != NULL); return 0; err: @@ -1854,7 +1852,7 @@ static int cma_connect_ib(struct rdma_id req.service_id = cma_get_service_id(id_priv->id.ps, &route->addr.dst_addr); req.qp_num = id_priv->qp_num; - req.qp_type = id_priv->qp_type; + req.qp_type = IB_QPT_RC; req.starting_psn = id_priv->seq_num; req.responder_resources = conn_param->responder_resources; req.initiator_depth = conn_param->initiator_depth; @@ -1931,7 +1929,6 @@ int rdma_connect(struct rdma_cm_id *id, if (!id->qp) { id_priv->qp_num = conn_param->qp_num; - id_priv->qp_type = conn_param->qp_type; id_priv->srq = conn_param->srq; } @@ -2015,7 +2012,6 @@ int rdma_accept(struct rdma_cm_id *id, s if
(!id->qp && conn_param) { id_priv->qp_num = conn_param->qp_num; - id_priv->qp_type = conn_param->qp_type; id_priv->srq = conn_param->srq; } diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h old mode 100644 new mode 100755 index deb5a0a..4c07f96 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -215,7 +215,6 @@ struct rdma_conn_param { /* Fields below ignored if a QP is created on the rdma_cm_id. */ u8 srq; u32 qp_num; - enum ib_qp_type qp_type; }; /** From sean.hefty at intel.com Tue Oct 10 16:16:08 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 16:16:08 -0700 Subject: [openib-general] [RFC] [PATCH 4/7] rdma_cm 2.6.20: report connection data with event In-Reply-To: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com> Message-ID: <000501c6ecc2$1159b4a0$c0d4180a@amr.corp.intel.com> When establishing a connection, users of the rdma_cm provide connection parameters during calls to rdma_connect() and rdma_accept(). These parameters are not given to the remote side during connection establishment. The result is that the remote side does not know parameters such as initiator_depth and responder_resources until after a connection is established, and then only by querying the QP attributes. This makes it difficult to optimize resources before connecting or reject a connection if it cannot provide the required resources. Signed-off-by: Sean Hefty --- This patch is not in svn, and starts a substantial deviation from the svn tree by carrying the connection information along with the event. I focused on making this patch against the mainline kernel, so that iWarp support would be included. 
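For illustration only, here is a rough sketch of how a ULP might use the connection data once it arrives with the event, for example to decide whether a connect request can be served before accepting it. It is a standalone userspace mock, not kernel code: the two structures copy the layout this patch adds to rdma_cm.h, the enum is trimmed to two values, and example_can_accept() and EXAMPLE_MAX_RESOURCES are hypothetical names that do not exist in the rdma_cm.

```c
/*
 * Userspace mock, NOT the kernel headers: the two structures below copy
 * the layout this patch adds to include/rdma/rdma_cm.h so that the
 * sketch compiles on its own. The enum is trimmed to the values used.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

enum rdma_cm_event_type {
	RDMA_CM_EVENT_CONNECT_REQUEST,
	RDMA_CM_EVENT_ESTABLISHED
};

struct rdma_conn_param {
	const void *private_data;
	u8 private_data_len;
	u8 responder_resources;
	u8 initiator_depth;
	u8 flow_control;
	u8 retry_count;		/* ignored when accepting */
	u8 rnr_retry_count;
	u8 srq;
	u32 qp_num;
};

struct rdma_cm_event {
	enum rdma_cm_event_type event;
	int status;
	union {
		struct rdma_conn_param conn;
	} param;
};

/* Hypothetical local limit; a real ULP would take it from device attributes. */
#define EXAMPLE_MAX_RESOURCES 4

/*
 * Hypothetical helper: returns 1 if the connect request carried by the
 * event fits within the local limit, 0 if the caller should reject it.
 * Events other than a connect request are not checked.
 */
static int example_can_accept(const struct rdma_cm_event *event)
{
	const struct rdma_conn_param *conn;

	assert(event != NULL);
	if (event->event != RDMA_CM_EVENT_CONNECT_REQUEST)
		return 1;

	conn = &event->param.conn;
	return conn->responder_resources <= EXAMPLE_MAX_RESOURCES &&
	       conn->initiator_depth <= EXAMPLE_MAX_RESOURCES;
}
```

In a real event handler the check would presumably be followed by rdma_accept() or rdma_reject(); the point of the sketch is only that the values are available from event->param.conn without querying the QP.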
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c old mode 100644 new mode 100755 index 1028548..3743ede --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -591,20 +591,6 @@ static inline int cma_user_data_offset(e } } -static int cma_notify_user(struct rdma_id_private *id_priv, - enum rdma_cm_event_type type, int status, - void *data, u8 data_len) -{ - struct rdma_cm_event event; - - event.event = type; - event.status = status; - event.private_data = data; - event.private_data_len = data_len; - - return id_priv->id.event_handler(&id_priv->id, &event); -} - static void cma_cancel_route(struct rdma_id_private *id_priv) { switch (rdma_node_get_transport(id_priv->id.device->node_type)) { @@ -789,47 +775,62 @@ reject: return ret; } +static void cma_set_rep_event_data(struct rdma_cm_event *event, + struct ib_cm_rep_event_param *rep_data, + void *private_data) +{ + event->param.conn.private_data = private_data; + event->param.conn.private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; + event->param.conn.responder_resources = rep_data->responder_resources; + event->param.conn.initiator_depth = rep_data->initiator_depth; + event->param.conn.flow_control = rep_data->flow_control; + event->param.conn.rnr_retry_count = rep_data->rnr_retry_count; + event->param.conn.srq = rep_data->srq; + event->param.conn.qp_num = rep_data->remote_qpn; +} + static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; - enum rdma_cm_event_type event; - u8 private_data_len = 0; - int ret = 0, status = 0; + struct rdma_cm_event event; + int ret = 0; atomic_inc(&id_priv->dev_remove); if (!cma_comp(id_priv, CMA_CONNECT)) goto out; + memset(&event, 0, sizeof event); switch (ib_event->event) { case IB_CM_REQ_ERROR: case IB_CM_REP_ERROR: - event = RDMA_CM_EVENT_UNREACHABLE; - status = -ETIMEDOUT; + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = -ETIMEDOUT; break; case 
IB_CM_REP_RECEIVED: - status = cma_verify_rep(id_priv, ib_event->private_data); - if (status) - event = RDMA_CM_EVENT_CONNECT_ERROR; + event.status = cma_verify_rep(id_priv, ib_event->private_data); + if (event.status) + event.event = RDMA_CM_EVENT_CONNECT_ERROR; else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) { - status = cma_rep_recv(id_priv); - event = status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + event.status = cma_rep_recv(id_priv); + event.event = event.status ? RDMA_CM_EVENT_CONNECT_ERROR : + RDMA_CM_EVENT_ESTABLISHED; } else - event = RDMA_CM_EVENT_CONNECT_RESPONSE; - private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; + event.event = RDMA_CM_EVENT_CONNECT_RESPONSE; + cma_set_rep_event_data(&event, &ib_event->param.rep_rcvd, + ib_event->private_data); break; case IB_CM_RTU_RECEIVED: - status = cma_rtu_recv(id_priv); - event = status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + event.status = cma_rtu_recv(id_priv); + event.event = event.status ? 
RDMA_CM_EVENT_CONNECT_ERROR : + RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: - status = -ETIMEDOUT; /* fall through */ + event.status = -ETIMEDOUT; /* fall through */ case IB_CM_DREQ_RECEIVED: case IB_CM_DREP_RECEIVED: if (!cma_comp_exch(id_priv, CMA_CONNECT, CMA_DISCONNECT)) goto out; - event = RDMA_CM_EVENT_DISCONNECTED; + event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: case IB_CM_MRA_RECEIVED: @@ -837,9 +838,10 @@ static int cma_ib_handler(struct ib_cm_i goto out; case IB_CM_REJ_RECEIVED: cma_modify_qp_err(&id_priv->id); - status = ib_event->param.rej_rcvd.reason; - event = RDMA_CM_EVENT_REJECTED; - private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; + event.status = ib_event->param.rej_rcvd.reason; + event.event = RDMA_CM_EVENT_REJECTED; + event.param.conn.private_data = ib_event->private_data; + event.param.conn.private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", @@ -847,8 +849,7 @@ static int cma_ib_handler(struct ib_cm_i goto out; } - ret = cma_notify_user(id_priv, event, status, ib_event->private_data, - private_data_len); + ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.ib = NULL; @@ -910,9 +911,25 @@ err: return NULL; } +static void cma_set_req_event_data(struct rdma_cm_event *event, + struct ib_cm_req_event_param *req_data, + void *private_data, int offset) +{ + event->param.conn.private_data = private_data + offset; + event->param.conn.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - offset; + event->param.conn.responder_resources = req_data->responder_resources; + event->param.conn.initiator_depth = req_data->initiator_depth; + event->param.conn.flow_control = req_data->flow_control; + event->param.conn.retry_count = req_data->retry_count; + event->param.conn.rnr_retry_count = req_data->rnr_retry_count; + event->param.conn.srq = req_data->srq; + event->param.conn.qp_num = req_data->remote_qpn; +} + static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *listen_id, *conn_id; + struct rdma_cm_event event; int offset, ret; listen_id = cm_id->context; @@ -945,9 +962,11 @@ static int cma_req_handler(struct ib_cm_ cm_id->cm_handler = cma_ib_handler; offset = cma_user_data_offset(listen_id->id.ps); - ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, - ib_event->private_data + offset, - IB_CM_REQ_PRIVATE_DATA_SIZE - offset); + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + cma_set_req_event_data(&event, &ib_event->param.req_rcvd, + ib_event->private_data, offset); + ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ conn_id->cm_id.ib = NULL; @@ -1019,15 +1038,16 @@ static void cma_set_compare_data(enum rd static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) { struct rdma_id_private *id_priv = iw_id->context; - enum rdma_cm_event_type event = 0; + struct rdma_cm_event event; struct sockaddr_in *sin; int ret = 0; + memset(&event, 0, sizeof event); atomic_inc(&id_priv->dev_remove); switch (iw_event->event) { case IW_CM_EVENT_CLOSE: - event = RDMA_CM_EVENT_DISCONNECTED; + event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IW_CM_EVENT_CONNECT_REPLY: sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; @@ -1035,20 +1055,21 @@ static int cma_iw_handler(struct iw_cm_i sin = (struct sockaddr_in *) &id_priv->id.route.addr.dst_addr; *sin = iw_event->remote_addr; if (iw_event->status) - event = RDMA_CM_EVENT_REJECTED; + event.event = RDMA_CM_EVENT_REJECTED; else - event = RDMA_CM_EVENT_ESTABLISHED; + event.event = RDMA_CM_EVENT_ESTABLISHED; break; case IW_CM_EVENT_ESTABLISHED: - event = RDMA_CM_EVENT_ESTABLISHED; + event.event = RDMA_CM_EVENT_ESTABLISHED; break; default: BUG_ON(1); } - ret = cma_notify_user(id_priv, event, iw_event->status, - iw_event->private_data, - iw_event->private_data_len); + event.status = iw_event->status; + event.param.conn.private_data = iw_event->private_data; + event.param.conn.private_data_len = iw_event->private_data_len; + ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.iw = NULL; @@ -1069,6 +1090,7 @@ static int iw_conn_req_handler(struct iw struct rdma_id_private *listen_id, *conn_id; struct sockaddr_in *sin; struct net_device *dev = NULL; + struct rdma_cm_event event; int ret; listen_id = cm_id->context; @@ -1122,9 +1144,11 @@ static int iw_conn_req_handler(struct iw sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; *sin = iw_event->remote_addr; - ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, - iw_event->private_data, - iw_event->private_data_len); + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + event.param.conn.private_data = iw_event->private_data; + event.param.conn.private_data_len = iw_event->private_data_len; + ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* User wants to destroy the CM ID */ conn_id->cm_id.iw = NULL; @@ -1514,8 +1538,9 @@ static void addr_handler(int status, str struct rdma_dev_addr *dev_addr, void *context) { struct rdma_id_private *id_priv = context; - enum rdma_cm_event_type event; + struct rdma_cm_event event; + memset(&event, 0, sizeof event); atomic_inc(&id_priv->dev_remove); /* @@ -1535,14 +1560,15 @@ static void addr_handler(int status, str if (status) { if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ADDR_BOUND)) goto out; - event = RDMA_CM_EVENT_ADDR_ERROR; + event.event = RDMA_CM_EVENT_ADDR_ERROR; + event.status = status; } else { memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); - event = RDMA_CM_EVENT_ADDR_RESOLVED; + event.event = RDMA_CM_EVENT_ADDR_RESOLVED; } - if (cma_notify_user(id_priv, event, status, NULL, 0)) { + if (id_priv->id.event_handler(&id_priv->id, &event)) { cma_exch(id_priv, CMA_DESTROYING); cma_release_remove(id_priv); cma_deref_id(id_priv); @@ -2131,6 +2157,7 @@ err: static int cma_remove_id_dev(struct rdma_id_private *id_priv) { + struct rdma_cm_event event; enum cma_state state; /* Record that we want to remove the device */ @@ -2145,8 
+2172,9 @@ static int cma_remove_id_dev(struct rdma if (!cma_comp(id_priv, CMA_DEVICE_REMOVAL)) return 0; - return cma_notify_user(id_priv, RDMA_CM_EVENT_DEVICE_REMOVAL, - 0, NULL, 0); + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_DEVICE_REMOVAL; + return id_priv->id.event_handler(&id_priv->id, &event); } static void cma_process_remove(struct cma_device *cma_dev) diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 4c07f96..aa6ce47 100755 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -77,11 +77,25 @@ struct rdma_route { int num_paths; }; +struct rdma_conn_param { + const void *private_data; + u8 private_data_len; + u8 responder_resources; + u8 initiator_depth; + u8 flow_control; + u8 retry_count; /* ignored when accepting */ + u8 rnr_retry_count; + /* Fields below ignored if a QP is created on the rdma_cm_id. */ + u8 srq; + u32 qp_num; +}; + struct rdma_cm_event { enum rdma_cm_event_type event; int status; - void *private_data; - u8 private_data_len; + union { + struct rdma_conn_param conn; + } param; }; struct rdma_cm_id; @@ -204,19 +218,6 @@ void rdma_destroy_qp(struct rdma_cm_id * int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int *qp_attr_mask); -struct rdma_conn_param { - const void *private_data; - u8 private_data_len; - u8 responder_resources; - u8 initiator_depth; - u8 flow_control; - u8 retry_count; /* ignored when accepting */ - u8 rnr_retry_count; - /* Fields below ignored if a QP is created on the rdma_cm_id. */ - u8 srq; - u32 qp_num; -}; - /** * rdma_connect - Initiate an active connection request. 
* From sean.hefty at intel.com Tue Oct 10 16:22:22 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 16:22:22 -0700 Subject: [openib-general] [RFC] [PATCH 5/7] rdma_cm 2.6.20: add rdma_establish In-Reply-To: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com> Message-ID: <000601c6ecc2$f07b70b0$c0d4180a@amr.corp.intel.com> Allow ULPs to transition to RTS before sending a REP. This allows the ULP to respond to a received message if it arrives before the RTU or communication established event. Modify the RDMA CM to transition to RTS when sending a REP over IB, and expose a new rdma_establish interface that a user can invoke to force a connection into the established state if it polls a receive completion before an RTU arrives. Signed-off-by: Sean Hefty --- This is a new call not in svn. diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 25b1018..22ec434 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3243,6 +3243,10 @@ static int cm_init_qp_rts_attr(struct cm spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { + /* Allow transition to RTS before sending REP */ + case IB_CM_REQ_RCVD: + case IB_CM_MRA_REQ_SENT: + case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: case IB_CM_REP_SENT: diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 3743ede..5059c27 100755 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -759,22 +759,6 @@ static int cma_verify_rep(struct rdma_id return 0; } -static int cma_rtu_recv(struct rdma_id_private *id_priv) -{ - int ret; - - ret = cma_modify_qp_rts(&id_priv->id); - if (ret) - goto reject; - - return 0; -reject: - cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); - return ret; -} - static void cma_set_rep_event_data(struct rdma_cm_event *event, struct ib_cm_rep_event_param *rep_data, void *private_data) @@ -820,9 +804,8 @@ 
static int cma_ib_handler(struct ib_cm_i ib_event->private_data); break; case IB_CM_RTU_RECEIVED: - event.status = cma_rtu_recv(id_priv); - event.event = event.status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + case IB_CM_USER_ESTABLISHED: + event.event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: event.status = -ETIMEDOUT; /* fall through */ @@ -1983,11 +1966,25 @@ static int cma_accept_ib(struct rdma_id_ struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; - int ret; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; - ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) - return ret; + if (id_priv->id.qp) { + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + goto out; + + qp_attr.qp_state = IB_QPS_RTS; + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, + &qp_attr_mask); + if (ret) + goto out; + + qp_attr.max_rd_atomic = conn_param->initiator_depth; + ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); + if (ret) + goto out; + } memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; @@ -2002,7 +1999,9 @@ static int cma_accept_ib(struct rdma_id_ rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = id_priv->srq ? 
1 : 0; - return ib_send_cm_rep(id_priv->cm_id.ib, &rep); + ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep); +out: + return ret; } static int cma_accept_iw(struct rdma_id_private *id_priv, @@ -2067,6 +2066,27 @@ reject: } EXPORT_SYMBOL(rdma_accept); +int rdma_establish(struct rdma_cm_id *id) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + switch (id->device->node_type) { + case RDMA_NODE_IB_CA: + ret = ib_cm_establish(id_priv->cm_id.ib); + break; + default: + ret = 0; + break; + } + return ret; +} +EXPORT_SYMBOL(rdma_establish); + int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len) { diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index aa6ce47..dbc7c56 100755 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -253,6 +253,16 @@ int rdma_listen(struct rdma_cm_id *id, i int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** + * rdma_establish - Forces a connection state to established. + * @id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * QP before being notified that the connection has been established by the + * RDMA CM. + */ +int rdma_establish(struct rdma_cm_id *id); + +/** * rdma_reject - Called to reject a connection request or response. */ int rdma_reject(struct rdma_cm_id *id, const void *private_data, From sean.hefty at intel.com Tue Oct 10 16:30:55 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 16:30:55 -0700 Subject: [openib-general] [RFC] [PATCH 6/7] rdma_cm 2.6.20: add support for RDMA_PS_UDP In-Reply-To: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com> Message-ID: <000701c6ecc4$2200a2d0$c0d4180a@amr.corp.intel.com> Add missing support for RDMA_PS_UDP. 
This allows the use of UD QPs through the rdma_cm, which provides address translation services over IB, even if not all RDMA transports support UD. Signed-off-by: Sean Hefty --- This patch differs from svn to reflect the change to include data with the event. The result is that rdma_get_dst_attr() is removed from the API, since the data is given to the user directly. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 5059c27..19d91c8 100755 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -69,6 +69,7 @@ static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); +static DEFINE_IDR(udp_ps); struct cma_device { struct list_head list; @@ -507,9 +508,17 @@ static inline int cma_any_addr(struct so return cma_zero_addr(addr) || cma_loopback_addr(addr); } +static inline __be16 cma_port(struct sockaddr *addr) +{ + if (addr->sa_family == AF_INET) + return ((struct sockaddr_in *) addr)->sin_port; + else + return ((struct sockaddr_in6 *) addr)->sin6_port; +} + static inline int cma_any_port(struct sockaddr *addr) { - return !((struct sockaddr_in *) addr)->sin_port; + return !cma_port(addr); } static int cma_get_net_info(void *hdr, enum rdma_port_space ps, @@ -846,8 +855,8 @@ out: return ret; } -static struct rdma_id_private *cma_new_id(struct rdma_cm_id *listen_id, - struct ib_cm_event *ib_event) +static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; @@ -894,6 +903,42 @@ err: return NULL; } +static struct rdma_id_private* cma_new_udp_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv; + struct rdma_cm_id *id; + union cma_ip_addr *src, *dst; + __u16 port; + u8 ip_ver; + int ret; + + id = rdma_create_id(listen_id->event_handler, listen_id->context, + listen_id->ps); + if (IS_ERR(id)) + return NULL; + + 
+ if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); + + ret = rdma_translate_ip(&id->route.addr.src_addr, + &id->route.addr.dev_addr); + if (ret) + goto err; + + id_priv = container_of(id, struct rdma_id_private, id); + id_priv->state = CMA_CONNECT; + return id_priv; +err: + rdma_destroy_id(id); + return NULL; +} + static void cma_set_req_event_data(struct rdma_cm_event *event, struct ib_cm_req_event_param *req_data, void *private_data, int offset) @@ -922,7 +967,19 @@ static int cma_req_handler(struct ib_cm_ goto out; } - conn_id = cma_new_id(&listen_id->id, ib_event); + memset(&event, 0, sizeof event); + offset = cma_user_data_offset(listen_id->id.ps); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + if (listen_id->id.ps == RDMA_PS_UDP) { + conn_id = cma_new_udp_id(&listen_id->id, ib_event); + event.param.ud.private_data = ib_event->private_data + offset; + event.param.ud.private_data_len = + IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset; + } else { + conn_id = cma_new_conn_id(&listen_id->id, ib_event); + cma_set_req_event_data(&event, &ib_event->param.req_rcvd, + ib_event->private_data, offset); + } if (!conn_id) { ret = -ENOMEM; goto out; @@ -944,11 +1001,6 @@ static int cma_req_handler(struct ib_cm_ cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; - offset = cma_user_data_offset(listen_id->id.ps); - memset(&event, 0, sizeof event); - event.event = RDMA_CM_EVENT_CONNECT_REQUEST; - cma_set_req_event_data(&event, &ib_event->param.req_rcvd, - ib_event->private_data, offset); ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ @@ -964,8 +1016,7 @@ out: static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) { - return cpu_to_be64(((u64)ps << 16) + - be16_to_cpu(((struct sockaddr_in *) addr)->sin_port)); + return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); } static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, @@ -1741,6 +1792,9 @@ static int cma_get_port(struct rdma_id_p case RDMA_PS_TCP: ps = &tcp_ps; break; + case RDMA_PS_UDP: + ps = &udp_ps; + break; default: return -EPROTONOSUPPORT; } @@ -1822,6 +1876,110 @@ static int cma_format_hdr(void *hdr, enu return 0; } +static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv = cm_id->context; + struct rdma_cm_event event; + struct ib_cm_sidr_rep_event_param *rep = &ib_event->param.sidr_rep_rcvd; + int ret = 0; + + memset(&event, 0, sizeof event); + atomic_inc(&id_priv->dev_remove); + if (!cma_comp(id_priv, CMA_CONNECT)) + goto out; + + switch (ib_event->event) { + case IB_CM_SIDR_REQ_ERROR: + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = -ETIMEDOUT; + break; + case IB_CM_SIDR_REP_RECEIVED: + event.param.ud.private_data = ib_event->private_data; + event.param.ud.private_data_len = IB_CM_SIDR_REP_PRIVATE_DATA_SIZE; + if (rep->status != IB_SIDR_SUCCESS) { + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = ib_event->param.sidr_rep_rcvd.status; + break; + } + if (rep->qkey != RDMA_UD_QKEY) { + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = -EINVAL; + break; + } + ib_init_ah_from_path(id_priv->id.device, id_priv->id.port_num, + id_priv->id.route.path_rec, + &event.param.ud.ah_attr); + event.param.ud.qp_num = rep->qpn; + event.param.ud.qkey = rep->qkey; + event.event = RDMA_CM_EVENT_ESTABLISHED; + event.status = 0; + break; + default: + printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", + ib_event->event); + goto out; + } + + ret = id_priv->id.event_handler(&id_priv->id, 
&event); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. */ + id_priv->cm_id.ib = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } +out: + cma_release_remove(id_priv); + return ret; +} + +static int cma_resolve_ib_udp(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct ib_cm_sidr_req_param req; + struct rdma_route *route; + int ret; + + req.private_data_len = sizeof(struct cma_hdr) + + conn_param->private_data_len; + req.private_data = kzalloc(req.private_data_len, GFP_ATOMIC); + if (!req.private_data) + return -ENOMEM; + + if (conn_param->private_data && conn_param->private_data_len) + memcpy((void *) req.private_data + sizeof(struct cma_hdr), + conn_param->private_data, conn_param->private_data_len); + + route = &id_priv->id.route; + ret = cma_format_hdr((void *) req.private_data, id_priv->id.ps, route); + if (ret) + goto out; + + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, + cma_sidr_rep_handler, id_priv); + if (IS_ERR(id_priv->cm_id.ib)) { + ret = PTR_ERR(id_priv->cm_id.ib); + goto out; + } + + req.path = route->path_rec; + req.service_id = cma_get_service_id(id_priv->id.ps, + &route->addr.dst_addr); + req.timeout_ms = 1 << (CMA_CM_RESPONSE_TIMEOUT - 8); + req.max_cm_retries = CMA_MAX_CM_RETRIES; + + ret = ib_send_cm_sidr_req(id_priv->cm_id.ib, &req); + if (ret) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } +out: + kfree(req.private_data); + return ret; +} + static int cma_connect_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { @@ -1943,7 +2101,10 @@ int rdma_connect(struct rdma_cm_id *id, switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - ret = cma_connect_ib(id_priv, conn_param); + if (id->ps == RDMA_PS_UDP) + ret = cma_resolve_ib_udp(id_priv, conn_param); + else + ret = cma_connect_ib(id_priv, conn_param); break; case 
RDMA_TRANSPORT_IWARP: ret = cma_connect_iw(id_priv, conn_param); @@ -2026,6 +2187,24 @@ static int cma_accept_iw(struct rdma_id_ return iw_cm_accept(id_priv->cm_id.iw, &iw_param); } +static int cma_send_sidr_rep(struct rdma_id_private *id_priv, + enum ib_cm_sidr_status status, + const void *private_data, int private_data_len) +{ + struct ib_cm_sidr_rep_param rep; + + memset(&rep, 0, sizeof rep); + rep.status = status; + if (status == IB_SIDR_SUCCESS) { + rep.qp_num = id_priv->qp_num; + rep.qkey = RDMA_UD_QKEY; + } + rep.private_data = private_data; + rep.private_data_len = private_data_len; + + return ib_send_cm_sidr_rep(id_priv->cm_id.ib, &rep); +} + int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -2042,7 +2221,11 @@ int rdma_accept(struct rdma_cm_id *id, s switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - if (conn_param) + if (id->ps == RDMA_PS_UDP) + ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS, + conn_param->private_data, + conn_param->private_data_len); + else if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = cma_rep_recv(id_priv); @@ -2099,9 +2282,13 @@ int rdma_reject(struct rdma_cm_id *id, c switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - ret = ib_send_cm_rej(id_priv->cm_id.ib, - IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, - private_data, private_data_len); + if (id->ps == RDMA_PS_UDP) + ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, + private_data, private_data_len); + else + ret = ib_send_cm_rej(id_priv->cm_id.ib, + IB_CM_REJ_CONSUMER_DEFINED, NULL, + 0, private_data, private_data_len); break; case RDMA_TRANSPORT_IWARP: ret = iw_cm_reject(id_priv->cm_id.iw, @@ -2273,6 +2460,7 @@ static void cma_cleanup(void) destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); idr_destroy(&tcp_ps); + idr_destroy(&udp_ps); } module_init(cma_init); diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 
dbc7c56..595f1a7 100755 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -90,11 +90,20 @@ struct rdma_conn_param { u32 qp_num; }; +struct rdma_ud_param { + const void *private_data; + u8 private_data_len; + struct ib_ah_attr ah_attr; + u32 qp_num; + u32 qkey; +}; + struct rdma_cm_event { enum rdma_cm_event_type event; int status; union { struct rdma_conn_param conn; + struct rdma_ud_param ud; } param; }; @@ -220,9 +229,15 @@ int rdma_init_qp_attr(struct rdma_cm_id /** * rdma_connect - Initiate an active connection request. + * @id: Connection identifier to connect. + * @conn_param: Connection information used for connected QPs. * * Users must have resolved a route for the rdma_cm_id to connect with * by having called rdma_resolve_route before calling this routine. + * + * This call will either connect to a remote QP or obtain remote QP + * information for unconnected rdma_cm_id's. The actual operation is + * based on the rdma_cm_id's port space. */ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); diff --git a/include/rdma/rdma_cm_ib.h b/include/rdma/rdma_cm_ib.h index e8c3af1..9b176df 100644 --- a/include/rdma/rdma_cm_ib.h +++ b/include/rdma/rdma_cm_ib.h @@ -44,4 +44,7 @@ #include int rdma_set_ib_paths(struct rdma_cm_id *id, struct ib_sa_path_rec *path_rec, int num_paths); +/* Global qkey for UD QPs and multicast groups. */ +#define RDMA_UD_QKEY 0x01234567 + #endif /* RDMA_CM_IB_H */ From sean.hefty at intel.com Tue Oct 10 16:39:45 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 16:39:45 -0700 Subject: [openib-general] [RFC] [PATCH 7/7-but i can't count] rdma_cm 2.6.20: add multicast support In-Reply-To: <000701c6ecc4$2200a2d0$c0d4180a@amr.corp.intel.com> Message-ID: <000801c6ecc5$5e62e610$c0d4180a@amr.corp.intel.com> Add multicast QP support to the rdma_cm. - Users identify multicast groups by using a multicast IP address. - IB multicast group parameters are based on the ipoib broadcast group. 
The MGID is derived using a method similar to ipoib, but with a different signature. - QPs are automatically attached and detached from groups. - A QP may join multiple groups. Signed-off-by: Sean Hefty --- This patch differs from svn as a result of reporting data with the multicast event, rather than through a separate call. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 19d91c8..4726292 100755 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -45,6 +45,7 @@ #include #include #include #include +#include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); @@ -114,6 +115,7 @@ struct rdma_id_private { struct list_head list; struct list_head listen_list; struct cma_device *cma_dev; + struct list_head mc_list; enum cma_state state; spinlock_t lock; @@ -136,6 +138,18 @@ struct rdma_id_private { u8 srq; }; +struct cma_multicast { + struct rdma_id_private *id_priv; + union { + struct ib_multicast *ib; + } multicast; + struct list_head list; + void *context; + struct sockaddr addr; + u8 pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; +}; + struct cma_work { struct work_struct work; struct rdma_id_private *id; @@ -323,6 +337,7 @@ struct rdma_cm_id *rdma_create_id(rdma_c init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); INIT_LIST_HEAD(&id_priv->listen_list); + INIT_LIST_HEAD(&id_priv->mc_list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); return &id_priv->id; @@ -696,6 +711,19 @@ static void cma_release_port(struct rdma mutex_unlock(&lock); } +static void cma_leave_mc_groups(struct rdma_id_private *id_priv) +{ + struct cma_multicast *mc; + + while (!list_empty(&id_priv->mc_list)) { + mc = container_of(id_priv->mc_list.next, + struct cma_multicast, list); + list_del(&mc->list); + ib_free_multicast(mc->multicast.ib); + kfree(mc); + } +} + void rdma_destroy_id(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; @@ -720,6 +748,7 @@ 
void rdma_destroy_id(struct rdma_cm_id * default: break; } + cma_leave_mc_groups(id_priv); mutex_lock(&lock); cma_detach_from_dev(id_priv); } @@ -2333,6 +2362,159 @@ out: } EXPORT_SYMBOL(rdma_disconnect); +static int cma_ib_mc_handler(int status, struct ib_multicast *multicast) +{ + struct rdma_id_private *id_priv; + struct cma_multicast *mc = multicast->context; + struct rdma_cm_event event; + int ret; + + id_priv = mc->id_priv; + atomic_inc(&id_priv->dev_remove); + if (!cma_comp(id_priv, CMA_ADDR_BOUND) && + !cma_comp(id_priv, CMA_ADDR_RESOLVED)) + goto out; + + if (!status && id_priv->id.qp) + status = ib_attach_mcast(id_priv->id.qp, &multicast->rec.mgid, + multicast->rec.mlid); + + memset(&event, 0, sizeof event); + event.status = status; + event.param.ud.private_data = mc->context; + if (!status) { + event.event = RDMA_CM_EVENT_MULTICAST_JOIN; + ib_init_ah_from_mcmember(id_priv->id.device, + id_priv->id.port_num, &multicast->rec, + &event.param.ud.ah_attr); + event.param.ud.qp_num = 0xFFFFFF; + event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey); + } else + event.event = RDMA_CM_EVENT_MULTICAST_ERROR; + + ret = id_priv->id.event_handler(&id_priv->id, &event); + if (ret) { + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return 0; + } +out: + cma_release_remove(id_priv); + return 0; +} + +static int cma_join_ib_multicast(struct rdma_id_private *id_priv, + struct cma_multicast *mc) +{ + struct ib_sa_mcmember_rec rec; + unsigned char mc_map[MAX_ADDR_LEN]; + struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; + struct sockaddr_in *sin = (struct sockaddr_in *) &mc->addr; + ib_sa_comp_mask comp_mask; + int ret; + + ib_addr_get_mgid(dev_addr, &rec.mgid); + ret = ib_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, + &rec.mgid, &rec); + if (ret) + return ret; + + ip_ib_mc_map(sin->sin_addr.s_addr, mc_map); + mc_map[7] = 0x01; /* Use RDMA CM signature */ + mc_map[8] = 
ib_addr_get_pkey(dev_addr) >> 8; + mc_map[9] = (unsigned char) ib_addr_get_pkey(dev_addr); + + rec.mgid = *(union ib_gid *) (mc_map + 4); + ib_addr_get_sgid(dev_addr, &rec.port_gid); + rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); + rec.join_state = 1; + rec.qkey = sin->sin_addr.s_addr; + + comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE | + IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_SL | + IB_SA_MCMEMBER_REC_FLOW_LABEL | + IB_SA_MCMEMBER_REC_TRAFFIC_CLASS; + + mc->multicast.ib = ib_join_multicast(id_priv->id.device, + id_priv->id.port_num, &rec, + comp_mask, GFP_KERNEL, + cma_ib_mc_handler, mc); + if (IS_ERR(mc->multicast.ib)) + return PTR_ERR(mc->multicast.ib); + + return 0; +} + +int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, + void *context) +{ + struct rdma_id_private *id_priv; + struct cma_multicast *mc; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_ADDR_BOUND) && + !cma_comp(id_priv, CMA_ADDR_RESOLVED)) + return -EINVAL; + + mc = kmalloc(sizeof *mc, GFP_KERNEL); + if (!mc) + return -ENOMEM; + + memcpy(&mc->addr, addr, ip_addr_size(addr)); + mc->context = context; + mc->id_priv = id_priv; + + spin_lock(&id_priv->lock); + list_add(&mc->list, &id_priv->mc_list); + spin_unlock(&id_priv->lock); + + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: + ret = cma_join_ib_multicast(id_priv, mc); + break; + default: + ret = -ENOSYS; + break; + } + + if (ret) { + spin_lock_irq(&id_priv->lock); + list_del(&mc->list); + spin_unlock_irq(&id_priv->lock); + kfree(mc); + } + return ret; +} +EXPORT_SYMBOL(rdma_join_multicast); + +void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) +{ + struct rdma_id_private *id_priv; + struct cma_multicast *mc; + + id_priv = container_of(id, struct rdma_id_private, id); + spin_lock_irq(&id_priv->lock); + list_for_each_entry(mc, 
&id_priv->mc_list, list) { + if (!memcmp(&mc->addr, addr, ip_addr_size(addr))) { + list_del(&mc->list); + spin_unlock_irq(&id_priv->lock); + + if (id->qp) + ib_detach_mcast(id->qp, + &mc->multicast.ib->rec.mgid, + mc->multicast.ib->rec.mlid); + ib_free_multicast(mc->multicast.ib); + kfree(mc); + return; + } + } + spin_unlock_irq(&id_priv->lock); +} +EXPORT_SYMBOL(rdma_leave_multicast); + static void cma_add_one(struct ib_device *device) { struct cma_device *cma_dev; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 82d4736..c4c204d 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -500,6 +500,36 @@ int ib_init_ah_from_path(struct ib_devic } EXPORT_SYMBOL(ib_init_ah_from_path); +int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + struct ib_ah_attr *ah_attr) +{ + int ret; + u16 gid_index; + u8 p; + + ret = ib_find_cached_gid(device, &rec->port_gid, &p, &gid_index); + if (ret) + return ret; + + memset(ah_attr, 0, sizeof *ah_attr); + ah_attr->dlid = be16_to_cpu(rec->mlid); + ah_attr->sl = rec->sl; + ah_attr->port_num = port_num; + ah_attr->static_rate = rec->rate; + + ah_attr->ah_flags = IB_AH_GRH; + ah_attr->grh.dgid = rec->mgid; + + ah_attr->grh.sgid_index = (u8) gid_index; + ah_attr->grh.flow_label = be32_to_cpu(rec->flow_label); + ah_attr->grh.hop_limit = rec->hop_limit; + ah_attr->grh.traffic_class = rec->traffic_class; + + return 0; +} +EXPORT_SYMBOL(ib_init_ah_from_mcmember); + static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent) { unsigned long flags; diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 81b6230..5bc318c 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -92,6 +92,12 @@ static inline void ib_addr_set_pkey(stru dev_addr->broadcast[9] = (unsigned char) pkey; } +static inline void ib_addr_get_mgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(gid, 
dev_addr->broadcast + 4, sizeof *gid); +} + static inline void ib_addr_get_sgid(struct rdma_dev_addr *dev_addr, union ib_gid *gid) { diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h index e94656a..1c2ccc2 100644 --- a/include/rdma/ib_sa.h +++ b/include/rdma/ib_sa.h @@ -399,6 +399,14 @@ ib_sa_mcmember_rec_delete(struct ib_sa_c context, query); } + /** + * ib_init_ah_from_mcmember - Initialize address handle attributes based on an + * SA mcmember record. + */ +int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + struct ib_ah_attr *ah_attr); + /** * ib_init_ah_from_path - Initialize address handle attributes based on an SA * path record. diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 595f1a7..9efbbdc 100755 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -52,6 +52,8 @@ enum rdma_cm_event_type { RDMA_CM_EVENT_ESTABLISHED, RDMA_CM_EVENT_DISCONNECTED, RDMA_CM_EVENT_DEVICE_REMOVAL, + RDMA_CM_EVENT_MULTICAST_JOIN, + RDMA_CM_EVENT_MULTICAST_ERROR }; enum rdma_port_space { @@ -289,5 +291,21 @@ int rdma_reject(struct rdma_cm_id *id, c */ int rdma_disconnect(struct rdma_cm_id *id); -#endif /* RDMA_CM_H */ +/** + * rdma_join_multicast - Join the multicast group specified by the given + * address. + * @id: Communication identifier associated with the request. + * @addr: Multicast address identifying the group to join. + * @context: User-defined context associated with the join request, returned + * to the user through the private_data pointer in multicast events. + */ +int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, + void *context); +/** + * rdma_leave_multicast - Leave the multicast group specified by the given + * address. + */ +void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr); + +#endif /* RDMA_CM_H */ From mst at mellanox.co.il Tue Oct 10 16:49:33 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 11 Oct 2006 01:49:33 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> Message-ID: <20061010234933.GA29632@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
>
> The IB SA tracks multicast join / leave requests on a per port basis.
> In order to support multiple users of the same multicast group from
> the same port, we need to perform local reference counting on each
> of the nodes.

We do need to do something about multicast support, thanks.
Here are some comments on the patch.

First, on API cleanliness: with this patch we get two ways to join an mcast group, the right refcounted one and the wrong non-refcounted one. So why do we keep the old one around?

It would make some sense to keep this API separate if the mcast module were targeted only at userspace users: kernel consumers likely do not share mcast groups, so they could avoid the overhead. But since this patch seems to move all kernel users to the new API too, why does this need to be a separate module at all? E.g. the ipoib change looks very large, but the new and old APIs look almost exactly the same, so what's going on here?

>
> Add an ib_multicast module to perform reference counting of multicast
> join / leave requests. Modify ib_ipoib to use the multicast module.
>
> Signed-off-by: Sean Hefty

On the ipoib change: why are we doing it at all? ib_ipoib does not actually need the multicast refcounting, does it? I would be worried about doing major changes in ipoib multicast code at this point.
And I'm worried about the extensive use of atomic operations this patch introduces, both performance-wise and race-condition-wise. Can't we stick to simple locking? Replacing the completion with a single BUSY bit looks especially scary. For example:

>
> -       mutex_lock(&mcast_mutex);
> +       /* Clear the busy flag so we try again */
> +       status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
>
> +       mutex_lock(&mcast_mutex);
>         spin_lock_irq(&priv->lock);
> -       mcast->query = NULL;
> -
> -       if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) {
> -               if (status == -ETIMEDOUT)
> -                       queue_work(ipoib_workqueue, &priv->mcast_task);
> -               else
> -                       queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
> -                                          mcast->backoff * HZ);
> -       } else
> -               complete(&mcast->done);
> +       if (test_bit(IPOIB_MCAST_RUN, &priv->flags))
> +               queue_delayed_work(ipoib_workqueue, &priv->mcast_task,
> +                                  mcast->backoff * HZ);
>         spin_unlock_irq(&priv->lock);
>         mutex_unlock(&mcast_mutex);
>
> -       return;
> +       return status;
>  }

We used to call complete() as the last operation on the mcast object; now you are apparently touching the object after IPOIB_MCAST_FLAG_BUSY is cleared.
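
For readers following the BUSY-bit discussion, here is a minimal userspace sketch of what a test-and-clear operation guarantees. It is an illustration only, written with C11 atomics rather than the kernel's bitops; demo_test_and_clear_bit and DEMO_FLAG_BUSY are hypothetical names, not part of the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, standing in for IPOIB_MCAST_FLAG_BUSY. */
enum { DEMO_FLAG_BUSY = 0 };

/*
 * Userspace analog of a test-and-clear bit operation: atomically clear
 * bit nr in *flags and report whether it was set beforehand.
 *
 * The atomicity covers only the flag word. Any access to the surrounding
 * object made after this call is not protected by it, which is the
 * ordering concern raised in the review comment above.
 */
static bool demo_test_and_clear_bit(int nr, atomic_ulong *flags)
{
	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_and(flags, ~mask);

	return (old & mask) != 0;
}
```

Only the first caller observes the bit as set; a second caller sees it already cleared. That makes the flag usable for deciding who performs a one-time action, but by itself it says nothing about whether another context is still using the object.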
> static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast, > @@ -493,15 +488,14 @@ static void ipoib_mcast_join(struct net_ > rec.hop_limit = priv->broadcast->mcmember.hop_limit; > } > > - init_completion(&mcast->done); > - > - ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, > - &rec, comp_mask, mcast->backoff * 1000, > - GFP_ATOMIC, ipoib_mcast_join_complete, > - mcast, &mcast->query); > - > - if (ret < 0) { > - ipoib_warn(priv, "ib_sa_mcmember_rec_set failed, status %d\n", ret); > + set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); > + mcast->mc = ib_join_multicast(priv->ca, priv->port, &rec, comp_mask, > + GFP_KERNEL, ipoib_mcast_join_complete, > + mcast); > + if (IS_ERR(mcast->mc)) { > + clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); > + ret = PTR_ERR(mcast->mc); > + ipoib_warn(priv, "ib_join_multicast failed, status %d\n", ret); > > mcast->backoff *= 2; > if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) > @@ -513,8 +507,7 @@ static void ipoib_mcast_join(struct net_ > &priv->mcast_task, > mcast->backoff * HZ); > mutex_unlock(&mcast_mutex); > - } else > - mcast->query_id = ret; > + } > } > > void ipoib_mcast_join_task(void *dev_ptr) > @@ -538,7 +531,7 @@ void ipoib_mcast_join_task(void *dev_ptr > priv->local_rate = attr.active_speed * > ib_width_enum_to_int(attr.active_width); > } else > - ipoib_warn(priv, "ib_query_port failed\n"); > + ipoib_warn(priv, "ib_query_port failed\n"); > } > > if (!priv->broadcast) { > @@ -565,7 +558,8 @@ void ipoib_mcast_join_task(void *dev_ptr > } > > if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { > - ipoib_mcast_join(dev, priv->broadcast, 0); > + if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) > + ipoib_mcast_join(dev, priv->broadcast, 0); > return; > } > And here (and below) it looks like you assume no one is touching the object if BUSY is cleared. Hmm. 
> @@ -620,26 +614,9 @@ int ipoib_mcast_start_thread(struct net_ > return 0; > } > > -static void wait_for_mcast_join(struct ipoib_dev_priv *priv, > - struct ipoib_mcast *mcast) > -{ > - spin_lock_irq(&priv->lock); > - if (mcast && mcast->query) { > - ib_sa_cancel_query(mcast->query_id, mcast->query); > - mcast->query = NULL; > - spin_unlock_irq(&priv->lock); > - ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n", > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > - wait_for_completion(&mcast->done); > - } > - else > - spin_unlock_irq(&priv->lock); > -} > - > int ipoib_mcast_stop_thread(struct net_device *dev, int flush) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > - struct ipoib_mcast *mcast; > > ipoib_dbg_mcast(priv, "stopping multicast thread\n"); > > @@ -655,52 +632,27 @@ int ipoib_mcast_stop_thread(struct net_d > if (flush) > flush_workqueue(ipoib_workqueue); > > - wait_for_mcast_join(priv, priv->broadcast); > - > - list_for_each_entry(mcast, &priv->multicast_list, list) > - wait_for_mcast_join(priv, mcast); > - > return 0; > } Looks like callbacks could be still running after we do ipoib_mcast_stop_thread? 
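
One common way to make sure no callback is still running before an object is torn down is to pair a reference count with a "last reference dropped" signal, the same shape as the atomic_dec_and_test() plus complete() pair in ucma_put_ctx later in this series. The following is a userspace sketch under that assumption, using C11 atomics; struct demo_mcast and the demo_* helpers are hypothetical, and a kernel implementation would block in wait_for_completion() instead of polling a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object that join callbacks may touch. */
struct demo_mcast {
	atomic_int ref;   /* one for the owner, plus one per in-flight callback */
	atomic_bool done; /* set by whoever drops the last reference */
};

static void demo_init(struct demo_mcast *m)
{
	atomic_init(&m->ref, 1);      /* the owner's reference */
	atomic_init(&m->done, false);
}

/* Taken before a callback is handed the object. */
static void demo_get(struct demo_mcast *m)
{
	atomic_fetch_add(&m->ref, 1);
}

/* Dropped when a callback returns, and once by the owner at teardown. */
static void demo_put(struct demo_mcast *m)
{
	/* atomic_fetch_sub returns the previous value: 1 means we were last. */
	if (atomic_fetch_sub(&m->ref, 1) == 1)
		atomic_store(&m->done, true);
}

/* Teardown may free the object only once this returns true. */
static bool demo_safe_to_free(struct demo_mcast *m)
{
	return atomic_load(&m->done);
}
```

With this shape, the stop path drops the owner's reference and then waits for the done signal, so the object cannot be freed while a callback still holds a reference.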
> > static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > - struct ib_sa_mcmember_rec rec = { > - .join_state = 1 > - }; > int ret = 0; > > - if (!test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) > - return 0; > - > - ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", > - IPOIB_GID_ARG(mcast->mcmember.mgid)); > - > - rec.mgid = mcast->mcmember.mgid; > - rec.port_gid = priv->local_gid; > - rec.pkey = cpu_to_be16(priv->pkey); > + if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { > + ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", > + IPOIB_GID_ARG(mcast->mcmember.mgid)); > > - /* Remove ourselves from the multicast group */ > - ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), > - &mcast->mcmember.mgid); > - if (ret) > - ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); > + /* Remove ourselves from the multicast group */ > + ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), > + &mcast->mcmember.mgid); > + if (ret) > + ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); > + } > > - /* > - * Just make one shot at leaving and don't wait for a reply; > - * if we fail, too bad. 
> - */ > - ret = ib_sa_mcmember_rec_delete(&ipoib_sa_client, priv->ca, priv->port, &rec, > - IB_SA_MCMEMBER_REC_MGID | > - IB_SA_MCMEMBER_REC_PORT_GID | > - IB_SA_MCMEMBER_REC_PKEY | > - IB_SA_MCMEMBER_REC_JOIN_STATE, > - 0, GFP_ATOMIC, NULL, > - mcast, &mcast->query); > - if (ret < 0) > - ipoib_warn(priv, "ib_sa_mcmember_rec_delete failed " > - "for leave (result = %d)\n", ret); > + if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) > + ib_free_multicast(mcast->mc); > > return 0; > } > @@ -753,7 +705,7 @@ void ipoib_mcast_send(struct net_device > dev_kfree_skb_any(skb); > } > > - if (mcast->query) > + if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) > ipoib_dbg_mcast(priv, "no address vector, " > "but multicast join already started\n"); > else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) > @@ -910,7 +862,6 @@ void ipoib_mcast_restart_task(void *dev_ > > /* We have to cancel outside of the spinlock */ > list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { > - wait_for_mcast_join(priv, mcast); > ipoib_mcast_leave(mcast->dev, mcast); > ipoib_mcast_free(mcast); > } What prevents us from exiting while callbacks are in progress? Basically same applies wherever we used to call wait_for_mcast_join. > diff --git a/include/rdma/ib_multicast.h b/include/rdma/ib_multicast.h > new file mode 100755 > index 0000000..423b754 > +/** > + * ib_join_multicast - Initiates a join request to the specified multicast > + * group. > + * @device: Device associated with the multicast group. > + * @port_num: Port on the specified device to associate with the multicast > + * group. > + * @rec: SA multicast member record specifying group attributes. > + * @comp_mask: Component mask indicating which group attributes of %rec are > + * valid. > + * @gfp_mask: GFP mask for memory allocations. > + * @callback: User callback invoked once the join operation completes. > + * @context: User specified context stored with the ib_multicast structure. 
> + *
> + * This call initiates a multicast join request with the SA for the specified
> + * multicast group. If the join operation is started successfully, it returns
> + * an ib_multicast structure that is used to track the multicast operation.
> + * Users must free this structure by calling ib_free_multicast, even if the
> + * join operation later fails. (The callback status is non-zero.)
> + */
> +struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num,
> +                                       struct ib_sa_mcmember_rec *rec,
> +                                       ib_sa_comp_mask comp_mask, gfp_t gfp_mask,
> +                                       int (*callback)(int status,
> +                                                       struct ib_multicast
> +                                                       *multicast),
> +                                       void *context);
> +

Is this re-introducing the module unload races we had with sa all over again?

I also started to wonder why we need a new API for this at all. Can't the sa module be fixed to refcount the mcast joins properly for us, with minor or no API changes?

--
MST

From sean.hefty at intel.com Tue Oct 10 16:51:23 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 16:51:23 -0700 Subject: [openib-general] [RFC] [PATCH 8/8] rdma_ucm 2.6.20: add userspace support for rdma_cm In-Reply-To: <000801c6ecc5$5e62e610$c0d4180a@amr.corp.intel.com> Message-ID: <000901c6ecc6$fe1b0060$c0d4180a@amr.corp.intel.com>

Export the rdma_cm capabilities to userspace.

Signed-off-by: Sean Hefty
---
I added in a patch to include rdma_establish with this series, since it's going to miss 2.6.19. This threw off my patch counting.

This patch differs from svn in a few areas. First, data is reported with events, eliminating the rdma_get_dst_attr() call from userspace. Secondly, get/set option implementations have been removed. There's also a bug in the svn code that allows reporting a multicast event to a user after they've destroyed the group. This patch includes a fix for that, which simplifies the user/kernel interface slightly.
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 76cc988..44b153e 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -1,10 +1,12 @@ infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o +user_access-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_ucm.o obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \ ib_multicast.o \ ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o -obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o +obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ + $(user_access-y) ib_core-y := packer.o ud_header.o verbs.o sysfs.o \ device.o fmr_pool.o cache.o @@ -21,6 +23,8 @@ iw_cm-y := iwcm.o rdma_cm-y := cma.o +rdma_ucm-y := ucma.o + ib_addr-y := addr.o ib_umad-y := user_mad.o diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c new file mode 100755 index 0000000..4dae930 --- /dev/null +++ b/drivers/infiniband/core/ucma.c @@ -0,0 +1,1067 @@ +/* + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access"); +MODULE_LICENSE("Dual BSD/GPL"); + +enum { + UCMA_MAX_BACKLOG = 128 +}; + +struct ucma_file { + struct mutex mut; + struct file *filp; + struct list_head ctx_list; + struct list_head event_list; + wait_queue_head_t poll_wait; +}; + +struct ucma_context { + int id; + struct completion comp; + atomic_t ref; + int events_reported; + int backlog; + + struct ucma_file *file; + struct rdma_cm_id *cm_id; + __u64 uid; + + struct list_head list; + struct list_head mc_list; +}; + +struct ucma_multicast { + struct ucma_context *ctx; + int id; + int events_reported; + + __u64 uid; + struct list_head list; + struct sockaddr addr; + u8 pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; +}; + +struct ucma_event { + struct ucma_context *ctx; + struct ucma_multicast *mc; + struct list_head list; + struct rdma_cm_id *cm_id; + struct rdma_ucm_event_resp resp; +}; + +static DEFINE_MUTEX(mut); +static DEFINE_IDR(ctx_idr); +static DEFINE_IDR(multicast_idr); + +static inline struct ucma_context* _ucma_find_context(int id, + struct ucma_file *file) +{ + struct ucma_context *ctx; + + ctx = idr_find(&ctx_idr, id); + if (!ctx) + ctx = ERR_PTR(-ENOENT); + else if (ctx->file != file) + ctx = ERR_PTR(-EINVAL); + return ctx; +} + +static struct ucma_context* ucma_get_ctx(struct ucma_file *file, 
int id) +{ + struct ucma_context *ctx; + + mutex_lock(&mut); + ctx = _ucma_find_context(id, file); + if (!IS_ERR(ctx)) + atomic_inc(&ctx->ref); + mutex_unlock(&mut); + return ctx; +} + +static void ucma_put_ctx(struct ucma_context *ctx) +{ + if (atomic_dec_and_test(&ctx->ref)) + complete(&ctx->comp); +} + +static struct ucma_context* ucma_alloc_ctx(struct ucma_file *file) +{ + struct ucma_context *ctx; + int ret; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return NULL; + + atomic_set(&ctx->ref, 1); + init_completion(&ctx->comp); + INIT_LIST_HEAD(&ctx->mc_list); + ctx->file = file; + + do { + ret = idr_pre_get(&ctx_idr, GFP_KERNEL); + if (!ret) + goto error; + + mutex_lock(&mut); + ret = idr_get_new(&ctx_idr, ctx, &ctx->id); + mutex_unlock(&mut); + } while (ret == -EAGAIN); + + if (ret) + goto error; + + list_add_tail(&ctx->list, &file->ctx_list); + return ctx; + +error: + kfree(ctx); + return NULL; +} + +static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx) +{ + struct ucma_multicast *mc; + int ret; + + mc = kzalloc(sizeof(*mc), GFP_KERNEL); + if (!mc) + return NULL; + + do { + ret = idr_pre_get(&multicast_idr, GFP_KERNEL); + if (!ret) + goto error; + + mutex_lock(&mut); + ret = idr_get_new(&multicast_idr, mc, &mc->id); + mutex_unlock(&mut); + } while (ret == -EAGAIN); + + if (ret) + goto error; + + mc->ctx = ctx; + list_add_tail(&mc->list, &ctx->mc_list); + return mc; + +error: + kfree(mc); + return NULL; +} + +static void ucma_copy_conn_event(struct rdma_ucm_conn_param *dst, + struct rdma_conn_param *src) +{ + if (src->private_data_len) + memcpy(dst->private_data, src->private_data, + src->private_data_len); + dst->private_data_len = src->private_data_len; + dst->responder_resources =src->responder_resources; + dst->initiator_depth = src->initiator_depth; + dst->flow_control = src->flow_control; + dst->retry_count = src->retry_count; + dst->rnr_retry_count = src->rnr_retry_count; + dst->srq = src->srq; + dst->qp_num = 
src->qp_num; +} + +static void ucma_copy_ud_event(struct rdma_ucm_ud_param *dst, + struct rdma_ud_param *src) +{ + if (src->private_data_len) + memcpy(dst->private_data, src->private_data, + src->private_data_len); + dst->private_data_len = src->private_data_len; + ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr); + dst->qp_num = src->qp_num; + dst->qkey = src->qkey; +} + +static void ucma_set_event_context(struct ucma_context *ctx, + struct rdma_cm_event *event, + struct ucma_event *uevent) +{ + uevent->ctx = ctx; + switch (event->event) { + case RDMA_CM_EVENT_MULTICAST_JOIN: + case RDMA_CM_EVENT_MULTICAST_ERROR: + uevent->mc = (struct ucma_multicast *) + event->param.ud.private_data; + uevent->resp.uid = uevent->mc->uid; + uevent->resp.id = uevent->mc->id; + break; + default: + uevent->resp.uid = ctx->uid; + uevent->resp.id = ctx->id; + break; + } +} + +static int ucma_event_handler(struct rdma_cm_id *cm_id, + struct rdma_cm_event *event) +{ + struct ucma_event *uevent; + struct ucma_context *ctx = cm_id->context; + int ret = 0; + + uevent = kzalloc(sizeof(*uevent), GFP_KERNEL); + if (!uevent) + return event->event == RDMA_CM_EVENT_CONNECT_REQUEST; + + uevent->cm_id = cm_id; + ucma_set_event_context(ctx, event, uevent); + uevent->resp.event = event->event; + uevent->resp.status = event->status; + if (cm_id->ps == RDMA_PS_UDP) + ucma_copy_ud_event(&uevent->resp.param.ud, &event->param.ud); + else + ucma_copy_conn_event(&uevent->resp.param.conn, + &event->param.conn); + + mutex_lock(&ctx->file->mut); + if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) { + if (!ctx->backlog) { + ret = -EDQUOT; + goto out; + } + ctx->backlog--; + } + list_add_tail(&uevent->list, &ctx->file->event_list); + wake_up_interruptible(&ctx->file->poll_wait); +out: + mutex_unlock(&ctx->file->mut); + return ret; +} + +static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct ucma_context *ctx; + struct rdma_ucm_get_event cmd; + 
struct ucma_event *uevent; + int ret = 0; + DEFINE_WAIT(wait); + + if (out_len < sizeof uevent->resp) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->mut); + while (list_empty(&file->event_list)) { + if (file->filp->f_flags & O_NONBLOCK) { + ret = -EAGAIN; + break; + } + + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + + prepare_to_wait(&file->poll_wait, &wait, TASK_INTERRUPTIBLE); + mutex_unlock(&file->mut); + schedule(); + mutex_lock(&file->mut); + finish_wait(&file->poll_wait, &wait); + } + + if (ret) + goto done; + + uevent = list_entry(file->event_list.next, struct ucma_event, list); + + if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) { + ctx = ucma_alloc_ctx(file); + if (!ctx) { + ret = -ENOMEM; + goto done; + } + uevent->ctx->backlog++; + ctx->cm_id = uevent->cm_id; + ctx->cm_id->context = ctx; + uevent->resp.id = ctx->id; + } + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &uevent->resp, sizeof uevent->resp)) { + ret = -EFAULT; + goto done; + } + + list_del(&uevent->list); + uevent->ctx->events_reported++; + if (uevent->mc) + uevent->mc->events_reported++; + kfree(uevent); +done: + mutex_unlock(&file->mut); + return ret; +} + +static ssize_t ucma_create_id(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_create_id cmd; + struct rdma_ucm_create_id_resp resp; + struct ucma_context *ctx; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->mut); + ctx = ucma_alloc_ctx(file); + mutex_unlock(&file->mut); + if (!ctx) + return -ENOMEM; + + ctx->uid = cmd.uid; + ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps); + if (IS_ERR(ctx->cm_id)) { + ret = PTR_ERR(ctx->cm_id); + goto err1; + } + + resp.id = ctx->id; + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) { + ret 
= -EFAULT; + goto err2; + } + return 0; + +err2: + rdma_destroy_id(ctx->cm_id); +err1: + mutex_lock(&mut); + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&mut); + kfree(ctx); + return ret; +} + +static void ucma_cleanup_multicast(struct ucma_context *ctx) +{ + struct ucma_multicast *mc, *tmp; + + mutex_lock(&mut); + list_for_each_entry_safe(mc, tmp, &ctx->mc_list, list) { + list_del(&mc->list); + idr_remove(&multicast_idr, mc->id); + kfree(mc); + } + mutex_unlock(&mut); +} + +static void ucma_cleanup_events(struct ucma_context *ctx) +{ + struct ucma_event *uevent, *tmp; + + list_for_each_entry_safe(uevent, tmp, &ctx->file->event_list, list) { + if (uevent->ctx != ctx) + continue; + + list_del(&uevent->list); + + /* clear incoming connections. */ + if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) + rdma_destroy_id(uevent->cm_id); + + kfree(uevent); + } +} + +static void ucma_cleanup_mc_events(struct ucma_multicast *mc) +{ + struct ucma_event *uevent, *tmp; + + list_for_each_entry_safe(uevent, tmp, &mc->ctx->file->event_list, list) { + if (uevent->mc != mc) + continue; + + list_del(&uevent->list); + kfree(uevent); + } +} + +static int ucma_free_ctx(struct ucma_context *ctx) +{ + int events_reported; + + /* No new events will be generated after destroying the id. */ + rdma_destroy_id(ctx->cm_id); + + ucma_cleanup_multicast(ctx); + + /* Cleanup events not yet reported to the user. 
*/ + mutex_lock(&ctx->file->mut); + ucma_cleanup_events(ctx); + list_del(&ctx->list); + mutex_unlock(&ctx->file->mut); + + events_reported = ctx->events_reported; + kfree(ctx); + return events_reported; +} + +static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_destroy_id cmd; + struct rdma_ucm_destroy_id_resp resp; + struct ucma_context *ctx; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&mut); + ctx = _ucma_find_context(cmd.id, file); + if (!IS_ERR(ctx)) + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&mut); + + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ucma_put_ctx(ctx); + wait_for_completion(&ctx->comp); + resp.events_reported = ucma_free_ctx(ctx); + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + + return ret; +} + +static ssize_t ucma_bind_addr(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_bind_addr cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_resolve_addr(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_resolve_addr cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr, + (struct sockaddr *) &cmd.dst_addr, + cmd.timeout_ms); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_resolve_route(struct ucma_file *file, + const char __user 
*inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_resolve_route cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms); + ucma_put_ctx(ctx); + return ret; +} + +static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp, + struct rdma_route *route) +{ + struct rdma_dev_addr *dev_addr; + + resp->num_paths = route->num_paths; + switch (route->num_paths) { + case 0: + dev_addr = &route->addr.dev_addr; + ib_addr_get_dgid(dev_addr, + (union ib_gid *) &resp->ib_route[0].dgid); + ib_addr_get_sgid(dev_addr, + (union ib_gid *) &resp->ib_route[0].sgid); + resp->ib_route[0].pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); + break; + case 2: + ib_copy_path_rec_to_user(&resp->ib_route[1], + &route->path_rec[1]); + /* fall through */ + case 1: + ib_copy_path_rec_to_user(&resp->ib_route[0], + &route->path_rec[0]); + break; + default: + break; + } +} + +static ssize_t ucma_query_route(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_query_route cmd; + struct rdma_ucm_query_route_resp resp; + struct ucma_context *ctx; + struct sockaddr *addr; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + memset(&resp, 0, sizeof resp); + addr = &ctx->cm_id->route.addr.src_addr; + memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ? + sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + addr = &ctx->cm_id->route.addr.dst_addr; + memcpy(&resp.dst_addr, addr, addr->sa_family == AF_INET ? 
+ sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + if (!ctx->cm_id->device) + goto out; + + resp.node_guid = ctx->cm_id->device->node_guid; + resp.port_num = ctx->cm_id->port_num; + switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) { + case RDMA_TRANSPORT_IB: + ucma_copy_ib_route(&resp, &ctx->cm_id->route); + break; + default: + break; + } + +out: + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + + ucma_put_ctx(ctx); + return ret; +} + +static void ucma_copy_conn_param(struct rdma_conn_param *dst, + struct rdma_ucm_conn_param *src) +{ + dst->private_data = src->private_data; + dst->private_data_len = src->private_data_len; + dst->responder_resources =src->responder_resources; + dst->initiator_depth = src->initiator_depth; + dst->flow_control = src->flow_control; + dst->retry_count = src->retry_count; + dst->rnr_retry_count = src->rnr_retry_count; + dst->srq = src->srq; + dst->qp_num = src->qp_num; +} + +static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_connect cmd; + struct rdma_conn_param conn_param; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + if (!cmd.conn_param.valid) + return -EINVAL; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ucma_copy_conn_param(&conn_param, &cmd.conn_param); + ret = rdma_connect(ctx->cm_id, &conn_param); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_listen cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ctx->backlog = cmd.backlog > 0 && cmd.backlog < UCMA_MAX_BACKLOG ? 
+ cmd.backlog : UCMA_MAX_BACKLOG; + ret = rdma_listen(ctx->cm_id, ctx->backlog); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_accept cmd; + struct rdma_conn_param conn_param; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + if (cmd.conn_param.valid) { + ctx->uid = cmd.uid; + ucma_copy_conn_param(&conn_param, &cmd.conn_param); + ret = rdma_accept(ctx->cm_id, &conn_param); + } else + ret = rdma_accept(ctx->cm_id, NULL); + + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_reject cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_disconnect cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_disconnect(ctx->cm_id); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_init_qp_attr(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_init_qp_attr cmd; + struct ib_uverbs_qp_attr resp; + struct ucma_context *ctx; + struct ib_qp_attr qp_attr; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if 
(IS_ERR(ctx)) + return PTR_ERR(ctx); + + resp.qp_attr_mask = 0; + memset(&qp_attr, 0, sizeof qp_attr); + qp_attr.qp_state = cmd.qp_state; + ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask); + if (ret) + goto out; + + ib_copy_qp_attr_to_user(&resp, &qp_attr); + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + +out: + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_establish(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_establish cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_establish(ctx->cm_id); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_join_multicast(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_join_mcast cmd; + struct rdma_ucm_create_id_resp resp; + struct ucma_context *ctx; + struct ucma_multicast *mc; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + mutex_lock(&file->mut); + mc = ucma_alloc_multicast(ctx); + if (IS_ERR(mc)) { + ret = PTR_ERR(mc); + goto err1; + } + + mc->uid = cmd.uid; + memcpy(&mc->addr, &cmd.addr, sizeof cmd.addr); + ret = rdma_join_multicast(ctx->cm_id, &mc->addr, mc); + if (ret) + goto err2; + + resp.id = mc->id; + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) { + ret = -EFAULT; + goto err3; + } + + mutex_unlock(&file->mut); + ucma_put_ctx(ctx); + return 0; + +err3: + rdma_leave_multicast(ctx->cm_id, &mc->addr); + ucma_cleanup_mc_events(mc); +err2: + mutex_lock(&mut); + idr_remove(&multicast_idr, mc->id); + mutex_unlock(&mut); + list_del(&mc->list); + kfree(mc); +err1: + 
mutex_unlock(&file->mut); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_leave_multicast(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_destroy_id cmd; + struct rdma_ucm_destroy_id_resp resp; + struct ucma_multicast *mc; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&mut); + mc = idr_find(&multicast_idr, cmd.id); + if (!mc) + mc = ERR_PTR(-ENOENT); + else if (mc->ctx->file != file) + mc = ERR_PTR(-EINVAL); + else { + idr_remove(&multicast_idr, mc->id); + atomic_inc(&mc->ctx->ref); + } + mutex_unlock(&mut); + + if (IS_ERR(mc)) { + ret = PTR_ERR(mc); + goto out; + } + + rdma_leave_multicast(mc->ctx->cm_id, &mc->addr); + mutex_lock(&mc->ctx->file->mut); + ucma_cleanup_mc_events(mc); + list_del(&mc->list); + mutex_unlock(&mc->ctx->file->mut); + + ucma_put_ctx(mc->ctx); + kfree(mc); +out: + return ret; +} + +static ssize_t (*ucma_cmd_table[])(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) = { + [RDMA_USER_CM_CMD_CREATE_ID] = ucma_create_id, + [RDMA_USER_CM_CMD_DESTROY_ID] = ucma_destroy_id, + [RDMA_USER_CM_CMD_BIND_ADDR] = ucma_bind_addr, + [RDMA_USER_CM_CMD_RESOLVE_ADDR] = ucma_resolve_addr, + [RDMA_USER_CM_CMD_RESOLVE_ROUTE]= ucma_resolve_route, + [RDMA_USER_CM_CMD_QUERY_ROUTE] = ucma_query_route, + [RDMA_USER_CM_CMD_CONNECT] = ucma_connect, + [RDMA_USER_CM_CMD_LISTEN] = ucma_listen, + [RDMA_USER_CM_CMD_ACCEPT] = ucma_accept, + [RDMA_USER_CM_CMD_REJECT] = ucma_reject, + [RDMA_USER_CM_CMD_DISCONNECT] = ucma_disconnect, + [RDMA_USER_CM_CMD_INIT_QP_ATTR] = ucma_init_qp_attr, + [RDMA_USER_CM_CMD_GET_EVENT] = ucma_get_event, + [RDMA_USER_CM_CMD_GET_OPTION] = NULL, + [RDMA_USER_CM_CMD_SET_OPTION] = NULL, + [RDMA_USER_CM_CMD_ESTABLISH] = ucma_establish, + [RDMA_USER_CM_CMD_JOIN_MCAST] = ucma_join_multicast, + [RDMA_USER_CM_CMD_LEAVE_MCAST] = ucma_leave_multicast, +}; + +static 
ssize_t ucma_write(struct file *filp, const char __user *buf, + size_t len, loff_t *pos) +{ + struct ucma_file *file = filp->private_data; + struct rdma_ucm_cmd_hdr hdr; + ssize_t ret; + + if (len < sizeof(hdr)) + return -EINVAL; + + if (copy_from_user(&hdr, buf, sizeof(hdr))) + return -EFAULT; + + if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(ucma_cmd_table)) + return -EINVAL; + + if (hdr.in + sizeof(hdr) > len) + return -EINVAL; + + if (!ucma_cmd_table[hdr.cmd]) + return -ENOSYS; + + ret = ucma_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out); + if (!ret) + ret = len; + + return ret; +} + +static unsigned int ucma_poll(struct file *filp, struct poll_table_struct *wait) +{ + struct ucma_file *file = filp->private_data; + unsigned int mask = 0; + + poll_wait(filp, &file->poll_wait, wait); + + if (!list_empty(&file->event_list)) + mask = POLLIN | POLLRDNORM; + + return mask; +} + +static int ucma_open(struct inode *inode, struct file *filp) +{ + struct ucma_file *file; + + file = kmalloc(sizeof *file, GFP_KERNEL); + if (!file) + return -ENOMEM; + + INIT_LIST_HEAD(&file->event_list); + INIT_LIST_HEAD(&file->ctx_list); + init_waitqueue_head(&file->poll_wait); + mutex_init(&file->mut); + + filp->private_data = file; + file->filp = filp; + return 0; +} + +static int ucma_close(struct inode *inode, struct file *filp) +{ + struct ucma_file *file = filp->private_data; + struct ucma_context *ctx, *tmp; + + mutex_lock(&file->mut); + list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) { + mutex_unlock(&file->mut); + + mutex_lock(&mut); + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&mut); + + ucma_free_ctx(ctx); + mutex_lock(&file->mut); + } + mutex_unlock(&file->mut); + kfree(file); + return 0; +} + +static struct file_operations ucma_fops = { + .owner = THIS_MODULE, + .open = ucma_open, + .release = ucma_close, + .write = ucma_write, + .poll = ucma_poll, +}; + +static struct miscdevice ucma_misc = { + .minor = MISC_DYNAMIC_MINOR, + .name = "rdma_cm", + .fops = 
&ucma_fops, +}; + +static ssize_t show_abi_version(struct class_device *class_dev, char *buf) +{ + return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION); +} +static CLASS_DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL); + +static int __init ucma_init(void) +{ + int ret; + + ret = misc_register(&ucma_misc); + if (ret) + return ret; + + ret = class_device_create_file(ucma_misc.class, + &class_device_attr_abi_version); + if (ret) { + printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n"); + goto err; + } + return 0; +err: + misc_deregister(&ucma_misc); + return ret; +} + +static void __exit ucma_cleanup(void) +{ + class_device_remove_file(ucma_misc.class, + &class_device_attr_abi_version); + misc_deregister(&ucma_misc); + idr_destroy(&ctx_idr); +} + +module_init(ucma_init); +module_exit(ucma_cleanup); diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c index ce46b13..5440da0 100644 --- a/drivers/infiniband/core/uverbs_marshall.c +++ b/drivers/infiniband/core/uverbs_marshall.c @@ -32,8 +32,8 @@ #include -static void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, - struct ib_ah_attr *src) +void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, + struct ib_ah_attr *src) { memcpy(dst->grh.dgid, src->grh.dgid.raw, sizeof src->grh.dgid); dst->grh.flow_label = src->grh.flow_label; @@ -47,6 +47,7 @@ static void ib_copy_ah_attr_to_user(stru dst->is_global = src->ah_flags & IB_AH_GRH ? 1 : 0; dst->port_num = src->port_num; } +EXPORT_SYMBOL(ib_copy_ah_attr_to_user); void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, struct ib_qp_attr *src) diff --git a/include/rdma/ib_marshall.h b/include/rdma/ib_marshall.h index 66bf4d7..db03720 100644 --- a/include/rdma/ib_marshall.h +++ b/include/rdma/ib_marshall.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. 
* * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -41,6 +41,9 @@ #include void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, struct ib_qp_attr *src); +void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, + struct ib_ah_attr *src); + void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst, struct ib_sa_path_rec *src); diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h new file mode 100755 index 0000000..5e76fb2 --- /dev/null +++ b/include/rdma/rdma_user_cm.h @@ -0,0 +1,214 @@ +/* + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef RDMA_USER_CM_H +#define RDMA_USER_CM_H + +#include +#include +#include +#include + +#define RDMA_USER_CM_ABI_VERSION 3 + +#define RDMA_MAX_PRIVATE_DATA 256 + +enum { + RDMA_USER_CM_CMD_CREATE_ID, + RDMA_USER_CM_CMD_DESTROY_ID, + RDMA_USER_CM_CMD_BIND_ADDR, + RDMA_USER_CM_CMD_RESOLVE_ADDR, + RDMA_USER_CM_CMD_RESOLVE_ROUTE, + RDMA_USER_CM_CMD_QUERY_ROUTE, + RDMA_USER_CM_CMD_CONNECT, + RDMA_USER_CM_CMD_LISTEN, + RDMA_USER_CM_CMD_ACCEPT, + RDMA_USER_CM_CMD_REJECT, + RDMA_USER_CM_CMD_DISCONNECT, + RDMA_USER_CM_CMD_INIT_QP_ATTR, + RDMA_USER_CM_CMD_GET_EVENT, + RDMA_USER_CM_CMD_GET_OPTION, + RDMA_USER_CM_CMD_SET_OPTION, + RDMA_USER_CM_CMD_ESTABLISH, + RDMA_USER_CM_CMD_JOIN_MCAST, + RDMA_USER_CM_CMD_LEAVE_MCAST +}; + +/* + * command ABI structures. 
+ */ +struct rdma_ucm_cmd_hdr { + __u32 cmd; + __u16 in; + __u16 out; +}; + +struct rdma_ucm_create_id { + __u64 uid; + __u64 response; + __u16 ps; + __u8 reserved[6]; +}; + +struct rdma_ucm_create_id_resp { + __u32 id; +}; + +struct rdma_ucm_destroy_id { + __u64 response; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_destroy_id_resp { + __u32 events_reported; +}; + +struct rdma_ucm_bind_addr { + __u64 response; + struct sockaddr_in6 addr; + __u32 id; +}; + +struct rdma_ucm_resolve_addr { + struct sockaddr_in6 src_addr; + struct sockaddr_in6 dst_addr; + __u32 id; + __u32 timeout_ms; +}; + +struct rdma_ucm_resolve_route { + __u32 id; + __u32 timeout_ms; +}; + +struct rdma_ucm_query_route { + __u64 response; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_query_route_resp { + __u64 node_guid; + struct ib_user_path_rec ib_route[2]; + struct sockaddr_in6 src_addr; + struct sockaddr_in6 dst_addr; + __u32 num_paths; + __u8 port_num; + __u8 reserved[3]; +}; + +struct rdma_ucm_conn_param { + __u32 qp_num; + __u32 reserved; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + __u8 private_data_len; + __u8 srq; + __u8 responder_resources; + __u8 initiator_depth; + __u8 flow_control; + __u8 retry_count; + __u8 rnr_retry_count; + __u8 valid; +}; + +struct rdma_ucm_ud_param { + __u32 qp_num; + __u32 qkey; + struct ib_uverbs_ah_attr ah_attr; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + __u8 private_data_len; + __u8 reserved[7]; +}; + +struct rdma_ucm_connect { + struct rdma_ucm_conn_param conn_param; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_listen { + __u32 id; + __u32 backlog; +}; + +struct rdma_ucm_accept { + __u64 uid; + struct rdma_ucm_conn_param conn_param; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_reject { + __u32 id; + __u8 private_data_len; + __u8 reserved[3]; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; +}; + +struct rdma_ucm_disconnect { + __u32 id; +}; + +struct rdma_ucm_init_qp_attr { + __u64 response; + __u32 id; + __u32 qp_state; +}; + 
+struct rdma_ucm_establish { + __u32 id; +}; + +struct rdma_ucm_join_mcast { + __u64 response; /* rdma_ucm_create_id_resp */ + __u64 uid; + struct sockaddr_in6 addr; + __u32 id; +}; + +struct rdma_ucm_get_event { + __u64 response; +}; + +struct rdma_ucm_event_resp { + __u64 uid; + __u32 id; + __u32 event; + __u32 status; + union { + struct rdma_ucm_conn_param conn; + struct rdma_ucm_ud_param ud; + } param; +}; + +#endif /* RDMA_USER_CM_H */ From mst at mellanox.co.il Tue Oct 10 17:13:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 02:13:38 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061010104315.61540986@freekitty> References: <20061010104315.61540986@freekitty> Message-ID: <20061011001338.GA30093@mellanox.co.il> Quoting r. Stephen Hemminger : > > > > I'm trying to build a network device driver supporting a very large MTU > > > > (around 64K) on top of an infiniband connection, and I've hit a couple > > > > of issues I'd appreciate some feedback on: > > > > > > > > 1. On the send side, > > > > I've set NETIF_F_SG, but hardware does not support checksum > > > > offloading, and I see "dropping NETIF_F_SG since no checksum feature" > > > > warning, and I seem to be getting large packets all in one chunk. > > > > The reason I've set NETIF_F_SG, is because I'm concerned that under > > > > real life stress Linux won't be able to allocate 64K of continuous > > > > memory. > > > > > > > > Is this concern of mine valid? I saw in-tree drivers allocating at > > > > least 8K. What's the best way to enable S/G on send side? Is > > > > checksum offloading really required for S/G? > > > > > > Yes, in the current implementation, Linux needs checksum offload. But > > > there is no reason, your driver can't compute the checksum in software. > > > > > I'm worried whether an extra pass over data won't eat up all of > > the performance gains I get from the large MTU ... 
> > Yup, the cost is in touching the data, not in the copy. Maybe I can patch linux to allow SG without checksum? Dave, maybe you could drop a hint or two on whether this is worthwhile and what are the issues that need addressing to make this work? I imagine it's not just the matter of changing net/core/dev.c :). -- MST From rdreier at cisco.com Tue Oct 10 17:15:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 10 Oct 2006 17:15:26 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011001338.GA30093@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 11 Oct 2006 02:13:38 +0200") References: <20061010104315.61540986@freekitty> <20061011001338.GA30093@mellanox.co.il> Message-ID: Michael> Maybe I can patch linux to allow SG without checksum? Michael> Dave, maybe you could drop a hint or two on whether this Michael> is worthwhile and what are the issues that need Michael> addressing to make this work? What do you really gain by allowing SG without checksum? Someone has to do the checksum anyway, so I don't see that much difference between doing it in the networking core before passing the data to/from the driver, or down in the driver itself. - R. From mst at mellanox.co.il Tue Oct 10 17:26:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 02:26:56 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: References: Message-ID: <20061011002656.GB30093@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > Michael> Maybe I can patch linux to allow SG without checksum? > Michael> Dave, maybe you could drop a hint or two on whether this > Michael> is worthwhile and what are the issues that need > Michael> addressing to make this work? > > What do you really gain by allowing SG without checksum? 
Someone has > to do the checksum anyway, so I don't see that much difference between > doing it in the networking core before passing the data to/from the > driver, or down in the driver itself. My guess was, an extra pass over data is likely to be expensive - dirtying the cache if nothing else. But I do plan to measure that, and see. -- MST From sean.hefty at intel.com Tue Oct 10 17:26:01 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 17:26:01 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061010234933.GA29632@mellanox.co.il> Message-ID: <000a01c6eccb$d51e0b30$c0d4180a@amr.corp.intel.com> >It would make some sense to keep this API separate if the mcast module would >only be targeted at userspace users - kernel consumers likely >do not share mcast groups, so they could avoid the overhead. >But since this patch seems to move all kernels users to the new API to - >why does this need to be a separate module at all? I view the ib_sa API as simply sending the MAD to the SA and routing back the response. That functionality is needed, and is distinct enough, that I would keep it separate from the multicast module. Kernel consumers can only avoid the overhead if we never allow a userspace application to share the same multicast group. I don't think that we need to do that, as long as the app has the right permissions. The rdma_cm also uses the cached copy of the ipoib multicast group when forming new groups. >And I'm worried about the extensive use of atomic operations >this patch introduces - both performance and race-condition-wise. >Can't we stick to simple locking? Replacing completion with a single >BUSY bit looks especially scary. I reworked the locking for this 3-4 times before I came up with something that I felt was simple, yet could still do the job. 
The difficulty is that multiple threads can try to join the same group at once, each with different join parameters, while existing members leave it. Or a user can leave a group before their initial join request even completes. The latter is a problem even for a single user, such as ipoib, which it doesn't handle. I really don't think the performance of the code is as much an issue versus the time required to configure the fabric. >> - mutex_lock(&mcast_mutex); >> + /* Clear the busy flag so we try again */ >> + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); >> >> + mutex_lock(&mcast_mutex); >> spin_lock_irq(&priv->lock); >> - mcast->query = NULL; >> - >> - if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { >> - if (status == -ETIMEDOUT) >> - queue_work(ipoib_workqueue, &priv->mcast_task); >> - else >> - queue_delayed_work(ipoib_workqueue, &priv->mcast_task, >> - mcast->backoff * HZ); >> - } else >> - complete(&mcast->done); >> + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) >> + queue_delayed_work(ipoib_workqueue, &priv->mcast_task, >> + mcast->backoff * HZ); >> spin_unlock_irq(&priv->lock); >> mutex_unlock(&mcast_mutex); >> >> - return; >> + return status; >> } > >We used to do complete last thing on mcast object, now you are >touching the object after IPOIB_MCAST_FLAG_BUSY is cleared, apparently. The patch removes the need to check both mcast->query and a flag to determine state. It only uses the flag. Can you clarify what issue you see with the above code? >What prevents us from exiting while callbacks are in progress? >Basically same applies wherever we used to call wait_for_mcast_join. ib_free_multicast() blocks while any callbacks are running. 
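The flag-only state check described above can be illustrated outside the kernel. What follows is a minimal user-space sketch under stated assumptions, not the ipoib or ib_multicast code: struct mcast_state, MCAST_FLAG_BUSY, mcast_try_begin() and mcast_end() are hypothetical names made up for illustration, and C11 atomic_fetch_or()/atomic_fetch_and() stand in for the kernel's test_and_set_bit()/test_and_clear_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the BUSY bit discussed in the thread. */
enum { MCAST_FLAG_BUSY = 1u << 0 };

/* Hypothetical per-group state: one atomic word of flags, no separate
 * "query" pointer to keep in sync with it. */
struct mcast_state {
	atomic_uint flags;
};

/* Try to mark a join request as in flight.  Returns true only for the
 * one caller that changed the bit from clear to set. */
static bool mcast_try_begin(struct mcast_state *m)
{
	unsigned int old = atomic_fetch_or(&m->flags, MCAST_FLAG_BUSY);

	return (old & MCAST_FLAG_BUSY) == 0;
}

/* Clear the busy bit and report whether a request was in flight,
 * in the spirit of test_and_clear_bit(). */
static bool mcast_end(struct mcast_state *m)
{
	unsigned int mask = ~(unsigned int)MCAST_FLAG_BUSY;
	unsigned int old;

	assert(mask != 0u);	/* sanity: never clear every bit by accident */
	old = atomic_fetch_and(&m->flags, mask);
	return (old & MCAST_FLAG_BUSY) != 0;
}
```

In this sketch a second mcast_try_begin() fails until mcast_end() has run, and mcast_end() tells the caller whether anything was actually pending, much as the quoted hunk returns the result of test_and_clear_bit() as its status. Whether a single bit is enough to replace a completion in the real driver is exactly the question raised in the thread, and the sketch does not settle it.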
>> +struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 >port_num, >> + struct ib_sa_mcmember_rec *rec, >> + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, >> + int (*callback)(int status, >> + struct ib_multicast >> + *multicast), >> + void *context); >> + > >Is this re-introducing module unload races we had with sa all over again? The call returns a structure that must be freed. If the structure is freed by returning a non-zero value from the callback, then we have the same problem that ib_cm and rdma_cm have. Not allowing a return value from the callback is an easy fix for that though. >I also started to wonder why do we need a new API for this at all? >Can't the sa module be fixed to refcount the mcast joins properly for us, >with minor or no API changes? The difference is that this allows the free to match up with exactly one join request. This will be needed for userspace support. Additionally, the callback remains active beyond the initial join, so that the multicast module can notify the user when an error occurs on the multicast group that requires re-joining. - Sean From mst at mellanox.co.il Tue Oct 10 18:10:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 03:10:25 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000a01c6eccb$d51e0b30$c0d4180a@amr.corp.intel.com> References: <000a01c6eccb$d51e0b30$c0d4180a@amr.corp.intel.com> Message-ID: <20061011011025.GC30093@mellanox.co.il> Quoting r. Sean Hefty : > >And I'm worried about the extensive use of atomic operations > >this patch introduces - both performance and race-condition-wise. > >Can't we stick to simple locking? Replacing completion with a single > >BUSY bit looks especially scary. > > I reworked the locking for this 3-4 times before I came up with something that I > felt was simple, yet could still do the job.
The difficulty is that multiple > threads can try to join the same group at once, each with different join > parameters, while existing members leave it. No, I was only speaking about the IPoIB part of the patch. I didn't look at atomic usage inside the new module yet. > Or a user can leave a group before > their initial join request even completes. The latter is a problem even for a > single user, such as ipoib, which it doesn't handle. Sorry, I don't follow. Could you please give a scenario where ipoib fails but the multicast module works? > I really don't think the performance of the code is as much an issue versus the > time required to configure the fabric. You might be right. But I wonder whether we'll regret it later that we switched to the slower generic thing when we already had a stable, streamlined version. Speaking of which - there seem to be some linear scans of outstanding requests which could be a scalability problem if there's a large number of these, isn't that right? > >> - mutex_lock(&mcast_mutex); > >> + /* Clear the busy flag so we try again */ > >> + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); > >> > >> + mutex_lock(&mcast_mutex); > >> spin_lock_irq(&priv->lock); > >> - mcast->query = NULL; > >> - > >> - if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { > >> - if (status == -ETIMEDOUT) > >> - queue_work(ipoib_workqueue, &priv->mcast_task); > >> - else > >> - queue_delayed_work(ipoib_workqueue, &priv->mcast_task, > >> - mcast->backoff * HZ); > >> - } else > >> - complete(&mcast->done); > >> + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) > >> + queue_delayed_work(ipoib_workqueue, &priv->mcast_task, > >> + mcast->backoff * HZ); > >> spin_unlock_irq(&priv->lock); > >> mutex_unlock(&mcast_mutex); > >> > >> - return; > >> + return status; > >> } > > > >We used to do complete last thing on mcast object, now you are > >touching the object after IPOIB_MCAST_FLAG_BUSY is cleared, apparently.
> > The patch removes the need to check both mcast->query and a flag to determine > state. It only uses the flag. Can you clarify what issue you see with the > above code? We used to have a completion to signal no callbacks are running, and we set it last thing. Now there seems to be a window after BUSY is cleared in which you are still accessing mcast->backoff. No? > >What prevents us from exiting while callbacks are in progress? > >Basically same applies wherever we used to call wait_for_mcast_join. > > ib_free_multicast() blocks while any callbacks are running. I don't follow. ib_free_multicast seems to be called only when we leave all mcast groups. So it seems callbacks can now run after we do stop_thread which I think will lead to crashes. > >I also started to wonder why do we need a new API for this at all? > >Can't the sa module be fixed to refcount the mcast joins properly for us, > >with minor or no API changes? > > The difference is that this allows the free to match up with exactly one join > request. This will be needed for userspace support. Additionally, the > callback remains active beyond the initial join, so that the multicast module > can notify the user when an error occurs on the multicast group that requires > re-joining. But I still don't understand - if everyone must use the refcounted API, why do we need a separate module and an exported API for something that is just an implementation detail? I also have the following question on the API: > +struct ib_multicast { > + struct ib_sa_mcmember_rec rec; > + ib_sa_comp_mask comp_mask; > + int (*callback)(int status, > + struct ib_multicast *multicast); > + void *context; > +}; Multiple distinct mcmember_rec queries could get us the same mcast group. Say MTU > 256 and MTU > 512 look different but actually will get the same group in practice. Say 2 clients ask for these 2 queries. What will be in the ib_sa_mcmember_rec in this case?
-- MST From davem at davemloft.net Tue Oct 10 19:15:47 2006 From: davem at davemloft.net (David Miller) Date: Tue, 10 Oct 2006 19:15:47 -0700 (PDT) Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011001338.GA30093@mellanox.co.il> References: <20061010104315.61540986@freekitty> <20061011001338.GA30093@mellanox.co.il> Message-ID: <20061010.191547.83619974.davem@davemloft.net> From: "Michael S. Tsirkin" Date: Wed, 11 Oct 2006 02:13:38 +0200 > Maybe I can patch linux to allow SG without checksum? > Dave, maybe you could drop a hint or two on whether this is worthwhile > and what are the issues that need addressing to make this work? > > I imagine it's not just the matter of changing net/core/dev.c :). You can't, it's a quality of implementation issue. We sendfile() pages directly out of the filesystem page cache without any blocking of modifications to the page contents, and the only way that works is if the card computes the checksum for us. If we sendfile() a page directly, we must compute a correct checksum no matter what the contents. We can't do this on the cpu before the data hits the device because another thread of execution can go in and modify the page contents, which would invalidate the checksum and thus invalidate the packet. We cannot allow this. Blocking modifications is too expensive, so that's not an option either. From rdreier at cisco.com Tue Oct 10 20:33:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 10 Oct 2006 20:33:46 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011002656.GB30093@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 11 Oct 2006 02:26:56 +0200") References: <20061011002656.GB30093@mellanox.co.il> Message-ID: Michael> My guess was, an extra pass over data is likely to be Michael> expensive - dirtying the cache if nothing else. But I do Michael> plan to measure that, and see. I don't get it -- where's the extra pass?
If you can't compute the checksum on the NIC then you have to compute it sometime on the CPU before passing the data to the NIC. - R. From davem at davemloft.net Tue Oct 10 20:36:24 2006 From: davem at davemloft.net (David Miller) Date: Tue, 10 Oct 2006 20:36:24 -0700 (PDT) Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: References: <20061011002656.GB30093@mellanox.co.il> Message-ID: <20061010.203624.91207079.davem@davemloft.net> From: Roland Dreier Date: Tue, 10 Oct 2006 20:33:46 -0700 > Michael> My guess was, an extra pass over data is likely to be > Michael> expensive - dirtying the cache if nothing else. But I do > Michael> plan to measure that, and see. > > I don't get it -- where's the extra pass? If you can't compute the > checksum on the NIC then you have to compute it sometime on the CPU > before passing the data to the NIC. Also, if you don't do checksumming on the card we MUST copy the data (be it from a user buffer, or from a filesystem page cache page) into a private buffer since if the data changes the checksum would become invalid, as I mentioned in another email earlier. Therefore, since we have to copy anyways, it always is better to checksum in parallel with the copy. So the whole idea of SG without hw-checksum support is without much merit at all. From rdreier at cisco.com Tue Oct 10 20:42:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 10 Oct 2006 20:42:20 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <20061010.203624.91207079.davem@davemloft.net> (David Miller's message of "Tue, 10 Oct 2006 20:36:24 -0700 (PDT)") References: <20061011002656.GB30093@mellanox.co.il> <20061010.203624.91207079.davem@davemloft.net> Message-ID: David> Also, if you don't do checksumming on the card we MUST copy David> the data (be it from a user buffer, or from a filesystem David> page cache page) into a private buffer since if the data David> changes the checksum would become invalid, as I mentioned David> in another email earlier. Yes, I get that now -- I replied to Michael's email before I read yours. David> Therefore, since we have to copy anyways, it always is David> better to checksum in parallel with the copy. Yes. David> So the whole idea of SG without hw-checksum support is David> without much merit at all. Well, on IB it is possible to implement a netdevice (IPoIB connected mode, I assume that's what Michael is working on) with a large MTU (64KB is a number thrown around, but really there's not any limit) but no HW checksum capability. Doing that in a practical way means we need to allow non-linear skbs to be passed in. On the other hand I'm not sure how useful such a netdevice would be -- will non-sendfile() paths generate big packets even if the MTU is 64KB? Maybe GSO gives us all the real advantages of this anyway? - R. From davem at davemloft.net Tue Oct 10 20:45:28 2006 From: davem at davemloft.net (David Miller) Date: Tue, 10 Oct 2006 20:45:28 -0700 (PDT) Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: References: <20061010.203624.91207079.davem@davemloft.net> Message-ID: <20061010.204528.90823856.davem@davemloft.net> From: Roland Dreier Date: Tue, 10 Oct 2006 20:42:20 -0700 > On the other hand I'm not sure how useful such a netdevice would be -- > will non-sendfile() paths generate big packets even if the MTU is 64KB? 
non-sendfile() paths will generate big packets just fine, as long as the application is providing that much data. From rdreier at cisco.com Tue Oct 10 20:49:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 10 Oct 2006 20:49:09 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061010.204528.90823856.davem@davemloft.net> (David Miller's message of "Tue, 10 Oct 2006 20:45:28 -0700 (PDT)") References: <20061010.203624.91207079.davem@davemloft.net> <20061010.204528.90823856.davem@davemloft.net> Message-ID: David> non-sendfile() paths will generate big packets just fine, David> as long as the application is providing that much data. OK, cool. Will the big packets be non-linear skbs? Because then it would make sense for a device with a huge MTU to want to accept them without linearizing them, even if it had to copy them to checksum the data. Otherwise with fragmented memory it would be impossible to handle such big packets at all. - R. From davem at davemloft.net Tue Oct 10 20:50:49 2006 From: davem at davemloft.net (David Miller) Date: Tue, 10 Oct 2006 20:50:49 -0700 (PDT) Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: References: <20061010.204528.90823856.davem@davemloft.net> Message-ID: <20061010.205049.132929620.davem@davemloft.net> From: Roland Dreier Date: Tue, 10 Oct 2006 20:49:09 -0700 > David> non-sendfile() paths will generate big packets just fine, > David> as long as the application is providing that much data. > > OK, cool. Will the big packets be non-linear skbs? If you had SG enabled (and thus checksumming offload too) then yes you'll get a non-linear SKB. Otherwise, without SG, you'll get a fully linear SKB. 
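To make the "checksum in parallel with the copy" point from this thread concrete, here is a minimal user-space sketch, assuming the 16-bit ones'-complement Internet checksum. The demo_* names are hypothetical and are not the kernel's helpers. Each source byte is read exactly once, stored into the private buffer, and summed from the value that was stored, so a later change to the source cannot make the checksum disagree with the copy.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and add them to a running sum of
 * big-endian 16-bit words in the same pass. A trailing odd byte is
 * treated as the high byte of a zero-padded word.
 */
static uint64_t demo_copy_and_csum(void *dst, const void *src,
                                   size_t len, uint64_t sum)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint8_t hi = s[i];      /* single read of each source byte */
        uint8_t lo = s[i + 1];

        d[i] = hi;
        d[i + 1] = lo;
        sum += ((uint32_t)hi << 8) | lo;
    }
    if (i < len) {
        uint8_t hi = s[i];

        d[i] = hi;
        sum += (uint32_t)hi << 8;
    }
    return sum;
}

/* Fold the running sum to 16 bits and return its ones' complement. */
static uint16_t demo_csum_fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A caller would copy each fragment of a packet into its private buffer with demo_copy_and_csum(), carrying the running sum from one fragment to the next (this simple form assumes every fragment except the last has even length), and call demo_csum_fold() once at the end.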
From sean.hefty at intel.com Tue Oct 10 21:59:17 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 10 Oct 2006 21:59:17 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061011011025.GC30093@mellanox.co.il> Message-ID: <000001c6ecf2$017874b0$9dd0180a@amr.corp.intel.com> >No, I was only speaking about the IPoIB part of the patch. >I didn't look at atomic usage inside the new module yet. I'm not clear on what code you're referring to. No new atomics or locks were added to ipoib. I just tried to rely solely on the flags to determine state. >You might be right. But I wonder whether we'll regret it later that >we switched to the slower generic thing when we already had a stable, >streamlined version. To be direct, I'm not sure that I'd call the ipoib multicast either streamlined or stable. A fix against it just recently went by, and given the complexity of its use of bit flags, thread, mutex, locks, and pointers to track state, the code is fairly difficult to follow, so I'm not surprised that it's taken a while to stabilize. That doesn't mean that these patches are bug free, but filtering patches simply on the basis of whether or not they change ipoib doesn't seem like the right approach to take. By using ib_multicast, we eliminate about 50 lines of code from ipoib. >Speaking of which - there seem to be some linear scans of outstanding >requests which could be a scalability problem if there's a >large number of these, isn't that right? I'm really not expecting a large number of outstanding requests, but can you reference the lines in the patch where there's a linear scan? >I don't follow. ib_free_multicast seems to be called only when we >leave all mcast groups. So it seems callbacks can now run >after we do stop_thread which I think will lead to crashes.
The multicast groups are freed during dev_flush(), which does occur after stop_thread has done its cleanup. The callbacks use state checks (against the flags) to avoid queuing work when it shouldn't. Is there a more specific problem that you see? >But I still don't understand - if everyone must use the refcounted API, >why do we need a separate module and an exported API for something >that is just an implementation detail? The existing API does not distinguish which leave request matches which join request. This matters to the user because the context for joins will be different. The most sensible way to handle this is to give the user a handle back for their join request, which is what this API does. The ib_sa is basically wrappers around send_mad with an address handle to the SA. It performs a single purpose, and I think it makes more sense to keep it to its original intent, since somewhere, something actually has to send and receive MADs from the SA. We could just as easily argue that the multicast handling should be buried under the ib_mad interface, since that's actually exposed to userspace, but I don't think that that's the right approach either. When a join request completes, the user is not guaranteed to get a MAD, unless one is artificially generated. >Multiple distinct mcmember_rec queries could get us the same mcast group. >Say MTU > 256 and MTU > 512 look different but actually will get >same group in practice. > >Say 2 clients ask for these 2 queries. >What will be in the ib_sa_mcmember_rec in this case? The module currently does not handle queries in as complex a way as the SA. The current matching is limited to equality comparisons against the local mcmember record. (See cmp_rec() in multicast.c.) If there's a need to expand the comparisons, that can be done. 
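A minimal sketch of what "equality comparisons under a component mask" means in practice, using a cut-down, hypothetical record. The demo_* names, mask bits, and field values are illustrative assumptions, not the ib_sa definitions. A field takes part in the comparison only if its bit is set in the mask, and a selected field must match exactly.

```c
#include <stdint.h>

/* Hypothetical mask bits, one per record field. */
enum {
    DEMO_REC_QKEY = 1u << 0,
    DEMO_REC_MTU  = 1u << 1,
    DEMO_REC_PKEY = 1u << 2,
};

/* Hypothetical, cut-down multicast member record. */
struct demo_rec {
    uint32_t qkey;
    uint8_t  mtu;
    uint16_t pkey;
};

/*
 * Return 0 when every field selected by comp_mask is equal in both
 * records, -1 on the first selected field that differs.
 */
static int demo_cmp_rec(const struct demo_rec *src,
                        const struct demo_rec *dst, uint32_t comp_mask)
{
    if ((comp_mask & DEMO_REC_QKEY) && src->qkey != dst->qkey)
        return -1;
    if ((comp_mask & DEMO_REC_MTU) && src->mtu != dst->mtu)
        return -1;
    if ((comp_mask & DEMO_REC_PKEY) && src->pkey != dst->pkey)
        return -1;
    return 0;
}
```

Under this scheme, two requests that differ only in their MTU value both match the local record when the MTU bit is left out of the mask, and a request that includes the MTU bit matches only if its value is identical to the record's. A range-style selector such as "MTU greater than" is not interpreted at all, which is the limitation described above.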
- Sean From ardavis at ichips.intel.com Tue Oct 10 22:50:41 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 10 Oct 2006 22:50:41 -0700 Subject: [openib-general] OFED 1.1 RC7 In-Reply-To: <4525271E.8070000@dev.mellanox.co.il> References: <4525271E.8070000@dev.mellanox.co.il> Message-ID: <452C8631.7040200@ichips.intel.com> Aviram Gutman wrote: >OFED-1.1-rc7 is available on >https://openib.org/svn/gen2/branches/1.1/ofed/releases/ >File: OFED-1.1-rc7.tgz >Please report any issues in bugzilla http://openib.org/bugzilla/ > > > Aviram, Can you verify that the sean_cm_drep_on_not_found.patch is actually applied in RC7? Our delayed disconnect problems still exist. I don't see the new symbol "cm_issue_drep" in ib_cm.ko on our RC7 installed systems so I don't think the patch applied. Thanks, -arlin From eitan at mellanox.co.il Wed Oct 11 00:33:12 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 09:33:12 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> References: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> Message-ID: <452C9E38.8080607@mellanox.co.il> Hi I also agree ref-counting of multicast membership is required. However, inventing a new API is not required and actually harmful as it delays availability by requiring every application out there to migrate to yet another API. Instead it would be preferable to implement this as part of mad.c such that every SA MCMemberRecord is counted. As we enforce all QP1 traffic to go through mad.c I do not see why inventing a new API is the right way to go. Note that an application that wishes to use a dedicated UD QP for performing the Join/Leave can still do so - bypassing the new API (and mad.c altogether).
However, a simple "non-compliant but safer" mode of the SM/SA could be made to avoid serving such requests as they do not come from QP1. EZ Sean Hefty wrote: >The IB SA tracks multicast join / leave requests on a per port basis. >In order to support multiple users of the same multicast group from >the same port, we need to perform local reference counting on each >of the nodes. > >Add an ib_multicast module to perform reference counting of multicast >join / leave requests. Modify ib_ipoib to use the multicast module. > >Signed-off-by: Sean Hefty >--- >diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile >index 163d991..76cc988 100644 >--- a/drivers/infiniband/core/Makefile >+++ b/drivers/infiniband/core/Makefile >@@ -1,6 +1,7 @@ > infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o > > obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \ >+ ib_multicast.o \ > ib_cm.o iw_cm.o $(infiniband-y) > obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o > obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o >@@ -12,6 +13,8 @@ ib_mad-y := mad.o smi.o agent.o mad_rm > > ib_sa-y := sa_query.o > >+ib_multicast-y := multicast.o >+ > ib_cm-y := cm.o > > iw_cm-y := iwcm.o >diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c >new file mode 100755 >index 0000000..e7204b4 >--- /dev/null >+++ b/drivers/infiniband/core/multicast.c >@@ -0,0 +1,795 @@ >+/* >+ * Copyright (c) 2006 Intel Corporation. All rights reserved. >+ * >+ * This software is available to you under a choice of one of two >+ * licenses. 
You may choose to be licensed under the terms of the GNU >+ * General Public License (GPL) Version 2, available from the file >+ * COPYING in the main directory of this source tree, or the >+ * OpenIB.org BSD license below: >+ * >+ * Redistribution and use in source and binary forms, with or >+ * without modification, are permitted provided that the following >+ * conditions are met: >+ * >+ * - Redistributions of source code must retain the above >+ * copyright notice, this list of conditions and the following >+ * disclaimer. >+ * >+ * - Redistributions in binary form must reproduce the above >+ * copyright notice, this list of conditions and the following >+ * disclaimer in the documentation and/or other materials >+ * provided with the distribution. >+ * >+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND >+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS >+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN >+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN >+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE >+ * SOFTWARE. 
>+ */ >+ >+#include >+#include >+#include >+#include >+#include >+#include >+#include >+ >+#include >+#include >+ >+MODULE_AUTHOR("Sean Hefty"); >+MODULE_DESCRIPTION("InfiniBand multicast membership handling"); >+MODULE_LICENSE("Dual BSD/GPL"); >+ >+static int retry_timer = 5000; /* 5 sec */ >+module_param(retry_timer, int, 0444); >+MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests."); >+ >+static int retries = 3; >+module_param(retries, int, 0444); >+MODULE_PARM_DESC(retries, "Number of times to retry a request."); >+ >+static void mcast_add_one(struct ib_device *device); >+static void mcast_remove_one(struct ib_device *device); >+ >+static struct ib_client mcast_client = { >+ .name = "ib_multicast", >+ .add = mcast_add_one, >+ .remove = mcast_remove_one >+}; >+ >+static struct ib_sa_client sa_client; >+static struct ib_event_handler event_handler; >+static struct workqueue_struct *mcast_wq; >+static union ib_gid mgid0; >+ >+struct mcast_device; >+ >+struct mcast_port { >+ struct mcast_device *dev; >+ spinlock_t lock; >+ struct rb_root table; >+ atomic_t refcount; >+ struct completion comp; >+ u8 port_num; >+}; >+ >+struct mcast_device { >+ struct ib_device *device; >+ int start_port; >+ int end_port; >+ struct mcast_port port[0]; >+}; >+ >+enum mcast_state { >+ MCAST_IDLE, >+ MCAST_JOINING, >+ MCAST_MEMBER, >+ MCAST_BUSY, >+ MCAST_ERROR >+}; >+ >+struct mcast_member; >+ >+struct mcast_group { >+ struct ib_sa_mcmember_rec rec; >+ struct rb_node node; >+ struct mcast_port *port; >+ spinlock_t lock; >+ struct work_struct work; >+ struct list_head pending_list; >+ struct list_head active_list; >+ struct mcast_member *last_join; >+ int members[3]; >+ atomic_t refcount; >+ enum mcast_state state; >+ struct ib_sa_query *query; >+ int query_id; >+}; >+ >+struct mcast_member { >+ struct ib_multicast multicast; >+ struct mcast_group *group; >+ struct list_head list; >+ enum mcast_state state; >+ atomic_t refcount; >+ struct completion comp; >+}; >+ 
>+static void join_handler(int status, struct ib_sa_mcmember_rec *rec, >+ void *context); >+static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, >+ void *context); >+ >+static struct mcast_group *mcast_find(struct mcast_port *port, >+ union ib_gid *mgid) >+{ >+ struct rb_node *node = port->table.rb_node; >+ struct mcast_group *group; >+ int ret; >+ >+ while (node) { >+ group = rb_entry(node, struct mcast_group, node); >+ ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid); >+ if (!ret) >+ return group; >+ >+ if (ret < 0) >+ node = node->rb_left; >+ else >+ node = node->rb_right; >+ } >+ return NULL; >+} >+ >+static struct mcast_group *mcast_insert(struct mcast_port *port, >+ struct mcast_group *group, >+ int allow_duplicates) >+{ >+ struct rb_node **link = &port->table.rb_node; >+ struct rb_node *parent = NULL; >+ struct mcast_group *cur_group; >+ int ret; >+ >+ while (*link) { >+ parent = *link; >+ cur_group = rb_entry(parent, struct mcast_group, node); >+ >+ ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw, >+ sizeof group->rec.mgid); >+ if (ret < 0) >+ link = &(*link)->rb_left; >+ else if (ret > 0) >+ link = &(*link)->rb_right; >+ else if (allow_duplicates) >+ link = &(*link)->rb_left; >+ else >+ return cur_group; >+ } >+ rb_link_node(&group->node, parent, link); >+ rb_insert_color(&group->node, &port->table); >+ return NULL; >+} >+ >+static void deref_port(struct mcast_port *port) >+{ >+ if (atomic_dec_and_test(&port->refcount)) >+ complete(&port->comp); >+} >+ >+static void release_group(struct mcast_group *group) >+{ >+ struct mcast_port *port = group->port; >+ unsigned long flags; >+ >+ spin_lock_irqsave(&port->lock, flags); >+ if (atomic_dec_and_test(&group->refcount)) { >+ rb_erase(&group->node, &port->table); >+ spin_unlock_irqrestore(&port->lock, flags); >+ kfree(group); >+ deref_port(port); >+ } else >+ spin_unlock_irqrestore(&port->lock, flags); >+} >+ >+static void deref_member(struct mcast_member *member) >+{ >+ if 
(atomic_dec_and_test(&member->refcount)) >+ complete(&member->comp); >+} >+ >+static void queue_join(struct mcast_member *member) >+{ >+ struct mcast_group *group = member->group; >+ unsigned long flags; >+ >+ spin_lock_irqsave(&group->lock, flags); >+ list_add(&member->list, &group->pending_list); >+ if (group->state == MCAST_IDLE) { >+ group->state = MCAST_BUSY; >+ atomic_inc(&group->refcount); >+ queue_work(mcast_wq, &group->work); >+ } >+ spin_unlock_irqrestore(&group->lock, flags); >+} >+ >+/* >+ * A multicast group has three types of members: full member, non member, and >+ * send only member. We need to keep track of the number of members of each >+ * type based on their join state. Adjust the number of members the belong to >+ * the specified join states. >+ */ >+static void adjust_membership(struct mcast_group *group, u8 join_state, int inc) >+{ >+ int i; >+ >+ for (i = 0; i < 3; i++, join_state >>= 1) >+ if (join_state & 0x1) >+ group->members[i] += inc; >+} >+ >+/* >+ * If a multicast group has zero members left for a particular join state, but >+ * the group is still a member with the SA, we need to leave that join state. >+ * Determine which join states we still belong to, but that do not have any >+ * active members. 
>+ */ >+static u8 get_leave_state(struct mcast_group *group) >+{ >+ u8 leave_state = 0; >+ int i; >+ >+ for (i = 0; i < 3; i++) >+ if (!group->members[i]) >+ leave_state |= (0x1 << i); >+ >+ return leave_state & group->rec.join_state; >+} >+ >+static int cmp_rec(struct ib_sa_mcmember_rec *src, >+ struct ib_sa_mcmember_rec *dst, ib_sa_comp_mask comp_mask) >+{ >+ /* MGID must already match */ >+ >+ if (comp_mask & IB_SA_MCMEMBER_REC_PORT_GID && >+ memcmp(&src->port_gid, &dst->port_gid, sizeof src->port_gid)) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_QKEY && src->qkey != dst->qkey) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_MLID && src->mlid != dst->mlid) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_MTU_SELECTOR && >+ src->mtu_selector != dst->mtu_selector) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_MTU && src->mtu != dst->mtu) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_TRAFFIC_CLASS && >+ src->traffic_class != dst->traffic_class) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_PKEY && src->pkey != dst->pkey) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_RATE_SELECTOR && >+ src->rate_selector != dst->rate_selector) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_RATE && src->rate != dst->rate) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME_SELECTOR && >+ src->packet_life_time_selector != dst->packet_life_time_selector) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME && >+ src->packet_life_time != dst->packet_life_time) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_SL && src->sl != dst->sl) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_FLOW_LABEL && >+ src->flow_label != dst->flow_label) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_HOP_LIMIT && >+ src->hop_limit != dst->hop_limit) >+ return -EINVAL; >+ if (comp_mask & IB_SA_MCMEMBER_REC_SCOPE && src->scope != dst->scope) >+ return 
-EINVAL; >+ >+ /* join_state checked separately, proxy_join ignored */ >+ >+ return 0; >+} >+ >+static int send_join(struct mcast_group *group, struct mcast_member *member) >+{ >+ struct mcast_port *port = group->port; >+ int ret; >+ >+ group->last_join = member; >+ ret = ib_sa_mcmember_rec_set(&sa_client, port->dev->device, >+ port->port_num, &member->multicast.rec, >+ member->multicast.comp_mask, >+ retry_timer, retries, GFP_KERNEL, >+ join_handler, group, &group->query); >+ if (ret >= 0) { >+ group->query_id = ret; >+ ret = 0; >+ } >+ return ret; >+} >+ >+static int send_leave(struct mcast_group *group, u8 leave_state) >+{ >+ struct mcast_port *port = group->port; >+ struct ib_sa_mcmember_rec rec; >+ int ret; >+ >+ rec = group->rec; >+ rec.join_state = leave_state; >+ >+ ret = ib_sa_mcmember_rec_delete(&sa_client, port->dev->device, >+ port->port_num, &rec, >+ IB_SA_MCMEMBER_REC_MGID | >+ IB_SA_MCMEMBER_REC_PORT_GID | >+ IB_SA_MCMEMBER_REC_JOIN_STATE, >+ retry_timer, retries, GFP_KERNEL, >+ leave_handler, group, &group->query); >+ if (ret >= 0) { >+ group->query_id = ret; >+ ret = 0; >+ } >+ return ret; >+} >+ >+static void join_group(struct mcast_group *group, struct mcast_member *member, >+ u8 join_state) >+{ >+ member->state = MCAST_MEMBER; >+ adjust_membership(group, join_state, 1); >+ group->rec.join_state |= join_state; >+ member->multicast.rec = group->rec; >+ member->multicast.rec.join_state = join_state; >+ list_del(&member->list); >+ list_add(&member->list, &group->active_list); >+} >+ >+static int fail_join(struct mcast_group *group, struct mcast_member *member, >+ int status) >+{ >+ spin_lock_irq(&group->lock); >+ list_del_init(&member->list); >+ spin_unlock_irq(&group->lock); >+ return member->multicast.callback(status, &member->multicast); >+} >+ >+static void process_group_error(struct mcast_group *group) >+{ >+ struct mcast_member *member; >+ int ret; >+ >+ spin_lock_irq(&group->lock); >+ while (!list_empty(&group->active_list)) { >+ member = 
list_entry(group->active_list.next, >+ struct mcast_member, list); >+ atomic_inc(&member->refcount); >+ list_del_init(&member->list); >+ adjust_membership(group, member->multicast.rec.join_state, -1); >+ member->state = MCAST_ERROR; >+ spin_unlock_irq(&group->lock); >+ >+ ret = member->multicast.callback(-ENETRESET, >+ &member->multicast); >+ deref_member(member); >+ if (ret) >+ ib_free_multicast(&member->multicast); >+ spin_lock_irq(&group->lock); >+ } >+ >+ group->rec.join_state = 0; >+ group->state = MCAST_BUSY; >+ spin_unlock_irq(&group->lock); >+} >+ >+static void mcast_work_handler(void *data) >+{ >+ struct mcast_group *group = data; >+ struct mcast_member *member; >+ struct ib_multicast *multicast; >+ int status, ret; >+ u8 join_state; >+ >+retest: >+ spin_lock_irq(&group->lock); >+ if (group->state == MCAST_ERROR) { >+ spin_unlock_irq(&group->lock); >+ process_group_error(group); >+ goto retest; >+ } >+ >+ while (!list_empty(&group->pending_list)) { >+ member = list_entry(group->pending_list.next, >+ struct mcast_member, list); >+ multicast = &member->multicast; >+ join_state = multicast->rec.join_state; >+ atomic_inc(&member->refcount); >+ >+ if (join_state == (group->rec.join_state & join_state)) { >+ status = cmp_rec(&group->rec, &multicast->rec, >+ multicast->comp_mask); >+ if (!status) >+ join_group(group, member, join_state); >+ else >+ list_del_init(&member->list); >+ spin_unlock_irq(&group->lock); >+ ret = multicast->callback(status, multicast); >+ } else { >+ spin_unlock_irq(&group->lock); >+ status = send_join(group, member); >+ if (!status) { >+ deref_member(member); >+ return; >+ } >+ ret = fail_join(group, member, status); >+ } >+ >+ deref_member(member); >+ if (ret) >+ ib_free_multicast(&member->multicast); >+ spin_lock_irq(&group->lock); >+ } >+ >+ join_state = get_leave_state(group); >+ if (join_state) { >+ group->rec.join_state &= ~join_state; >+ spin_unlock_irq(&group->lock); >+ if (send_leave(group, join_state)) >+ goto retest; >+ } else 
{ >+ group->state = MCAST_IDLE; >+ spin_unlock_irq(&group->lock); >+ release_group(group); >+ } >+} >+ >+/* >+ * Fail a join request if it is still active - at the head of the pending queue. >+ */ >+static void process_join_error(struct mcast_group *group, int status) >+{ >+ struct mcast_member *member; >+ int ret; >+ >+ spin_lock_irq(&group->lock); >+ member = list_entry(group->pending_list.next, >+ struct mcast_member, list); >+ if (group->last_join == member) { >+ atomic_inc(&member->refcount); >+ list_del_init(&member->list); >+ spin_unlock_irq(&group->lock); >+ ret = member->multicast.callback(status, &member->multicast); >+ deref_member(member); >+ if (ret) >+ ib_free_multicast(&member->multicast); >+ } else >+ spin_unlock_irq(&group->lock); >+} >+ >+static void join_handler(int status, struct ib_sa_mcmember_rec *rec, >+ void *context) >+{ >+ struct mcast_group *group = context; >+ >+ if (status) >+ process_join_error(group, status); >+ else { >+ spin_lock_irq(&group->port->lock); >+ group->rec = *rec; >+ if (!memcmp(&mgid0, &group->rec.mgid, sizeof mgid0)) { >+ rb_erase(&group->node, &group->port->table); >+ mcast_insert(group->port, group, 1); >+ } >+ spin_unlock_irq(&group->port->lock); >+ } >+ mcast_work_handler(group); >+} >+ >+static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, >+ void *context) >+{ >+ mcast_work_handler(context); >+} >+ >+static struct mcast_group *acquire_group(struct mcast_port *port, >+ union ib_gid *mgid, gfp_t gfp_mask) >+{ >+ struct mcast_group *group, *cur_group; >+ unsigned long flags; >+ int is_mgid0; >+ >+ is_mgid0 = !memcmp(&mgid0, mgid, sizeof mgid0); >+ if (!is_mgid0) { >+ spin_lock_irqsave(&port->lock, flags); >+ group = mcast_find(port, mgid); >+ if (group) >+ goto found; >+ spin_unlock_irqrestore(&port->lock, flags); >+ } >+ >+ group = kzalloc(sizeof *group, gfp_mask); >+ if (!group) >+ return NULL; >+ >+ group->port = port; >+ group->rec.mgid = *mgid; >+ INIT_LIST_HEAD(&group->pending_list); >+ 
INIT_LIST_HEAD(&group->active_list); >+ INIT_WORK(&group->work, mcast_work_handler, group); >+ spin_lock_init(&group->lock); >+ >+ spin_lock_irqsave(&port->lock, flags); >+ cur_group = mcast_insert(port, group, is_mgid0); >+ if (cur_group) { >+ kfree(group); >+ group = cur_group; >+ } else >+ atomic_inc(&port->refcount); >+found: >+ atomic_inc(&group->refcount); >+ spin_unlock_irqrestore(&port->lock, flags); >+ return group; >+} >+ >+/* >+ * We serialize all join requests to a single group to make our lives much >+ * easier. Otherwise, two users could try to join the same group >+ * simultaneously, with different configurations, one could leave while the >+ * join is in progress, etc., which makes locking around error recovery >+ * difficult. >+ */ >+struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num, >+ struct ib_sa_mcmember_rec *rec, >+ ib_sa_comp_mask comp_mask, gfp_t gfp_mask, >+ int (*callback)(int status, >+ struct ib_multicast >+ *multicast), >+ void *context) >+{ >+ struct mcast_device *dev; >+ struct mcast_member *member; >+ struct ib_multicast *multicast; >+ int ret; >+ >+ dev = ib_get_client_data(device, &mcast_client); >+ if (!dev) >+ return ERR_PTR(-ENODEV); >+ >+ member = kzalloc(sizeof *member, gfp_mask); >+ if (!member) >+ return ERR_PTR(-ENOMEM); >+ >+ member->multicast.rec = *rec; >+ member->multicast.comp_mask = comp_mask; >+ member->multicast.callback = callback; >+ member->multicast.context = context; >+ init_completion(&member->comp); >+ atomic_set(&member->refcount, 1); >+ member->state = MCAST_JOINING; >+ >+ member->group = acquire_group(&dev->port[port_num - dev->start_port], >+ &rec->mgid, gfp_mask); >+ if (!member->group) { >+ ret = -ENOMEM; >+ goto err; >+ } >+ >+ /* >+ * The user will get the multicast structure in their callback. They >+ * could then free the multicast structure before we can return from >+ * this routine. So we save the pointer to return before queuing >+ * any callback. 
>+ */ >+ multicast = &member->multicast; >+ queue_join(member); >+ return multicast; >+ >+err: >+ kfree(member); >+ return ERR_PTR(ret); >+} >+EXPORT_SYMBOL(ib_join_multicast); >+ >+void ib_free_multicast(struct ib_multicast *multicast) >+{ >+ struct mcast_member *member; >+ struct mcast_group *group; >+ >+ member = container_of(multicast, struct mcast_member, multicast); >+ group = member->group; >+ >+ spin_lock_irq(&group->lock); >+ if (member->state == MCAST_MEMBER) >+ adjust_membership(group, multicast->rec.join_state, -1); >+ >+ list_del_init(&member->list); >+ >+ if (group->state == MCAST_IDLE) { >+ group->state = MCAST_BUSY; >+ spin_unlock_irq(&group->lock); >+ /* Continue to hold reference on group until callback */ >+ queue_work(mcast_wq, &group->work); >+ } else { >+ spin_unlock_irq(&group->lock); >+ release_group(group); >+ } >+ >+ deref_member(member); >+ wait_for_completion(&member->comp); >+ kfree(member); >+} >+EXPORT_SYMBOL(ib_free_multicast); >+ >+int ib_get_mcmember_rec(struct ib_device *device, u8 port_num, >+ union ib_gid *mgid, struct ib_sa_mcmember_rec *rec) >+{ >+ struct mcast_device *dev; >+ struct mcast_port *port; >+ struct mcast_group *group; >+ unsigned long flags; >+ int ret = 0; >+ >+ dev = ib_get_client_data(device, &mcast_client); >+ if (!dev) >+ return -ENODEV; >+ >+ port = &dev->port[port_num - dev->start_port]; >+ if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) { >+ spin_lock_irqsave(&port->lock, flags); >+ group = mcast_find(port, mgid); >+ if (group) >+ *rec = group->rec; >+ else >+ ret = -EADDRNOTAVAIL; >+ spin_unlock_irqrestore(&port->lock, flags); >+ } else { >+ memset(rec, 0, sizeof *rec); >+ ib_get_cached_gid(device, port_num, 0, &rec->port_gid); >+ rec->pkey = 0xFFFF; >+ get_random_bytes(&rec->qkey, sizeof rec->qkey); >+ rec->join_state = 1; >+ } >+ >+ return ret; >+} >+EXPORT_SYMBOL(ib_get_mcmember_rec); >+ >+static void mcast_groups_lost(struct mcast_port *port) >+{ >+ struct mcast_group *group; >+ struct rb_node *node; 
>+ unsigned long flags; >+ >+ spin_lock_irqsave(&port->lock, flags); >+ for (node = rb_first(&port->table); node; node = rb_next(node)) { >+ group = rb_entry(node, struct mcast_group, node); >+ spin_lock(&group->lock); >+ if (group->state == MCAST_IDLE) { >+ atomic_inc(&group->refcount); >+ queue_work(mcast_wq, &group->work); >+ } >+ group->state = MCAST_ERROR; >+ spin_unlock(&group->lock); >+ } >+ spin_unlock_irqrestore(&port->lock, flags); >+} >+ >+static void mcast_event_handler(struct ib_event_handler *handler, >+ struct ib_event *event) >+{ >+ struct mcast_device *dev; >+ >+ dev = ib_get_client_data(event->device, &mcast_client); >+ if (!dev) >+ return; >+ >+ switch (event->event) { >+ case IB_EVENT_PORT_ERR: >+ case IB_EVENT_LID_CHANGE: >+ case IB_EVENT_SM_CHANGE: >+ case IB_EVENT_CLIENT_REREGISTER: >+ mcast_groups_lost(&dev->port[event->element.port_num - >+ dev->start_port]); >+ break; >+ default: >+ break; >+ } >+} >+ >+static void mcast_add_one(struct ib_device *device) >+{ >+ struct mcast_device *dev; >+ struct mcast_port *port; >+ int i; >+ >+ if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) >+ return; >+ >+ dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port, >+ GFP_KERNEL); >+ if (!dev) >+ return; >+ >+ if (device->node_type == RDMA_NODE_IB_SWITCH) >+ dev->start_port = dev->end_port = 0; >+ else { >+ dev->start_port = 1; >+ dev->end_port = device->phys_port_cnt; >+ } >+ >+ for (i = 0; i <= dev->end_port - dev->start_port; i++) { >+ port = &dev->port[i]; >+ port->dev = dev; >+ port->port_num = dev->start_port + i; >+ spin_lock_init(&port->lock); >+ port->table = RB_ROOT; >+ init_completion(&port->comp); >+ atomic_set(&port->refcount, 1); >+ } >+ >+ dev->device = device; >+ ib_set_client_data(device, &mcast_client, dev); >+ >+ INIT_IB_EVENT_HANDLER(&event_handler, device, mcast_event_handler); >+ ib_register_event_handler(&event_handler); >+} >+ >+static void mcast_remove_one(struct ib_device *device) >+{ >+ struct 
mcast_device *dev; >+ struct mcast_port *port; >+ int i; >+ >+ dev = ib_get_client_data(device, &mcast_client); >+ if (!dev) >+ return; >+ >+ ib_unregister_event_handler(&event_handler); >+ flush_workqueue(mcast_wq); >+ >+ for (i = 0; i < dev->end_port - dev->start_port; i++) { >+ port = &dev->port[i]; >+ deref_port(port); >+ wait_for_completion(&port->comp); >+ } >+ >+ kfree(dev); >+} >+ >+static int __init mcast_init(void) >+{ >+ int ret; >+ >+ mcast_wq = create_singlethread_workqueue("ib_mcast_wq"); >+ if (!mcast_wq) >+ return -ENOMEM; >+ >+ ib_sa_register_client(&sa_client); >+ >+ ret = ib_register_client(&mcast_client); >+ if (ret) >+ goto err; >+ return 0; >+ >+err: >+ ib_sa_unregister_client(&sa_client); >+ destroy_workqueue(mcast_wq); >+ return ret; >+} >+ >+static void __exit mcast_cleanup(void) >+{ >+ ib_unregister_client(&mcast_client); >+ ib_sa_unregister_client(&sa_client); >+ destroy_workqueue(mcast_wq); >+} >+ >+module_init(mcast_init); >+module_exit(mcast_cleanup); >diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c >b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c >index 3faa182..b993cb1 100644 >--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c >+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c >@@ -45,6 +45,8 @@ #include > > #include > >+#include >+ > #include "ipoib.h" > > #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG >@@ -60,14 +62,11 @@ static DEFINE_MUTEX(mcast_mutex); > /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */ > struct ipoib_mcast { > struct ib_sa_mcmember_rec mcmember; >+ struct ib_multicast *mc; > struct ipoib_ah *ah; > > struct rb_node rb_node; > struct list_head list; >- struct completion done; >- >- int query_id; >- struct ib_sa_query *query; > > unsigned long created; > unsigned long backoff; >@@ -299,18 +298,22 @@ static int ipoib_mcast_join_finish(struc > return 0; > } > >-static void >+static int > ipoib_mcast_sendonly_join_complete(int status, >- struct ib_sa_mcmember_rec *mcmember, >- void 
*mcast_ptr) >+ struct ib_multicast *multicast) > { >- struct ipoib_mcast *mcast = mcast_ptr; >+ struct ipoib_mcast *mcast = multicast->context; > struct net_device *dev = mcast->dev; > struct ipoib_dev_priv *priv = netdev_priv(dev); > >+ /* We trap for port events ourselves. */ >+ if (status == -ENETRESET) >+ return 0; >+ > if (!status) >- ipoib_mcast_join_finish(mcast, mcmember); >- else { >+ status = ipoib_mcast_join_finish(mcast, &multicast->rec); >+ >+ if (status) { > if (mcast->logcount++ < 20) > ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " > IPOIB_GID_FMT ", status %d\n", >@@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s > spin_unlock_irq(&priv->tx_lock); > > /* Clear the busy flag so we try again */ >- clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); >- mcast->query = NULL; >+ status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, >+ &mcast->flags); > } >- >- complete(&mcast->done); >+ return status; > } > > static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast) >@@ -359,35 +361,32 @@ #endif > rec.port_gid = priv->local_gid; > rec.pkey = cpu_to_be16(priv->pkey); > >- init_completion(&mcast->done); >- >- ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, &rec, >- IB_SA_MCMEMBER_REC_MGID | >- IB_SA_MCMEMBER_REC_PORT_GID | >- IB_SA_MCMEMBER_REC_PKEY | >- IB_SA_MCMEMBER_REC_JOIN_STATE, >- 1000, GFP_ATOMIC, >- ipoib_mcast_sendonly_join_complete, >- mcast, &mcast->query); >- if (ret < 0) { >- ipoib_warn(priv, "ib_sa_mcmember_rec_set failed (ret = %d)\n", >+ mcast->mc = ib_join_multicast(priv->ca, priv->port, &rec, >+ IB_SA_MCMEMBER_REC_MGID | >+ IB_SA_MCMEMBER_REC_PORT_GID | >+ IB_SA_MCMEMBER_REC_PKEY | >+ IB_SA_MCMEMBER_REC_JOIN_STATE, >+ GFP_ATOMIC, >+ ipoib_mcast_sendonly_join_complete, >+ mcast); >+ if (IS_ERR(mcast->mc)) { >+ ret = PTR_ERR(mcast->mc); >+ clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); >+ ipoib_warn(priv, "ib_join_multicast failed (ret = %d)\n", > ret); > } else { > ipoib_dbg_mcast(priv, 
"no multicast record for " IPOIB_GID_FMT > ", starting join\n", > IPOIB_GID_ARG(mcast->mcmember.mgid)); >- >- mcast->query_id = ret; > } > > return ret; > } > >-static void ipoib_mcast_join_complete(int status, >- struct ib_sa_mcmember_rec *mcmember, >- void *mcast_ptr) >+static int ipoib_mcast_join_complete(int status, >+ struct ib_multicast *multicast) > { >- struct ipoib_mcast *mcast = mcast_ptr; >+ struct ipoib_mcast *mcast = multicast->context; > struct net_device *dev = mcast->dev; > struct ipoib_dev_priv *priv = netdev_priv(dev); > >@@ -395,23 +394,24 @@ static void ipoib_mcast_join_complete(in > " (status %d)\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), status); > >- if (!status && !ipoib_mcast_join_finish(mcast, mcmember)) { >+ /* We trap for port events ourselves. */ >+ if (status == -ENETRESET) >+ return 0; >+ >+ if (!status) >+ status = ipoib_mcast_join_finish(mcast, &multicast->rec); >+ >+ if (!status) { > mcast->backoff = 1; > mutex_lock(&mcast_mutex); > if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) > queue_work(ipoib_workqueue, &priv->mcast_task); > mutex_unlock(&mcast_mutex); >- complete(&mcast->done); >- return; >- } >- >- if (status == -EINTR) { >- complete(&mcast->done); >- return; >+ return 0; > } > >- if (status && mcast->logcount++ < 20) { >- if (status == -ETIMEDOUT || status == -EINTR) { >+ if (mcast->logcount++ < 20) { >+ if (status == -ETIMEDOUT) { > ipoib_dbg_mcast(priv, "multicast join failed for " IPOIB_GID_FMT > ", status %d\n", > IPOIB_GID_ARG(mcast->mcmember.mgid), >@@ -428,23 +428,18 @@ static void ipoib_mcast_join_complete(in > if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) > mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS; > >- mutex_lock(&mcast_mutex); >+ /* Clear the busy flag so we try again */ >+ status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); > >+ mutex_lock(&mcast_mutex); > spin_lock_irq(&priv->lock); >- mcast->query = NULL; >- >- if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { >- if (status == -ETIMEDOUT) >- 
queue_work(ipoib_workqueue, &priv->mcast_task); >- else >- queue_delayed_work(ipoib_workqueue, &priv->mcast_task, >- mcast->backoff * HZ); >- } else >- complete(&mcast->done); >+ if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) >+ queue_delayed_work(ipoib_workqueue, &priv->mcast_task, >+ mcast->backoff * HZ); > spin_unlock_irq(&priv->lock); > mutex_unlock(&mcast_mutex); > >- return; >+ return status; > } > > static void ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast, >@@ -493,15 +488,14 @@ static void ipoib_mcast_join(struct net_ > rec.hop_limit = priv->broadcast->mcmember.hop_limit; > } > >- init_completion(&mcast->done); >- >- ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, >- &rec, comp_mask, mcast->backoff * 1000, >- GFP_ATOMIC, ipoib_mcast_join_complete, >- mcast, &mcast->query); >- >- if (ret < 0) { >- ipoib_warn(priv, "ib_sa_mcmember_rec_set failed, status %d\n", ret); >+ set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); >+ mcast->mc = ib_join_multicast(priv->ca, priv->port, &rec, comp_mask, >+ GFP_KERNEL, ipoib_mcast_join_complete, >+ mcast); >+ if (IS_ERR(mcast->mc)) { >+ clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); >+ ret = PTR_ERR(mcast->mc); >+ ipoib_warn(priv, "ib_join_multicast failed, status %d\n", ret); > > mcast->backoff *= 2; > if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) >@@ -513,8 +507,7 @@ static void ipoib_mcast_join(struct net_ > &priv->mcast_task, > mcast->backoff * HZ); > mutex_unlock(&mcast_mutex); >- } else >- mcast->query_id = ret; >+ } > } > > void ipoib_mcast_join_task(void *dev_ptr) >@@ -538,7 +531,7 @@ void ipoib_mcast_join_task(void *dev_ptr > priv->local_rate = attr.active_speed * > ib_width_enum_to_int(attr.active_width); > } else >- ipoib_warn(priv, "ib_query_port failed\n"); >+ ipoib_warn(priv, "ib_query_port failed\n"); > } > > if (!priv->broadcast) { >@@ -565,7 +558,8 @@ void ipoib_mcast_join_task(void *dev_ptr > } > > if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { 
>- ipoib_mcast_join(dev, priv->broadcast, 0); >+ if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) >+ ipoib_mcast_join(dev, priv->broadcast, 0); > return; > } > >@@ -620,26 +614,9 @@ int ipoib_mcast_start_thread(struct net_ > return 0; > } > >-static void wait_for_mcast_join(struct ipoib_dev_priv *priv, >- struct ipoib_mcast *mcast) >-{ >- spin_lock_irq(&priv->lock); >- if (mcast && mcast->query) { >- ib_sa_cancel_query(mcast->query_id, mcast->query); >- mcast->query = NULL; >- spin_unlock_irq(&priv->lock); >- ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n", >- IPOIB_GID_ARG(mcast->mcmember.mgid)); >- wait_for_completion(&mcast->done); >- } >- else >- spin_unlock_irq(&priv->lock); >-} >- > int ipoib_mcast_stop_thread(struct net_device *dev, int flush) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); >- struct ipoib_mcast *mcast; > > ipoib_dbg_mcast(priv, "stopping multicast thread\n"); > >@@ -655,52 +632,27 @@ int ipoib_mcast_stop_thread(struct net_d > if (flush) > flush_workqueue(ipoib_workqueue); > >- wait_for_mcast_join(priv, priv->broadcast); >- >- list_for_each_entry(mcast, &priv->multicast_list, list) >- wait_for_mcast_join(priv, mcast); >- > return 0; > } > > static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); >- struct ib_sa_mcmember_rec rec = { >- .join_state = 1 >- }; > int ret = 0; > >- if (!test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) >- return 0; >- >- ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", >- IPOIB_GID_ARG(mcast->mcmember.mgid)); >- >- rec.mgid = mcast->mcmember.mgid; >- rec.port_gid = priv->local_gid; >- rec.pkey = cpu_to_be16(priv->pkey); >+ if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { >+ ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", >+ IPOIB_GID_ARG(mcast->mcmember.mgid)); > >- /* Remove ourselves from the multicast group */ >- ret = ipoib_mcast_detach(dev, 
be16_to_cpu(mcast->mcmember.mlid), >- &mcast->mcmember.mgid); >- if (ret) >- ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); >+ /* Remove ourselves from the multicast group */ >+ ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), >+ &mcast->mcmember.mgid); >+ if (ret) >+ ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); >+ } > >- /* >- * Just make one shot at leaving and don't wait for a reply; >- * if we fail, too bad. >- */ >- ret = ib_sa_mcmember_rec_delete(&ipoib_sa_client, priv->ca, priv->port, &rec, >- IB_SA_MCMEMBER_REC_MGID | >- IB_SA_MCMEMBER_REC_PORT_GID | >- IB_SA_MCMEMBER_REC_PKEY | >- IB_SA_MCMEMBER_REC_JOIN_STATE, >- 0, GFP_ATOMIC, NULL, >- mcast, &mcast->query); >- if (ret < 0) >- ipoib_warn(priv, "ib_sa_mcmember_rec_delete failed " >- "for leave (result = %d)\n", ret); >+ if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) >+ ib_free_multicast(mcast->mc); > > return 0; > } >@@ -753,7 +705,7 @@ void ipoib_mcast_send(struct net_device > dev_kfree_skb_any(skb); > } > >- if (mcast->query) >+ if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) > ipoib_dbg_mcast(priv, "no address vector, " > "but multicast join already started\n"); > else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) >@@ -910,7 +862,6 @@ void ipoib_mcast_restart_task(void *dev_ > > /* We have to cancel outside of the spinlock */ > list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { >- wait_for_mcast_join(priv, mcast); > ipoib_mcast_leave(mcast->dev, mcast); > ipoib_mcast_free(mcast); > } >diff --git a/include/rdma/ib_multicast.h b/include/rdma/ib_multicast.h >new file mode 100755 >index 0000000..423b754 >--- /dev/null >+++ b/include/rdma/ib_multicast.h >@@ -0,0 +1,102 @@ >+/* >+ * Copyright (c) 2006 Intel Corporation. All rights reserved. >+ * >+ * This software is available to you under a choice of one of two >+ * licenses. 
You may choose to be licensed under the terms of the GNU >+ * General Public License (GPL) Version 2, available from the file >+ * COPYING in the main directory of this source tree, or the >+ * OpenIB.org BSD license below: >+ * >+ * Redistribution and use in source and binary forms, with or >+ * without modification, are permitted provided that the following >+ * conditions are met: >+ * >+ * - Redistributions of source code must retain the above >+ * copyright notice, this list of conditions and the following >+ * disclaimer. >+ * >+ * - Redistributions in binary form must reproduce the above >+ * copyright notice, this list of conditions and the following >+ * disclaimer in the documentation and/or other materials >+ * provided with the distribution. >+ * >+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND >+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS >+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN >+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN >+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE >+ * SOFTWARE. >+ */ >+ >+#ifndef IB_MULTICAST_H >+#define IB_MULTICAST_H >+ >+#include >+ >+struct ib_multicast { >+ struct ib_sa_mcmember_rec rec; >+ ib_sa_comp_mask comp_mask; >+ int (*callback)(int status, >+ struct ib_multicast *multicast); >+ void *context; >+}; >+ >+/** >+ * ib_join_multicast - Initiates a join request to the specified multicast >+ * group. >+ * @device: Device associated with the multicast group. >+ * @port_num: Port on the specified device to associate with the multicast >+ * group. >+ * @rec: SA multicast member record specifying group attributes. >+ * @comp_mask: Component mask indicating which group attributes of %rec are >+ * valid. >+ * @gfp_mask: GFP mask for memory allocations. 
>+ * @callback: User callback invoked once the join operation completes. >+ * @context: User specified context stored with the ib_multicast structure. >+ * >+ * This call initiates a multicast join request with the SA for the specified >+ * multicast group. If the join operation is started successfully, it returns >+ * an ib_multicast structure that is used to track the multicast operation. >+ * Users must free this structure by calling ib_free_multicast, even if the >+ * join operation later fails. (The callback status is non-zero.) >+ */ >+struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num, >+ struct ib_sa_mcmember_rec *rec, >+ ib_sa_comp_mask comp_mask, gfp_t gfp_mask, >+ int (*callback)(int status, >+ struct ib_multicast >+ *multicast), >+ void *context); >+ >+/** >+ * ib_free_multicast - Frees the multicast tracking structure, and releases >+ * any reference on the multicast group. >+ * @multicast: Multicast tracking structure allocated by ib_join_multicast. >+ * >+ * This call blocks until the connection identifier is destroyed. It may >+ * not be called from within the multicast callback; however, returning a non- >+ * zero value from the callback will result in destroying the multicast >+ * tracking structure. >+ */ >+void ib_free_multicast(struct ib_multicast *multicast); >+ >+/** >+ * ib_get_mcmember_rec - Looks up a multicast member record by its MGID and >+ * returns it if found. >+ * @device: Device associated with the multicast group. >+ * @port_num: Port on the specified device to associate with the multicast >+ * group. >+ * @mgid: optional MGID of multicast group. >+ * @rec: Location to copy SA multicast member record. >+ * >+ * If an MGID is specified, returns an existing multicast member record if >+ * one is found for the local port. If no MGID is specified, or the specified >+ * MGID is 0, returns a multicast member record filled in with default values >+ * that may be used to create a new multicast group. 
>+ */ >+int ib_get_mcmember_rec(struct ib_device *device, u8 port_num, >+ union ib_gid *mgid, struct ib_sa_mcmember_rec *rec); >+ >+#endif /* IB_MULTICAST_H */ > > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From hnguyen at de.ibm.com Wed Oct 11 01:04:56 2006 From: hnguyen at de.ibm.com (Hoang-Nam Nguyen) Date: Wed, 11 Oct 2006 10:04:56 +0200 Subject: [openib-general] OFED-1.1-rc7 and mthca on ppc64: compile warnings Message-ID: <200610111004.56688.hnguyen@de.ibm.com> Just saw these warnings when I compiled kernel 2.6.18 on ppc64 using OFED-1.1-rc7 code with mthca and ehca selected. CC [M] drivers/infiniband/hw/mthca/mthca_qp.o drivers/infiniband/hw/mthca/mthca_qp.c: In function `mthca_arbel_post_send': drivers/infiniband/hw/mthca/mthca_qp.c:1870: warning: `f0' might be used uninitialized in this function drivers/infiniband/hw/mthca/mthca_qp.c: In function `mthca_tavor_post_send': drivers/infiniband/hw/mthca/mthca_qp.c:1527: warning: `f0' might be used uninitialized in this function Nam Nguyen From kliteyn at dev.mellanox.co.il Wed Oct 11 01:08:28 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 11 Oct 2006 10:08:28 +0200 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmtest.c Message-ID: Hi Hal This patch fixes a bunch of ignored errors in osmtest.c (plus some cosmetics). In particular, for some reason osmtest.c doesn't treat IB_INVALID_PARAMETER as an error. I couldn't find any reasonable explanation for it. Perhaps it was useful while writing osmtest?
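For readers skimming the archive, the behaviour change described above can be sketched roughly as follows. This is only an illustration, not code from osmtest.c: the ex_status_t enum and the three helper names are hypothetical stand-ins for ib_api_status_t and the inline checks in the patch. The third helper mirrors the "remember the first error status" idiom that the patch adds to osmtest_write_all_path_recs.

```c
/* Hypothetical status codes standing in for ib_api_status_t values. */
typedef enum {
    EX_SUCCESS = 0,
    EX_INVALID_PARAMETER,
    EX_REMOTE_ERROR
} ex_status_t;

/* Before the patch: a failed query is logged unless the status is
 * the IB_INVALID_PARAMETER equivalent, which is silently skipped. */
static inline int logs_before_patch(ex_status_t status)
{
    return status != EX_SUCCESS && status != EX_INVALID_PARAMETER;
}

/* After the patch: every non-success status is logged. */
static inline int logs_after_patch(ex_status_t status)
{
    return status != EX_SUCCESS;
}

/* Keep the first failure seen across a loop, so that a later
 * success (or a different failure) does not overwrite it. */
static inline ex_status_t remember_first_error(ex_status_t first,
                                               ex_status_t current)
{
    return (first == EX_SUCCESS) ? current : first;
}
```

With these stand-ins, logs_before_patch(EX_INVALID_PARAMETER) is 0 while logs_after_patch(EX_INVALID_PARAMETER) is 1, which is the whole point of the change: the failure is no longer hidden from the log.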
Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmtest.c =================================================================== --- osmtest.c (revision 9776) +++ osmtest.c (working copy) @@ -582,12 +582,9 @@ osmtest_query_res_cb( IN osmv_query_res_ if( p_rec->status != IB_SUCCESS ) { - if ( p_rec->status != IB_INVALID_PARAMETER ) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_query_res_cb: ERR 0003: " - "Error on query (%s)\n", ib_get_err_str( p_rec->status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_query_res_cb: ERR 0003: " + "Error on query (%s)\n", ib_get_err_str( p_rec->status ) ); } OSM_LOG_EXIT( &p_osmt->log ); @@ -723,12 +720,9 @@ osmtest_validate_sa_class_port_info( IN if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_validate_sa_class_port_info: ERR 0070: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_validate_sa_class_port_info: ERR 0070: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -826,12 +820,9 @@ osmtest_get_node_rec( IN osmtest_t * con if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_node_rec: ERR 0072: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_node_rec: ERR 0072: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -913,12 +904,9 @@ osmtest_get_node_rec_by_lid( IN osmtest_ if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_node_rec_by_lid: ERR 0074: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_node_rec_by_lid: ERR 0074: " + "ib_query 
failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); @@ -1232,6 +1220,8 @@ osmtest_get_port_rec( IN osmtest_t * con return ( status ); } +/********************************************************************** + **********************************************************************/ ib_api_status_t osmtest_get_port_rec_by_num( IN osmtest_t * const p_osmt, IN ib_net16_t const lid, @@ -1294,12 +1284,9 @@ osmtest_get_port_rec_by_num( IN osmtest_ if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_port_rec_by_num: ERR 0078: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_port_rec_by_num: ERR 0078: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { @@ -1611,7 +1598,6 @@ osmtest_stress_path_recs_by_guid ( IN os /* next one please */ p_dst_node = ( node_t * ) cl_qmap_next( &p_dst_node->map_item ); } -/* } */ p_src_node = ( node_t * ) cl_qmap_next( &p_src_node->map_item ); } @@ -1810,7 +1796,11 @@ osmtest_wrong_sm_key_ignored( IN osmtest req.sm_key = 9999; context.result.p_result_madw = NULL; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_wrong_sm_key_ignored: " EXPECTING_ERRORS_START "\n" ); status = osmv_query_sa( p_osmt->h_bind, &req ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_wrong_sm_key_ignored: " EXPECTING_ERRORS_END "\n" ); /* since we use a wrong sm_key we should get a timeout */ if( status != IB_TIMEOUT ) @@ -2494,7 +2484,7 @@ osmtest_write_all_node_recs( status = osmtest_get_node_rec_by_lid( p_osmt, cl_ntoh16( lid ), &context ); if( status != IB_SUCCESS ) { - if ( (status != IB_INVALID_PARAMETER) && (status != IB_SA_MAD_STATUS_NO_RECORDS)) + if ( status != IB_SA_MAD_STATUS_NO_RECORDS ) { osm_log( &p_osmt->log, OSM_LOG_DEBUG, "osmtest_write_all_node_recs: ERR 0028: " @@ -2615,7 
+2605,7 @@ osmtest_write_all_port_recs( IN osmtest_ &context ); if( status != IB_SUCCESS ) { - if( (status != IB_INVALID_PARAMETER) && (status != IB_SA_MAD_STATUS_NO_RECORDS)) + if( status != IB_SA_MAD_STATUS_NO_RECORDS ) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_write_all_port_recs: WRN 0122: " @@ -2690,6 +2680,7 @@ osmtest_write_all_path_recs( IN osmtest_ int num_recs, i; cl_qmap_t *p_tbl; node_t *p_src_node, *p_dst_node; + ib_api_status_t got_status = IB_SUCCESS; OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_path_recs ); @@ -2725,11 +2716,13 @@ osmtest_write_all_path_recs( IN osmtest_ if( status != IB_SUCCESS ) { - osm_log( &p_osmt->log, OSM_LOG_DEBUG, - "osmtest_write_all_path_recs: WRN 0124: " + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_write_all_path_recs: ERR 0124: " "failed to get path info from LID:0x%X To LID:0x%X (%s)\n", p_src_node->rec.lid, p_dst_node->rec.lid, ib_get_err_str( status ) ); + /* remember the first error status */ + got_status = ( got_status == IB_SUCCESS ) ? status : got_status; } else { @@ -2757,6 +2750,9 @@ osmtest_write_all_path_recs( IN osmtest_ p_src_node = ( node_t * ) cl_qmap_next( &p_src_node->map_item ); } + if ( got_status != IB_SUCCESS ) + status = got_status; + /* * Return the IB query MAD to the pool as necessary. 
*/ @@ -4252,6 +4248,8 @@ osmtest_validate_all_node_recs( IN osmte return ( status ); } +/********************************************************************** + **********************************************************************/ static ib_api_status_t osmtest_validate_all_guidinfo_recs( IN osmtest_t * const p_osmt ) { @@ -4457,12 +4455,9 @@ osmtest_get_link_rec_by_lid( IN osmtest_ if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_link_rec_by_lid: ERR 007B: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_link_rec_by_lid: ERR 007B: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); @@ -4546,12 +4541,9 @@ osmtest_get_guidinfo_rec_by_lid( IN osmt if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_guidinfo_rec_by_lid: ERR 007D: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_guidinfo_rec_by_lid: ERR 007D: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); @@ -4636,12 +4628,9 @@ osmtest_get_pkeytbl_rec_by_lid( IN osmte if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_pkeytbl_rec_by_lid: ERR 007F: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_pkeytbl_rec_by_lid: ERR 007F: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); @@ -4725,12 +4714,9 @@ osmtest_get_lft_rec_by_lid( IN osmtest_t if( status != IB_SUCCESS ) { - if (status 
!= IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_get_lft_rec_by_lid: ERR 008B: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_lft_rec_by_lid: ERR 008B: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); @@ -4803,12 +4789,9 @@ osmtest_sminfo_record_request( if( status != IB_SUCCESS ) { - if (status != IB_INVALID_PARAMETER) - { - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmtest_sminfo_record_request: ERR 008D: " - "ib_query failed (%s)\n", ib_get_err_str( status ) ); - } + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_sminfo_record_request: ERR 008D: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { p_mad = osm_madw_get_mad_ptr( p_context->result.p_result_madw ); @@ -4985,6 +4968,7 @@ osmtest_validate_single_node_rec_lid( IN "osmtest_validate_single_node_rec_lid: ERR 0107: " "osmtest_validate_node_data failed (%s)\n", ib_get_err_str( status ) ); + goto Exit; } } @@ -5069,6 +5053,7 @@ osmtest_validate_single_path_rec_guid_pa size_t num_recs; osmv_query_req_t req; uint32_t i; + boolean_t got_error = FALSE; OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_single_path_rec_guid_pair ); @@ -5146,6 +5131,7 @@ osmtest_validate_single_path_rec_guid_pa ", received 0x%016" PRIx64 "\n", cl_ntoh64( p_pair->dest_guid ), cl_ntoh64( p_rec->dgid.unicast.interface_id ) ); + got_error = TRUE; } if( p_rec->sgid.unicast.interface_id != p_pair->src_guid ) @@ -5157,6 +5143,7 @@ osmtest_validate_single_path_rec_guid_pa ", received 0x%016" PRIx64 ".\n", cl_ntoh64( p_pair->src_guid ), cl_ntoh64( p_rec->sgid.unicast.interface_id ) ); + got_error = TRUE; } status = osmtest_validate_path_rec( p_osmt, p_rec ); @@ -5166,10 +5153,14 @@ osmtest_validate_single_path_rec_guid_pa "osmtest_validate_single_path_rec_guid_pair: ERR 0114: " 
"osmtest_validate_path_rec failed (%s)\n", ib_get_err_str( status ) ); + got_error = TRUE; } - if (status != IB_SUCCESS ) + if ( got_error || (status != IB_SUCCESS) ) { osm_dump_path_record( &p_osmt->log, p_rec, OSM_LOG_VERBOSE ); + if ( status == IB_SUCCESS ) + status = IB_ERROR; + goto Exit; } } @@ -5319,6 +5310,14 @@ osmtest_validate_single_node_recs( IN os status = osmtest_validate_single_node_rec_lid( p_osmt, (ib_net16_t) cl_qmap_key ((cl_map_item_t*)p_node), p_node ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_validate_single_node_recs: ERR 011A: " + "osmtest_validate_single_node_rec_lid (%s)\n", + ib_get_err_str( status ) ); + goto Exit; + } cnt++; p_node = ( node_t * ) cl_qmap_next( &p_node->map_item ); } @@ -5376,6 +5375,14 @@ osmtest_validate_single_port_recs( IN os while( p_port != ( port_t * ) cl_qmap_end( p_port_key_tbl ) ) { status = osmtest_validate_single_port_rec_lid( p_osmt, p_port ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_validate_single_port_recs: ERR 011B: " + "osmtest_validate_single_port_rec_lid (%s)\n", + ib_get_err_str( status ) ); + goto Exit; + } cnt++; p_port = ( port_t * ) cl_qmap_next( &p_port->map_item ); } @@ -6664,6 +6671,7 @@ osmtest_parse_port( IN osmtest_t * const "osmtest_parse_port: ERR 0126: " "LID must be specified for defined ports\n" ); port_delete( p_port ); + status = IB_ERROR; goto Exit; } @@ -6689,6 +6697,7 @@ osmtest_parse_path( IN osmtest_t * const boolean_t done = FALSE; path_t *p_path; const osmtest_token_t *p_tok; + boolean_t got_error = FALSE; OSM_LOG_ENTER( &p_osmt->log, osmtest_parse_path ); @@ -6728,6 +6737,7 @@ osmtest_parse_path( IN osmtest_t * const "osmtest_parse_path: ERR 0128: " "Ignoring line %u with unknown token: %s\n", *p_line_num, &line[offset] ); + got_error = TRUE; continue; } @@ -6833,10 +6843,16 @@ osmtest_parse_path( IN osmtest_t * const "Ignoring line %u with unknown token: %s\n", *p_line_num, &line[offset] ); 
+ got_error = TRUE; break; } } + if ( got_error ) + { + status = IB_ERROR; + goto Exit; + } /* * Make sure the user specified enough information, then * add this object to the database. @@ -6844,6 +6860,7 @@ osmtest_parse_path( IN osmtest_t * const if( osmtest_path_rec_kay_is_valid( p_osmt, p_path ) == FALSE ) { path_delete( p_path ); + status = IB_ERROR; goto Exit; } @@ -6868,6 +6885,7 @@ osmtest_parse_link( IN osmtest_t * const char line[OSMTEST_MAX_LINE_LEN]; boolean_t done = FALSE; const osmtest_token_t *p_tok; + boolean_t got_error = FALSE; OSM_LOG_ENTER( &p_osmt->log, osmtest_parse_link); @@ -6904,6 +6922,7 @@ osmtest_parse_link( IN osmtest_t * const "osmtest_parse_link: ERR 012B: " "Ignoring line %u with unknown token: %s\n", *p_line_num, &line[offset] ); + got_error = TRUE; continue; } @@ -6934,11 +6953,14 @@ osmtest_parse_link( IN osmtest_t * const "osmtest_parse_link: ERR 012C: " "Ignoring line %u with unknown token: %s\n", *p_line_num, &line[offset] ); - + got_error = TRUE; break; } } + if ( got_error ) + status = IB_ERROR; + Exit: OSM_LOG_EXIT( &p_osmt->log ); return ( status ); @@ -6955,6 +6977,7 @@ osmtest_create_db( IN osmtest_t * const char line[OSMTEST_MAX_LINE_LEN]; uint32_t line_num = 0; const osmtest_token_t *p_tok; + boolean_t got_error = FALSE; OSM_LOG_ENTER( &p_osmt->log, osmtest_create_db ); @@ -6988,6 +7011,7 @@ osmtest_create_db( IN osmtest_t * const osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_create_db: ERR 0131: " "Ignoring line %u: %s\n", line_num, &line[offset] ); + got_error = TRUE; continue; } @@ -7023,9 +7047,13 @@ osmtest_create_db( IN osmtest_t * const osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmtest_create_db: ERR 0132: " "Ignoring line %u: %s\n", line_num, &line[offset] ); + got_error = TRUE; break; } + if ( got_error ) + status = IB_ERROR; + if( status != IB_SUCCESS ) { osm_log( &p_osmt->log, OSM_LOG_ERROR, From vlad at mellanox.co.il Wed Oct 11 02:00:21 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 11 Oct 
2006 11:00:21 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 In-Reply-To: <452C8631.7040200@ichips.intel.com> References: <4525271E.8070000@dev.mellanox.co.il> <452C8631.7040200@ichips.intel.com> Message-ID: <1160557221.5804.15.camel@vladsk-laptop> Hi Arlin, This patch is in OFED-1.1-rc7 and applied during installation. Regards, Vladimir On Tue, 2006-10-10 at 22:50 -0700, Arlin Davis wrote: > Aviram Gutman wrote: > > >OFED-1.1-rc7 is available on > >https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > >File: OFED-1.1-rc7.tgz > >Please report any issues in bugzilla http://openib.org/bugzilla/ > > > > > > > Aviram, > > Can you verify that the sean_cm_drep_on_not_found.patch is actually > applied in RC7? Our delayed disconnect problems still exist. > > I don't see the new symbol "cm_issue_drep" in ib_cm.ko on our RC7 > installed systems so I don't think the patch applied. > > Thanks, > > -arlin > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg From mst at mellanox.co.il Wed Oct 11 02:05:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 11:05:04 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061010.191547.83619974.davem@davemloft.net> References: <20061010.191547.83619974.davem@davemloft.net> Message-ID: <20061011090504.GC2938@mellanox.co.il> Quoting r. David Miller : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > From: "Michael S. Tsirkin" > Date: Wed, 11 Oct 2006 02:13:38 +0200 > > > Maybe I can patch linux to allow SG without checksum? > > Dave, maybe you could drop a hint or two on whether this is worthwhile > > and what are the issues that need addressing to make this work? > > > > I imagine it's not just the matter of changing net/core/dev.c :). > > You can't, it's a quality of implementation issue. 
We sendfile() > pages directly out of the filesystem page cache without any > blocking of modifications to the page contents, and the only way > that works is if the card computes the checksum for us. > > If we sendfile() a page directly, we must compute a correct checksum > no matter what the contents. We can't do this on the cpu before the > data hits the device because another thread of execution can go in and > modify the page contents which would invalidate the checksum and thus > invalidating the packet. We cannot allow this. > > Blocking modifications is too expensive, so that's not an option > either. > But copying still works fine, does it not? Dave, could you clarify this please? ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) { ssize_t res; struct sock *sk = sock->sk; if (!(sk->sk_route_caps & NETIF_F_SG) || !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) return sock_no_sendpage(sock, page, offset, size, flags); So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, data will be copied over rather than sent directly. So why does dev.c have to force set NETIF_F_SG to off then? -- MST From steve at chygwyn.com Wed Oct 11 02:09:26 2006 From: steve at chygwyn.com (Steven Whitehouse) Date: Wed, 11 Oct 2006 10:09:26 +0100 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011090504.GC2938@mellanox.co.il> References: <20061010.191547.83619974.davem@davemloft.net> <20061011090504.GC2938@mellanox.co.il> Message-ID: <20061011090926.GA15393@fogou.chygwyn.com> Hi, On Wed, Oct 11, 2006 at 11:05:04AM +0200, Michael S. Tsirkin wrote: > Quoting r. David Miller : > > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > > > From: "Michael S. Tsirkin" > > Date: Wed, 11 Oct 2006 02:13:38 +0200 > > > > > Maybe I can patch linux to allow SG without checksum? 
> > > Dave, maybe you could drop a hint or two on whether this is worthwhile > > > and what are the issues that need addressing to make this work? > > > > > > I imagine it's not just the matter of changing net/core/dev.c :). > > > > You can't, it's a quality of implementation issue. We sendfile() > > pages directly out of the filesystem page cache without any > > blocking of modifications to the page contents, and the only way > > that works is if the card computes the checksum for us. > > > > If we sendfile() a page directly, we must compute a correct checksum > > no matter what the contents. We can't do this on the cpu before the > > data hits the device because another thread of execution can go in and > > modify the page contents which would invalidate the checksum and thus > > invalidating the packet. We cannot allow this. > > > > Blocking modifications is too expensive, so that's not an option > > either. > > > I would argue that SG does make sense without checksum for protocols that don't need/use a checksum. DECnet for example could do zero-copy without caring about the checksum since it doesn't have one. One of these days I'll get around to writing that bit of code :-) > But copying still works fine, does it not? > Dave, could you clarify this please? > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > size_t size, int flags) > { > ssize_t res; > struct sock *sk = sock->sk; > > if (!(sk->sk_route_caps & NETIF_F_SG) || > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > return sock_no_sendpage(sock, page, offset, size, flags); > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > data will be copied over rather than sent directly. > So why does dev.c have to force set NETIF_F_SG to off then? > I agree with that analysis, Steve. 
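
The feature test quoted above from tcp_sendpage() can be tried outside the kernel as a small C sketch. This is only an illustration under stated assumptions: DEMO_F_SG and DEMO_F_ALL_CSUM are hypothetical stand-ins for NETIF_F_SG and NETIF_F_ALL_CSUM (the real bit values differ), and demo_choose_path() is an invented helper that mirrors the shape of the quoted condition, not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's NETIF_F_SG and NETIF_F_ALL_CSUM
 * bits; the real values are defined in <linux/netdevice.h>. */
#define DEMO_F_SG        0x1u
#define DEMO_F_ALL_CSUM  0x2u

enum demo_path { DEMO_PATH_COPY, DEMO_PATH_ZEROCOPY };

/* Same shape as the test quoted from tcp_sendpage(): the zero-copy path
 * is taken only when both scatter/gather and checksum offload are set;
 * otherwise the data is copied (sock_no_sendpage() in the quoted code). */
enum demo_path demo_choose_path(unsigned int route_caps)
{
    if (!(route_caps & DEMO_F_SG) || !(route_caps & DEMO_F_ALL_CSUM))
        return DEMO_PATH_COPY;
    return DEMO_PATH_ZEROCOPY;
}

/* Walks the four flag combinations discussed in the thread. */
void demo_check_all(void)
{
    assert(demo_choose_path(0) == DEMO_PATH_COPY);
    assert(demo_choose_path(DEMO_F_SG) == DEMO_PATH_COPY);
    assert(demo_choose_path(DEMO_F_ALL_CSUM) == DEMO_PATH_COPY);
    assert(demo_choose_path(DEMO_F_SG | DEMO_F_ALL_CSUM) == DEMO_PATH_ZEROCOPY);
}
```

Calling demo_check_all() from a main() exercises all four cases. The SG-without-checksum case lands on the copy path, which is the observation the question in this thread rests on.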
From davem at davemloft.net Wed Oct 11 02:20:15 2006
From: davem at davemloft.net (David Miller)
Date: Wed, 11 Oct 2006 02:20:15 -0700 (PDT)
Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <20061011090504.GC2938@mellanox.co.il>
References: <20061010.191547.83619974.davem@davemloft.net> <20061011090504.GC2938@mellanox.co.il>
Message-ID: <20061011.022015.63051509.davem@davemloft.net>

From: "Michael S. Tsirkin"
Date: Wed, 11 Oct 2006 11:05:04 +0200

> So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> data will be copied over rather than sent directly.
> So why does dev.c have to force set NETIF_F_SG to off then?

Because it's more efficient to copy into a linear destination
buffer of an SKB than page sub-chunks when doing checksum+copy.

From mst at mellanox.co.il Wed Oct 11 02:46:49 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 11 Oct 2006 11:46:49 +0200
Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <20061011.022015.63051509.davem@davemloft.net>
References: <20061011.022015.63051509.davem@davemloft.net>
Message-ID: <20061011094649.GD2701@mellanox.co.il>

Quoting r. David Miller :
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
>
> From: "Michael S. Tsirkin"
> Date: Wed, 11 Oct 2006 11:05:04 +0200
>
> > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM,
> > data will be copied over rather than sent directly.
> > So why does dev.c have to force set NETIF_F_SG to off then?
>
> Because it's more efficient to copy into a linear destination
> buffer of an SKB than page sub-chunks when doing checksum+copy.
>

Thanks for the explanation.
Obviously it's true as long as you can allocate an skb that big.
I think you won't realistically be able to get 64K in a
linear SKB on a busy system, though, isn't that right?

OTOH, having a large MTU (e.g. 64K) helps performance a lot since
it reduces receive side processing overhead.

So, if I understand what you are saying correctly, things do work correctly
(just slower for small skb) if NETIF_F_SG is set but NETIF_F_ALL_CSUM is clear.
It seems that all we need to do is drop the following in dev.c:

        /* Fix illegal SG+CSUM combinations. */
        if ((dev->features & NETIF_F_SG) &&
            !(dev->features & NETIF_F_ALL_CSUM)) {
                printk(KERN_NOTICE "%s: Dropping NETIF_F_SG since no checksum feature.\n",
                       dev->name);
                dev->features &= ~NETIF_F_SG;
        }

is that right?

--
MST

From pasha at dev.mellanox.co.il Wed Oct 11 03:09:19 2006
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Wed, 11 Oct 2006 12:09:19 +0200
Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
In-Reply-To:
References:
Message-ID: <452CC2CF.2000104@dev.mellanox.co.il>

On some of our SUSE 10 machines I found the 127.0.0.2 IP, but it was
pointing to some random Linux site (linux.org) and had no effect on MPI
runs. In your case the IP points to a _real_ machine, which is very strange.

Scott Weitzenkamp (sweitzen) wrote:
> Aha, I found something in /etc/hosts, thanks for the hint.
>
> 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3
>
> If I comment this line out, MVAPICH works fine. Does Mellanox have this
> entry in /etc/hosts?
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>
>> -----Original Message-----
>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
>> Sent: Thursday, October 05, 2006 5:59 AM
>> To: Scott Weitzenkamp (sweitzen)
>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>> OFED 1.1 rc6 with SLES10 x86_64
>>
>>> I see it for all MVAPICH tests, it's 100% consistent.
>> MVAPICH tests are osu_benchmarks (bw/lt/etc..) or all test
>> over mvapich
>> on SUSE10 platform ?
>> Please check /etc/hosts file on your machines, it should be
>> exactly the
>> same on all nodes.
>>
>> Regards,
>> Pasha
>>
>>> Scott Weitzenkamp
>>> SQA and Release Manager
>>> Server Virtualization Business Unit
>>> Cisco Systems
>>>
>>>
>>>> -----Original Message-----
>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
>>>> Sent: Tuesday, October 03, 2006 3:37 AM
>>>> To: Scott Weitzenkamp (sweitzen)
>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib
>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>
>>>> Hi Scott,
>>>> Unfortunately was not able to reproduce the failure on our
>> platforms.
>>>> Do you see the problem with all tests or with the specific only ?
>>>> Is it consistent problem ?
>>>>
>>>> Regards,
>>>> Pasha
>>>>
>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>> $ uname -a
>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3
>>>> 18:25:39 UTC 2006
>>>>> x86_64
>>>>> x86_64 x86_64 GNU/Linux
>>>>> $
>>>> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2
>>>>> 192.168.2.46 192.168.2.49 hostname
>>>>> svbu-qa1850-4
>>>>> svbu-qa1850-3
>>>>> $
>>>> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2
>>>>> 192.168.2.46 192.168.2.49
>>>>>
>>>> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_bench
>>> marks-2.2/
>>>>> osu_latency
>>>>>
>>>>> The last command just hangs. Can I try your binary RPMs?
>>>>>
>>>>> Scott Weitzenkamp
>>>>> SQA and Release Manager
>>>>> Server Virtualization Business Unit
>>>>> Cisco Systems
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il]
>>>>>> Sent: Sunday, October 01, 2006 2:29 AM
>>>>>> To: Scott Weitzenkamp (sweitzen)
>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il
>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
>>>>>> OFED 1.1 rc6 with SLES10 x86_64
>>>>>>
>>>>>> Can you please elaborate on MVAPICH issues, can you send
>>>>>> command line?
>>>>>> We ran it here on 32 Opteron nodes each quad core and also
>>>> rigorous
>>>>>> tests on the many other nodes?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Scott Weitzenkamp (sweitzen) wrote:
>>>>>>> We are just getting started with OFED testing on SLES10, first
>>>>>>> platform is x86_64.
>>>>>>>
>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are
>>>>>> working so far.
>>>>>>> MVAPICH with OSU benchmarks just hang. This same
>>>> hardware works
>>>>>>> fine with OFED and RHEL4 U3.
>>>>>>>
>>>>>>> Has anyone else seen this?
>>>>>>>
>>>>>>> Scott Weitzenkamp
>>>>>>> SQA and Release Manager
>>>>>>> Server Virtualization Business Unit
>>>>>>> Cisco Systems
>>>>>>>
>>>>>>>
>>>>>> --------------------------------------------------------------
>>>>>> ----------
>>>>>>> _______________________________________________
>>>>>>> openfabrics-ewg mailing list
>>>>>>> openfabrics-ewg at openib.org
>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg
>>>>>>>
>>>> --
>>>> Pavel Shamis (Pasha)
>>>> Software Engineer
>>>> Mellanox Technologies LTD.
>>>> pasha at mellanox.co.il
>>>>
>>
>> --
>> Pavel Shamis (Pasha)
>> Software Engineer
>> Mellanox Technologies LTD.
>> pasha at mellanox.co.il
>>
>

--
Pavel Shamis (Pasha)
Software Engineer
Mellanox Technologies LTD.
pasha at mellanox.co.il

From mst at mellanox.co.il Wed Oct 11 03:20:36 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 11 Oct 2006 12:20:36 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <000001c6ecf2$017874b0$9dd0180a@amr.corp.intel.com>
References: <000001c6ecf2$017874b0$9dd0180a@amr.corp.intel.com>
Message-ID: <20061011102036.GB3706@mellanox.co.il>

Quoting r. Sean Hefty :
> >Multiple distinct mcmember_rec queries could get us the same mcast group.
> >Say MTU > 256 and MTU > 512 look different but actually will get
> >same group in practice.
> >
> >Say 2 clients ask for these 2 queries.
> >What will be in the ib_sa_mcmember_rec in this case?
>
> The module currently does not handle queries in as complex a way as the SA.
> The current matching is limited to equality comparisons against the local
> mcmember record. (See cmp_rec() in multicast.c.) If there's a need to expand
> the comparisons, that can be done.

Yes, I think supporting more queries is required in the real world.
For heterogeneous clusters, it seems we already need something like IB_SA_LTE
at least for the rate selector.

It is somewhat unfortunate that we are duplicating the SA logic at the endnode
in kernel memory here - the current sa module has the advantage in that it just
packs all data into a mad and sends it out. Something to think about.

--
MST

From kliteyn at dev.mellanox.co.il Wed Oct 11 03:23:34 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 11 Oct 2006 12:23:34 +0200
Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c
Message-ID:

Hi Hal

Fixing a few problems in the multicast test flow, plus some cosmetics.

Yevgeny

Signed-off-by: Yevgeny Kliteynik

Index: osmt_multicast.c =================================================================== --- osmt_multicast.c (revision 9776) +++ osmt_multicast.c (working copy) @@ -318,6 +318,9 @@ osmt_send_mcast_request( IN osmtest_t * if( status != IB_SUCCESS ) { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_send_mcast_request: ERR 0226: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); if( status == IB_REMOTE_ERROR ) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -409,7 +412,7 @@ osmt_init_mc_query_rec(IN osmtest_t * c * - Try full delete (JoinState and should be 0) * - Wait for trap 67. * - Try joining (not full mem) again to see the group was deleted.
- * (should fail) + * (should fail - o15.0.1.13) * o15.0.1.15: * - Try deletion of the IPoIB MCG and get: ERR_REQ_INVALID * o15.0.1.16: @@ -548,6 +551,10 @@ osmt_run_mcast_flow( IN osmtest_t * cons p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); } + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Found %d non-IPoIB MC Groups.\n", mcg_outside_test_cnt); + if (IPoIBIsFound) { /* o15-0.2.4 - Check a join request to already created MCG */ @@ -638,11 +645,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons goto Exit; } - else - { - mtu_phys = 0; - rate_phys = 0; - } /* We do not want to leave the MCG since its IPoIB */ } @@ -658,10 +660,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons mc_req_rec.mlid = invalid_mlid; comp_mask = IB_MCR_COMPMASK_MLID; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 0xee, /* User Defined query Get */ &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status == IB_SUCCESS) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -688,10 +695,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons memset(&mc_req_rec.port_gid.unicast.interface_id, 0, sizeof(ib_net64_t)); comp_mask = IB_MCR_COMPMASK_GID; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 0xee, /* User Defined query Get */ &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status == IB_SUCCESS) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -713,9 +725,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Join with insufficiant comp mask qkey & pkey (o15.0.1.3)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - /* no MGID */ memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* 
Request Join */ @@ -733,6 +742,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -758,9 +769,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Join with insufficient comp mask - sl (15.0.1.3)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - /* no MGID */ memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ @@ -778,12 +786,14 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status != IB_REMOTE_ERROR || (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) != @@ -811,9 +821,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Join with insufficient comp mask - flow label (o15.0.1.3)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -829,6 +836,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -857,9 +866,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Join with insufficiant comp mask - tclass (o15.0.1.3)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - /* Request Join */ 
ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER) ; @@ -875,6 +881,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -904,9 +912,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Join with insufficient comp mask - tclass qkey (o15.0.1.3)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - /* no MGID */ /* memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); */ /* Request Join */ @@ -924,6 +929,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -968,10 +975,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1007,10 +1019,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1046,10 +1063,15 @@ 
osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1065,7 +1087,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* Checking above max value of MTU which is impossible */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " - "Checking Join with unrealistic mtu (o15.0.1.8)...\n" + "Checking Join with unrealistic mtu : \n\t\tmore than 4096 -" + " max (o15.0.1.8)...\n" ); /* impossible requested mtu */ @@ -1082,10 +1105,16 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_TCLASS | /* all above are required */ IB_MCR_COMPMASK_MTU_SEL | IB_MCR_COMPMASK_MTU; + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1098,11 +1127,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons goto Exit; } - osm_log( &p_osmt->log, OSM_LOG_INFO, - "osmt_run_mcast_flow: " - "Checking Join with unrealistic mtu (o15.0.1.8)...\n" - ); - /* Checking above max value of MTU which is impossible */ + /* Checking below min value of MTU which is impossible */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " "Checking Join with unrealistic mtu : \n\t\tless than 256 -" @@ -1124,10 +1149,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_MTU_SEL | IB_MCR_COMPMASK_MTU; + osm_log( 
&p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1160,10 +1190,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_MTU_SEL | IB_MCR_COMPMASK_MTU; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1199,10 +1234,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_LIFE | IB_MCR_COMPMASK_LIFE_SEL; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + if (status != IB_REMOTE_ERROR || res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -1224,8 +1264,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "osmt_run_mcast_flow: " "Checking Create given MGID=0 skip service level (o15.0.1.4)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -1248,6 +1286,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, 
@@ -1303,8 +1343,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Create given MGID=0 skip Qkey and Pkey (o15.0.1.4)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -1327,6 +1365,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -1356,8 +1396,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons "osmt_run_mcast_flow: " "Checking Create given MGID=0 skip TClass (o15.0.1.4)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -1381,6 +1419,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_RATE_SEL | IB_MCR_COMPMASK_RATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -1592,6 +1632,12 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* Good Flow - mgid is 0 while giving all required fields for join : P_Key, Q_Key, SL, FlowLabel, Tclass */ /* Using Exact feasible MTU & RATE */ + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, + "osmt_run_mcast_flow: " + "Using Exact feasible MTU & RATE: " + "MTU = 0x%02X, RATE = 0x%02X\n", + mtu_phys, rate_phys); + mc_req_rec.mtu = mtu_phys; mc_req_rec.rate = rate_phys; @@ -1640,6 +1686,11 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* Good Flow - mgid is 0 while giving all required fields for join : P_Key, Q_Key, SL, FlowLabel, Tclass */ /* Using Exact feasible RATE */ + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, + "osmt_run_mcast_flow: " + "Using Exact feasible RATE: 0x%02X\n", + rate_phys); + mc_req_rec.rate = rate_phys; comp_mask = @@ -1684,6 +1735,11 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* Good 
Flow - mgid is 0 while giving all required fields for join : P_Key, Q_Key, SL, FlowLabel, Tclass */ /* Using Exact feasible MTU */ + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, + "osmt_run_mcast_flow: " + "Using Exact feasible MTU: 0x%02X\n", + mtu_phys); + mc_req_rec.mtu = mtu_phys; comp_mask = @@ -1748,7 +1804,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons goto Exit; } - /* Lets try another multicast request */ + /* Good Flow - mgid is 0 while giving all required fields for join : P_Key, Q_Key, SL, FlowLabel, Tclass */ + /* Using feasible GREATER_THAN 0 packet lifitime */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " "Checking Create given MGID=0 (o15.0.1.4)...\n" @@ -1808,14 +1865,54 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* o15.0.1.6: */ /* - Create a new MCG with valid requested MGID. */ + osmt_init_mc_query_rec(p_osmt, &mc_req_rec); + mc_req_rec.mgid = good_mgid; osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " - "Checking Create given MGID=0x%016" PRIx64 " : " + "Checking Create given valid MGID=0x%016" PRIx64 " : " + "0x%016" PRIx64 " (o15.0.1.6)...\n", + cl_ntoh64(mc_req_rec.mgid.unicast.prefix), + cl_ntoh64(mc_req_rec.mgid.unicast.interface_id)); + + /* Before creation, need to check that this group doesn't exist */ + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Verifying that MCGroup with this MGID doesn't exist by trying to Join it (o15.0.1.13)...\n"); + + ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_NON_MEMBER); + + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); + status = osmt_send_mcast_request( p_osmt, 1, /* join */ + &mc_req_rec, + comp_mask, + &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); + + if ((status != IB_REMOTE_ERROR) || + (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: ERR 0301: " + "Tried joining group 
that shouldn't have existed - got %s/%s\n", + ib_get_err_str( status ), + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) + ); + status = IB_ERROR; + goto Exit; + } + + /* Set State to full member to allow group creation */ + ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); + + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Now creating group with given valid MGID=0x%016" PRIx64 " : " "0x%016" PRIx64 " (o15.0.1.6)...\n", cl_ntoh64(mc_req_rec.mgid.unicast.prefix), cl_ntoh64(mc_req_rec.mgid.unicast.interface_id)); - mc_req_rec.mgid = good_mgid; status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, @@ -2133,6 +2230,18 @@ osmt_run_mcast_flow( IN osmtest_t * cons goto Exit; } + /* Save the mlid created in test_created_mlids map */ + p_recvd_rec = (ib_member_rec_t*)ib_sa_mad_get_payload_ptr( &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, + "osmt_run_mcast_flow: " + "Created MGID:0x%016" PRIx64 " : " + "0x%016" PRIx64 " MLID:0x%04X\n", + cl_ntoh64( p_recvd_rec->mgid.unicast.prefix ), + cl_ntoh64( p_recvd_rec->mgid.unicast.interface_id ), + cl_ntoh16( p_recvd_rec->mlid )); + cl_map_insert(&test_created_mlids, + cl_ntoh16(p_recvd_rec->mlid), p_recvd_rec ); + /* Lets try another invalid join scope state */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " @@ -2651,7 +2760,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* - Try joining (not full mem) again to see the group was deleted. 
(should fail) */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " - "Checking Delete by trying to Join deleted group (o15.0.1.14)...\n" + "Checking Delete by trying to Join deleted group (o15.0.1.13)...\n" ); osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); @@ -2844,9 +2953,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* impossible requested mtu always greater than exist in MCG */ - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - mc_req_rec.mtu = IB_MTU_LEN_4096 | IB_PATH_SELECTOR_GREATER_THAN << 6; memcpy(&mc_req_rec.mgid,&tmp_mgid,sizeof(ib_gid_t)); ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -2857,6 +2963,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_MCR_COMPMASK_MTU_SEL | IB_MCR_COMPMASK_MTU; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, From mst at mellanox.co.il Wed Oct 11 05:09:49 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 14:09:49 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> Message-ID: <20061011120949.GA4927@mellanox.co.il> IPoIB discussion aside, some more comments on the API: > +struct ib_multicast { > + struct ib_sa_mcmember_rec rec; > + ib_sa_comp_mask comp_mask; > + int (*callback)(int status, > + struct ib_multicast *multicast); > + void *context; > +}; Why is ib_sa_mcmember_rec exposed? As was discussed separately, a single mcast group might match mutiple distinct queries. So need to either specify which values are valid, or use a different structure here. 
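To illustrate the point about one group matching several distinct queries, here is a minimal sketch, assuming a simplified record with only three attributes. The names `mc_rec`, `MC_MASK_*`, `mc_query_matches` and `mc_match_demo` are hypothetical stand-ins chosen for illustration; they are not the kernel's `ib_sa_mcmember_rec`, its component mask bits, or anything from the patch under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask bits: one bit per record field that the caller marks valid. */
enum {
    MC_MASK_PKEY = 1u << 0,
    MC_MASK_MTU  = 1u << 1,
    MC_MASK_RATE = 1u << 2,
};

/* Hypothetical, heavily reduced multicast member record. */
struct mc_rec {
    uint16_t pkey;
    uint8_t  mtu;
    uint8_t  rate;
};

/* Compare only the fields that the query's mask marks as valid. */
static bool mc_query_matches(const struct mc_rec *group,
                             const struct mc_rec *query,
                             uint32_t comp_mask)
{
    if ((comp_mask & MC_MASK_PKEY) && group->pkey != query->pkey)
        return false;
    if ((comp_mask & MC_MASK_MTU) && group->mtu != query->mtu)
        return false;
    if ((comp_mask & MC_MASK_RATE) && group->rate != query->rate)
        return false;
    return true;
}

/* Two different queries match the same group; combining their masks does not. */
static void mc_match_demo(void)
{
    struct mc_rec group    = { 0xffff, 4, 3 };
    struct mc_rec by_pkey  = { 0xffff, 0, 0 };  /* only pkey is meaningful */
    struct mc_rec by_mtu   = { 0,      4, 0 };  /* only mtu is meaningful  */

    assert(mc_query_matches(&group, &by_pkey, MC_MASK_PKEY));
    assert(mc_query_matches(&group, &by_mtu, MC_MASK_MTU));
    assert(!mc_query_matches(&group, &by_mtu, MC_MASK_PKEY | MC_MASK_MTU));
}
```

In this toy model a record handed back to the caller is only meaningful together with the mask that says which fields were actually specified, which is one way to read the request to "specify which values are valid".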
> +/** > + * ib_join_multicast - Initiates a join request to the specified multicast > + * group. > + * @device: Device associated with the multicast group. > + * @port_num: Port on the specified device to associate with the multicast > + * group. > + * @rec: SA multicast member record specifying group attributes. > + * @comp_mask: Component mask indicating which group attributes of %rec are > + * valid. > + * @gfp_mask: GFP mask for memory allocations. > + * @callback: User callback invoked once the join operation completes. Is callback invoked only once? If no should document. > + * @context: User specified context stored with the ib_multicast structure. > + * > + * This call initiates a multicast join request with the SA for the specified > + * multicast group. If the join operation is started successfully, it returns > + * an ib_multicast structure that is used to track the multicast operation. > + * Users must free this structure by calling ib_free_multicast, even if the > + * join operation later fails. (The callback status is non-zero.) > + */ > +struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num, > + struct ib_sa_mcmember_rec *rec, > + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, > + int (*callback)(int status, > + struct ib_multicast > + *multicast), What are the values for status? I see that it's not just 0 or non-zero - IPoIB needs to treat some of them magically for some reason. Must document. >From IPoIB code it seems that e.g. port state change will cause join to fail. Correct? Why is that? Why not rely on timeout for that? > + void *context); We used to have a timeout value - I think this is something ULP might want control over. Of course if we are already in the group you can return immediately. -- MST From mst at mellanox.co.il Wed Oct 11 05:33:49 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 11 Oct 2006 14:33:49 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000001c6ecf2$017874b0$9dd0180a@amr.corp.intel.com> References: <20061011011025.GC30093@mellanox.co.il> <000001c6ecf2$017874b0$9dd0180a@amr.corp.intel.com> Message-ID: <20061011123349.GB4927@mellanox.co.il> OK, I'll talk about the ipoib change part here. Quoting r. Sean Hefty : > >You might be right. But I wander whether we'll regret it later that > >we switched to the slower generic thing when we already had a stable, > >streamlined version. > > To be direct, I'm not sure that I'd call the ipoib multicast either > streamlined or stable. Have to disagree on the stability count here - and this goes for all of IPoIB. If you look at the git log you'll see that all fixes were one-liners or updates due to API changes since probably 2.6.17. code in 2.6.17 has also been backported and in field use since IBG2 too (around Jan 2006) and from what I hear from field IPoIB is exceptionally stable there. git log -p v2.6.17.. -- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | diffstat b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 20 +++++++--- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 46 +++++++++++------------ 2 files changed, 38 insertions(+), 28 deletions(-) > A fix against it just recently went by, Do you refer to the strict compliance in d0df6d6d4539241179a1ef5394787825bf05bbce? Come on. > and given the > complexity of its use of bit flags, thread, mutex, locks, and pointers to > track state, the code is fairly difficult to follow, so I'm not surprised that > it's taken a while to stabilize. I think cleaning up code is always a good thing. So, please do not get discouraged by my comments. However, it would be nice to have the cleanup be 1. done in small steps so we can review each one 2. 
separate from the switch to the multicast module Thanks, -- MST From ogerlitz at voltaire.com Wed Oct 11 05:41:49 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 11 Oct 2006 14:41:49 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061010234933.GA29632@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> Message-ID: <452CE68D.8040709@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Sean Hefty : >> Add an ib_multicast module to perform reference counting of multicast >> join / leave requests. Modify ib_ipoib to use the multicast module. >> Signed-off-by: Sean Hefty > On the ipoib change - whya re we doing it at all? > ib_ipoib does not actually need the multicast refcounting, does it? > I would be worried about doing major changes in ipoib multicast code, at this > point. The ipoib change is a must to allow for user space consumers to join/leave groups whose MGID is "of IP multicast origin". So one process can have its IP mulitcast traffic carried out by ipoib and another process can use librdmacm to join the same multicast group and attach a QP to the same MGID. Or. From eitan at mellanox.co.il Wed Oct 11 05:43:40 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 14:43:40 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CE68D.8040709@voltaire.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> Message-ID: <452CE6FC.5090303@mellanox.co.il> Or Gerlitz wrote: > Michael S. Tsirkin wrote: > >> Quoting r. 
Sean Hefty : >> >>> Add an ib_multicast module to perform reference counting of multicast >>> join / leave requests. Modify ib_ipoib to use the multicast module. >>> Signed-off-by: Sean Hefty > On the ipoib change - whya re we doing it at all? >> ib_ipoib does not actually need the multicast refcounting, does it? >> I would be worried about doing major changes in ipoib multicast code, at this >> point. >> > > The ipoib change is a must to allow for user space consumers to > join/leave groups whose MGID is "of IP multicast origin". > > So one process can have its IP mulitcast traffic carried out by ipoib > and another process can use librdmacm to join the same multicast group > and attach a QP to the same MGID. > If the tracking (ref counting) was done at the MAD level - no change to IPoIB would have been required ... > Or. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Wed Oct 11 05:52:48 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 14:52:48 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CE68D.8040709@voltaire.com> References: <452CE68D.8040709@voltaire.com> Message-ID: <20061011125248.GA5181@mellanox.co.il> Quoting r. Or Gerlitz : > The ipoib change is a must to allow for user space consumers to > join/leave groups whose MGID is "of IP multicast origin". > > So one process can have its IP mulitcast traffic carried out by ipoib > and another process can use librdmacm to join the same multicast group > and attach a QP to the same MGID. Why is this even a good idea? 
If you are looking for reasons why using the multicast module in ipoib is good, I would say blocking unprivileged userspace from joining the IPoIB GID and snooping on all mcast traffic sounds like a better idea.

BTW, Sean, I think this is something we need for the ucma multicast part to go in. I would imagine kernel components could set some kind of flag on mcast join to make them exclusive. The API currently does not allow for that.

And why the rush? Is the new module used at all yet?
Let's see it get some use before switching a basic component over.

Finally, the patch in question also seems to introduce more cleanups and such. It would be less controversial if it was just an API change.

-- 
MST

From halr at voltaire.com Wed Oct 11 06:29:02 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Oct 2006 09:29:02 -0400
Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmtest.c
In-Reply-To: 
References: 
Message-ID: <1160573341.32093.51183.camel@hal.voltaire.com>

On Wed, 2006-10-11 at 04:08, Yevgeny Kliteynik wrote:
> Hi Hal
>
> This patch fixes a bunch of ignored errors in osmtest.c
> (plus some cosmetics).
> In particular, for some reason osmtest.c doesn't treat
> IB_INVALID_PARAMETER as an error.
> I couldn't find any reasonable explanation to it.

Now that you point this out, I don't see why it didn't either.

> Perhaps it was useful while writing osmtest?

Don't know. This predates my involvement.

> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied.
-- Hal

From ogerlitz at voltaire.com Wed Oct 11 06:41:51 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 11 Oct 2006 15:41:51 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <452CE6FC.5090303@mellanox.co.il>
References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il>
Message-ID: <452CF49F.40903@voltaire.com>

Eitan Zahavi wrote:
> If the tracking (ref counting) was done at the MAD level - no change to
> IPoIB would have been required ...

Maybe.

You could also implement all the ib stack core in one module...
The openib designers have chosen not to do so and rather break it into smaller modules, namely ib_core, ib_mad, ib_sa, ib_cm, ib_addr and rdma_cm. With the architecture at hand, adding ib_multicast makes sense.

Or.

From ogerlitz at voltaire.com Wed Oct 11 06:48:43 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 11 Oct 2006 15:48:43 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <20061011125248.GA5181@mellanox.co.il>
References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il>
Message-ID: <452CF63B.9030003@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz :
>> The ipoib change is a must to allow for user space consumers to
>> join/leave groups whose MGID is "of IP multicast origin".
>> So one process can have its IP multicast traffic carried out by ipoib
>> and another process can use librdmacm to join the same multicast group
>> and attach a QP to the same MGID.

> And why the rush? Is the new module used at all yet?

It's not a rush, it's a move for enabling user space code that can offload IP Multicast.
We have a library doing that, which is coded over the gen1 stack and is now being ported to the gen2 stack.

> Let's see it get some use before switching a basic component over.

This porting relies on Sean's work to allow for joining/leaving "IP Multicast" mcast groups from user space. And anyway, when you expose something to user space, you might not "see" the usage as it can be not an open source one.

Or.

From mst at mellanox.co.il Wed Oct 11 07:04:56 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 11 Oct 2006 16:04:56 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <452CF63B.9030003@voltaire.com>
References: <452CF63B.9030003@voltaire.com>
Message-ID: <20061011140456.GB4888@mellanox.co.il>

Quoting r. Or Gerlitz :
> > Let's see it get some use before switching a basic component over.
>
> This porting relies on Sean's work to allow for joining/leaving "IP
> Multicast" mcast groups from user space. And anyway, when you expose
> something to user space, you might not "see" the usage as it can be not
> an open source one.

I don't know whether a single closed-source application shows something is useful. Anyway, you don't need code upstream just to do development. Pushing userspace-interfacing code upstream before there are actual users leads to ABI instability and other such issues.

-- 
MST

From ogerlitz at voltaire.com Wed Oct 11 07:15:43 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 11 Oct 2006 16:15:43 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <20061011140456.GB4888@mellanox.co.il>
References: <452CF63B.9030003@voltaire.com> <20061011140456.GB4888@mellanox.co.il>
Message-ID: <452CFC8F.7060905@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. 
Or Gerlitz :
>> This porting relies on Sean's work to allow for joining/leaving "IP
>> Multicast" mcast groups from user space. And anyway, when you expose
>> something to user space, you might not "see" the usage as it can be not
>> an open source one.

> I don't know whether a single closed-source application shows something
> is useful. Anyway, you don't need code upstream just to do development.
> Pushing userspace-interfacing code upstream before there are actual users
> leads to ABI instability and other such issues.

The only current rdma cm user space code I am aware of is uDAPL. Intel MPI is a closed source package using uDAPL which drove much (almost 100%) of the uDAPL and uCMA testing, and the uCMA is a candidate to be submitted for 2.6.20 (i.e. ~3 months from today).

I have seen some feedback provided on the uCMA UD Mcast code by Steve/Tom and, as I said, we are going to use it immediately, so more feedback will be provided over this list.

As there is NO synchronization between the openib SVN and Roland's GIT tree (to remove doubt, I am not in favor of such sync and am in favor of not maintaining the ib kernel code in the svn!), to make the testing of uCMA consumers robust, applying this patch set on the for-2.6.20 branch of the IB git tree is essential.

Or.

From or.gerlitz at gmail.com Wed Oct 11 07:21:41 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 11 Oct 2006 16:21:41 +0200
Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <20061009174705.GG26849@mellanox.co.il>
References: <20061009174705.GG26849@mellanox.co.il>
Message-ID: <15ddcffd0610110721k12d1fb4ckb6e13b0819179d04@mail.gmail.com>

On 10/9/06, Michael S. Tsirkin wrote:
> I'm trying to build a network device driver supporting a very large MTU (around 64K)
> on top of an infiniband connection, and I've hit a couple of issues I'd
> appreciate some feedback on:

Does it mean you are implementing IPoIB RC? Cool ...

Or.
From eitan at mellanox.co.il Wed Oct 11 07:17:48 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 16:17:48 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CF49F.40903@voltaire.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> Message-ID: <452CFD0C.2050709@mellanox.co.il> Hi Or, Maybe I did not explain myself right. The idea is not to implement it in the mad.c code but rather to implement it at the lowest level: The problem with a new API is that a single ULP/applications which does direct umad or QP1 access will break the reference count. Implementing at the lowest level - I.e. by sniffing QP1 packets - would be enforced for all applications/ULPs. Or Gerlitz wrote: > Eitan Zahavi wrote: > >> If the tracking (ref counting) was done at the MAD level - no change to >> IPoIB would have been required ... >> > > Maybe. > > You could also implement all the ib stack core in one module... > The openib designers have chosen not to do so and rather break it into > smaller modules namely ib_core, ib_mad, ib_sa, ib_cm, ib_addr and > rdma_cm, with the architecture at hand adding ib_multicast makes sense. > > Or. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Wed Oct 11 07:29:13 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 11 Oct 2006 16:29:13 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CFC8F.7060905@voltaire.com> References: <452CFC8F.7060905@voltaire.com> Message-ID: <20061011142913.GE4888@mellanox.co.il> Quoting r. Or Gerlitz : > to make the testing > of uCMA consumers robust, applying this patch set on the for-2.6.20 > branch of the IB git tree is essential. Not really. Just build a git tree and put whatever you want there. -- MST From halr at voltaire.com Wed Oct 11 07:30:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Oct 2006 10:30:42 -0400 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c In-Reply-To: References: Message-ID: <1160577041.32093.53620.camel@hal.voltaire.com> On Wed, 2006-10-11 at 06:23, Yevgeny Kliteynik wrote: > Hi Hal > > Fixing a few problems in the multicast test flow, > plus some cosmetics. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. See question below... > Index: osmt_multicast.c > =================================================================== > --- osmt_multicast.c (revision 9776) > +++ osmt_multicast.c (working copy) [snip...] > @@ -1808,14 +1865,54 @@ osmt_run_mcast_flow( IN osmtest_t * cons > > /* o15.0.1.6: */ > /* - Create a new MCG with valid requested MGID. 
*/ > + osmt_init_mc_query_rec(p_osmt, &mc_req_rec); > + mc_req_rec.mgid = good_mgid; > > osm_log( &p_osmt->log, OSM_LOG_INFO, > "osmt_run_mcast_flow: " > - "Checking Create given MGID=0x%016" PRIx64 " : " > + "Checking Create given valid MGID=0x%016" PRIx64 " : " > + "0x%016" PRIx64 " (o15.0.1.6)...\n", > + cl_ntoh64(mc_req_rec.mgid.unicast.prefix), > + cl_ntoh64(mc_req_rec.mgid.unicast.interface_id)); > + > + /* Before creation, need to check that this group doesn't exist */ > + osm_log( &p_osmt->log, OSM_LOG_INFO, > + "osmt_run_mcast_flow: " > + "Verifying that MCGroup with this MGID doesn't exist by trying to Join it (o15.0.1.13)...\n"); > + > + ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_NON_MEMBER); > + > + osm_log( &p_osmt->log, OSM_LOG_ERROR, > + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); > + status = osmt_send_mcast_request( p_osmt, 1, /* join */ > + &mc_req_rec, > + comp_mask, > + &res_sa_mad ); > + osm_log( &p_osmt->log, OSM_LOG_ERROR, > + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); > + > + if ((status != IB_REMOTE_ERROR) || > + (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) > + { > + osm_log( &p_osmt->log, OSM_LOG_ERROR, > + "osmt_run_mcast_flow: ERR 0301: " > + "Tried joining group that shouldn't have existed - got %s/%s\n", > + ib_get_err_str( status ), > + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) > + ); > + status = IB_ERROR; > + goto Exit; > + } In the event this works, the SA is potentially left with some bad state because of this. Should the join be removed for this case ? 
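The cleanup asked about here could look roughly like the following. This is a minimal sketch against a toy in-memory "SA", assuming that a non-creating join only succeeds when the group already exists; `toy_sa`, `toy_join`, `toy_leave` and `probe_group_absent` are hypothetical names and do not correspond to osmtest or OpenSM functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of SA state for a single multicast group. */
struct toy_sa {
    bool group_exists;
    int  members;
};

enum toy_status { TOY_OK, TOY_REQ_INVALID };

/* A join that is not allowed to create the group: fails if the group is absent. */
static enum toy_status toy_join(struct toy_sa *sa)
{
    if (!sa->group_exists)
        return TOY_REQ_INVALID;
    sa->members++;
    return TOY_OK;
}

/* Undo one membership, if any. */
static void toy_leave(struct toy_sa *sa)
{
    if (sa->members > 0)
        sa->members--;
}

/*
 * Probe that the group does not exist. The expected outcome is a rejected
 * join. If the join unexpectedly succeeds, leave again before reporting the
 * failure, so the probe does not leave a stray membership behind.
 */
static bool probe_group_absent(struct toy_sa *sa)
{
    if (toy_join(sa) == TOY_REQ_INVALID)
        return true;
    toy_leave(sa);
    return false;
}

static void probe_demo(void)
{
    struct toy_sa empty = { false, 0 };
    struct toy_sa stale = { true, 1 };

    assert(probe_group_absent(&empty));
    assert(empty.members == 0);

    assert(!probe_group_absent(&stale));
    assert(stale.members == 1);  /* membership count restored after the probe */
}
```

Whether the real test should issue such a leave before `goto Exit` is exactly the open question; the sketch only shows the shape of the bookkeeping.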
-- Hal From rdreier at cisco.com Wed Oct 11 07:59:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 07:59:55 -0700 Subject: [openib-general] OFED-1.1-rc7 and mthca on ppc64: compile warnings In-Reply-To: <200610111004.56688.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's message of "Wed, 11 Oct 2006 10:04:56 +0200") References: <200610111004.56688.hnguyen@de.ibm.com> Message-ID: > Just saw those warnings when I compiled kernel 2.6.18 on ppc64 using > OFED-1.1-rc7 code with mthca and ehca selected. > CC [M] drivers/infiniband/hw/mthca/mthca_qp.o > drivers/infiniband/hw/mthca/mthca_qp.c: In function `mthca_arbel_post_send': > drivers/infiniband/hw/mthca/mthca_qp.c:1870: warning: `f0' might be used uninitialized in this function > drivers/infiniband/hw/mthca/mthca_qp.c: In function `mthca_tavor_post_send': > drivers/infiniband/hw/mthca/mthca_qp.c:1527: warning: `f0' might be used uninitialized in this function Yes, those are false positives (a minor compiler bug I guess). - R. From mst at mellanox.co.il Wed Oct 11 08:01:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 17:01:03 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011090926.GA15393@fogou.chygwyn.com> References: <20061011090926.GA15393@fogou.chygwyn.com> Message-ID: <20061011150103.GF4888@mellanox.co.il> Quoting Steven Whitehouse : > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > size_t size, int flags) > > { > > ssize_t res; > > struct sock *sk = sock->sk; > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > data will be copied over rather than sent directly. > > So why does dev.c have to force set NETIF_F_SG to off then? 
> >
> I agree with that analysis,

So, would you Ack something like the following then?

======================

Enabling NETIF_F_SG without NETIF_F_ALL_CSUM actually seems to work fine by doing an old-fashioned data copy in tcp_sendpage.

And for devices that do not calculate the IP checksum in hardware (e.g. InfiniBand), calculating the checksum for all packets in the network driver is worse than having the CPU piggyback the checksum computation on the copy process.

Finally, note that NETIF_F_SG is necessary to be able to allocate skbs > PAGE_SIZE on busy systems.

So, let's allow that combination, again, for drivers that want it.

Signed-off-by: Michael S. Tsirkin

---

diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..2d731a0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2930,14 +2930,6 @@ #endif
 		}
 	}
 
-	/* Fix illegal SG+CSUM combinations. */
-	if ((dev->features & NETIF_F_SG) &&
-	    !(dev->features & NETIF_F_ALL_CSUM)) {
-		printk(KERN_NOTICE "%s: Dropping NETIF_F_SG since no checksum feature.\n",
-		       dev->name);
-		dev->features &= ~NETIF_F_SG;
-	}
-
 	/* TSO requires that SG is present as well. */
 	if ((dev->features & NETIF_F_TSO) &&
 	    !(dev->features & NETIF_F_SG)) {

-- 
MST

From greg.lindahl at qlogic.com Wed Oct 11 08:18:43 2006
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Wed, 11 Oct 2006 08:18:43 -0700
Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <15ddcffd0610110721k12d1fb4ckb6e13b0819179d04@mail.gmail.com>
References: <20061009174705.GG26849@mellanox.co.il> <15ddcffd0610110721k12d1fb4ckb6e13b0819179d04@mail.gmail.com>
Message-ID: <20061011151843.GC3491@greglaptop>

On Wed, Oct 11, 2006 at 04:21:41PM +0200, Or Gerlitz wrote:
> On 10/9/06, Michael S. Tsirkin wrote:
>
> > I'm trying to build a network device driver supporting a very large MTU (around 64K)
> > on top of an infiniband connection, and I've hit a couple of issues I'd
> > appreciate some feedback on:
>
> Does it mean you are implementing IPoIB RC? Cool ...
The ipath_ether device, which was submitted but rejected, has a 64k MTU using UD.

-- greg

From ogerlitz at voltaire.com Wed Oct 11 08:37:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 11 Oct 2006 17:37:20 +0200
Subject: [openib-general] two OFED uDAPL issues
Message-ID: <452D0FB0.1030206@voltaire.com>

Arlin,

I see now that the uDAPL CMA provider code uses the MTU 1:1 as returned by the SM in the path, so if the env is made of the Mellanox PCI-X HCA there can be a big BW drop, etc... we have discussed that. I wonder how you are overcoming this when running Intel MPI with OFED 1.0? I understand that in OFED 1.1 there is this tavor_quirk in both the cma and the opensm, but I am not aware of any such hack in OFED 1.0.

Also, I understand that OFED includes the uDAPL **SCM** provider; is it really tested/supported? If yes, I don't think it needs to be. It adds the overhead of one TCP connection per IB connection, creates two code bases to maintain, makes the CMA less tested, you name it. If it's not tested/supported, surely we must not provide it.

If you agree, would you approach the OFED maintainers to remove the SCM provider from the udapl OFED 1.1 RPM?

Or.

From sweitzen at cisco.com Wed Oct 11 08:37:26 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 11 Oct 2006 08:37:26 -0700
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7
Message-ID: 

Vlad, do you have the symbol cm_issue_drep in any .ko files? Because I don't. Looks like the patch is not getting compiled in for some reason.
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Vladimir Sokolovsky [mailto:vlad at mellanox.co.il] > Sent: Wednesday, October 11, 2006 2:00 AM > To: Arlin Davis > Cc: Aviram Gutman; Scott Weitzenkamp (sweitzen); Supalov, > Alexander; Magro, Bill; EWG; Openib-General at Openib.Org > Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1 RC7 > > Hi Arlin, > This patch is in OFED-1.1-rc7 and applied during installation. > > > Regards, > Vladimir > > On Tue, 2006-10-10 at 22:50 -0700, Arlin Davis wrote: > > Aviram Gutman wrote: > > > > >OFED-1.1-rc7 is available on > > >https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > > >File: OFED-1.1-rc7.tgz > > >Please report any issues in bugzilla http://openib.org/bugzilla/ > > > > > > > > > > > Aviram, > > > > Can you verify that the sean_cm_drep_on_not_found.patch is actually > > applied in RC7? Our delayed disconnect problems still exist. > > > > I don't see the new symbol "cm_issue_drep" in ib_cm.ko on our RC7 > > installed systems so I don't think the patch applied. > > > > Thanks, > > > > -arlin > > > > _______________________________________________ > > openfabrics-ewg mailing list > > openfabrics-ewg at openib.org > > http://openib.org/mailman/listinfo/openfabrics-ewg > From sweitzen at cisco.com Wed Oct 11 08:39:11 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 11 Oct 2006 08:39:11 -0700 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 Message-ID: You checked SUSE 10 or SLES 10, aren't those different distros? 
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] > Sent: Wednesday, October 11, 2006 3:09 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib > Subject: Re: [openfabrics-ewg] problems running MVAPICH on > OFED 1.1 rc6 with SLES10 x86_64 > > On some of our SUSE 10 machines i found the 127.0.0.2 ip, > but it was pointing to some random Linux site (linux.org) > and has no effect on mpi runs. > In you case the ip point to _real_ machine, it very strange. > > Scott Weitzenkamp (sweitzen) wrote: > > Aha, I found something in /etc/hosts, thanks for the hint. > > > > 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3 > > > > If I comment this line out, MVAPICH works fine. Does > Mellanox have this > > entry in /etc/hosts? > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > >> -----Original Message----- > >> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > >> Sent: Thursday, October 05, 2006 5:59 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: Aviram Gutman; OpenFabricsEWG; openib > >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >> OFED 1.1 rc6 with SLES10 x86_64 > >> > >>> I see it for all MVAPICH tests, it's 100% consistent. > >> MVAPICH tests are osu_benchmarks (bw/lt/etc..) or all test > >> over mvapich > >> on SUSE10 platform ? > >> Please check /etc/hosts file on your machines, it should be > >> exactly the > >> same on all nodes. 
> >> > >> Regards, > >> Pasha > >> > >>> Scott Weitzenkamp > >>> SQA and Release Manager > >>> Server Virtualization Business Unit > >>> Cisco Systems > >>> > >>> > >>>> -----Original Message----- > >>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > >>>> Sent: Tuesday, October 03, 2006 3:37 AM > >>>> To: Scott Weitzenkamp (sweitzen) > >>>> Cc: Aviram Gutman; OpenFabricsEWG; openib > >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>> > >>>> Hi Scott, > >>>> Unfortunately was not able to reproduce the failure on our > >> platforms. > >>>> Do you see the problem with all tests or with the specific only ? > >>>> Is it consistent problem ? > >>>> > >>>> Regards, > >>>> Pasha > >>>> > >>>> Scott Weitzenkamp (sweitzen) wrote: > >>>>> $ uname -a > >>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 > >>>> 18:25:39 UTC 2006 > >>>>> x86_64 > >>>>> x86_64 x86_64 GNU/Linux > >>>>> $ > >>>> > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > >>>>> 192.168.2.46 192.168.2.49 hostname > >>>>> svbu-qa1850-4 > >>>>> svbu-qa1850-3 > >>>>> $ > >>>> > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > >>>>> 192.168.2.46 192.168.2.49 > >>>>> > >>>> /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_bench > >>> marks-2.2/ > >>>>> osu_latency > >>>>> > >>>>> The last command just hangs. Can I try your binary RPMs? 
> >>>>> > >>>>> Scott Weitzenkamp > >>>>> SQA and Release Manager > >>>>> Server Virtualization Business Unit > >>>>> Cisco Systems > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] > >>>>>> Sent: Sunday, October 01, 2006 2:29 AM > >>>>>> To: Scott Weitzenkamp (sweitzen) > >>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il > >>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>>>> > >>>>>> Can you please elaborate on MVAPICH issues, can you send > >>>>>> command line? > >>>>>> We ran it here on 32 Opteron nodes each quad core and also > >>>> rigorous > >>>>>> tests on the many other nodes? > >>>>>> > >>>>>> > >>>>>> > >>>>>> Scott Weitzenkamp (sweitzen) wrote: > >>>>>>> We are just getting started with OFED testing on > SLES10, first > >>>>>>> platform is x86_64. > >>>>>>> > >>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are > >>>>>> working so far. > >>>>>>> MVAPICH with OSU benchmarks just hang. This same > >>>> hardware works > >>>>>>> fine with OFED and RHEL4 U3. > >>>>>>> > >>>>>>> Has anyone else seen this? > >>>>>>> > >>>>>>> Scott Weitzenkamp > >>>>>>> SQA and Release Manager > >>>>>>> Server Virtualization Business Unit > >>>>>>> Cisco Systems > >>>>>>> > >>>>>>> > >>>>>> -------------------------------------------------------------- > >>>>>> ---------- > >>>>>>> _______________________________________________ > >>>>>>> openfabrics-ewg mailing list > >>>>>>> openfabrics-ewg at openib.org > >>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg > >>>>>>> > >>>> -- > >>>> Pavel Shamis (Pasha) > >>>> Software Engineer > >>>> Mellanox Technologies LTD. > >>>> pasha at mellanox.co.il > >>>> > >> > >> -- > >> Pavel Shamis (Pasha) > >> Software Engineer > >> Mellanox Technologies LTD. > >> pasha at mellanox.co.il > >> > > > > > -- > Pavel Shamis (Pasha) > Software Engineer > Mellanox Technologies LTD. 
> pasha at mellanox.co.il > From mshefty at ichips.intel.com Wed Oct 11 08:46:20 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 08:46:20 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061011102036.GB3706@mellanox.co.il> References: <000001c6ecf2$017874b0$9dd0180a@amr.corp.intel.com> <20061011102036.GB3706@mellanox.co.il> Message-ID: <452D11CC.5050203@ichips.intel.com> Michael S. Tsirkin wrote: > It is somewhat unfortunate that we are duplicating the SA logic > at the endnode in kernel memory here - current sa module has > the advantage in that it just packs all data into a mad and sends it out. > Something to think about. What alternative do you propose? The SA does reference counting on a per port basis, so joins must be handled locally. How can you avoid duplicating that logic? If a user truly only wants to query for an existing group, those requests must still go to the SA. - Sean From pasha at dev.mellanox.co.il Wed Oct 11 08:48:45 2006 From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha)) Date: Wed, 11 Oct 2006 17:48:45 +0200 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 In-Reply-To: References: Message-ID: <452D125D.2060506@dev.mellanox.co.il> I mean SLES10. (yes it's different distros) Scott Weitzenkamp (sweitzen) wrote: > You checked SUSE 10 or SLES 10, aren't those different distros? 
> > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > >> -----Original Message----- >> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] >> Sent: Wednesday, October 11, 2006 3:09 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >> OFED 1.1 rc6 with SLES10 x86_64 >> >> On some of our SUSE 10 machines i found the 127.0.0.2 ip, >> but it was pointing to some random Linux site (linux.org) >> and has no effect on mpi runs. >> In you case the ip point to _real_ machine, it very strange. >> >> Scott Weitzenkamp (sweitzen) wrote: >> >>> Aha, I found something in /etc/hosts, thanks for the hint. >>> >>> 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3 >>> >>> If I comment this line out, MVAPICH works fine. Does >>> >> Mellanox have this >> >>> entry in /etc/hosts? >>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Server Virtualization Business Unit >>> Cisco Systems >>> >>> >>> >>>> -----Original Message----- >>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] >>>> Sent: Thursday, October 05, 2006 5:59 AM >>>> To: Scott Weitzenkamp (sweitzen) >>>> Cc: Aviram Gutman; OpenFabricsEWG; openib >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >>>> OFED 1.1 rc6 with SLES10 x86_64 >>>> >>>> >>>>> I see it for all MVAPICH tests, it's 100% consistent. >>>>> >>>> MVAPICH tests are osu_benchmarks (bw/lt/etc..) or all test >>>> over mvapich >>>> on SUSE10 platform ? >>>> Please check /etc/hosts file on your machines, it should be >>>> exactly the >>>> same on all nodes. 
>>>> pasha at mellanox.co.il >>>> >>>> >> -- >> Pavel Shamis (Pasha) >> Software Engineer >> Mellanox Technologies LTD. >> pasha at mellanox.co.il >> >> > > -- Pavel Shamis (Pasha) Software Engineer Mellanox Technologies LTD. pasha at mellanox.co.il From mst at mellanox.co.il Wed Oct 11 08:53:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 17:53:46 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D11CC.5050203@ichips.intel.com> References: <452D11CC.5050203@ichips.intel.com> Message-ID: <20061011155346.GH4888@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port > > Michael S. Tsirkin wrote: > > It is somewhat unfortunate that we are duplicating the SA logic > > at the endnode in kernel memory here - current sa module has > > the advantage in that it just packs all data into a mad and sends it out. > > Something to think about. > > What alternative do you propose? The SA does reference counting on a per port > basis, so joins must be handled locally. How can you avoid duplicating that logic? > > If a user truly only wants to query for an existing group, those requests must > still go to the SA. sorry, I don't have any clever ideas at the moment. -- MST From sweitzen at cisco.com Wed Oct 11 08:53:59 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 11 Oct 2006 08:53:59 -0700 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 Message-ID: We've installed four SLES10 machines so far, and they all have the "127.0.0.2 " entry. 
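The /etc/hosts entry discussed in this thread can be looked for mechanically. What follows is a minimal sketch, not a diagnosis: the function name loopback_aliases is hypothetical, and the script only reports whether a host name is mapped to a 127.x.x.x address. The thread establishes that commenting out such a line makes MVAPICH work; why the entry causes the hang is not established here.

```python
import ipaddress


def loopback_aliases(hosts_text, hostname):
    """Return loopback addresses that hosts-file text maps hostname to.

    Hypothetical helper: matches either the full name or its short form
    (the part before the first dot) against every name on each line.
    """
    short = hostname.split(".")[0]
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        names = fields[1:]
        if addr.is_loopback and (hostname in names or short in names):
            found.append(str(addr))
    return found
```

Feeding it the contents of /etc/hosts together with the node's own host name on each machine would flag lines like the "127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3" entry quoted in this thread.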
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] > Sent: Wednesday, October 11, 2006 8:49 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib > Subject: Re: [openfabrics-ewg] problems running MVAPICH on > OFED 1.1 rc6 with SLES10 x86_64 > > I mean SLES10. > (yes it's different distros) > > Scott Weitzenkamp (sweitzen) wrote: > > You checked SUSE 10 or SLES 10, aren't those different distros? > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > > > >> -----Original Message----- > >> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] > >> Sent: Wednesday, October 11, 2006 3:09 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib > >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >> OFED 1.1 rc6 with SLES10 x86_64 > >> > >> On some of our SUSE 10 machines i found the 127.0.0.2 ip, > >> but it was pointing to some random Linux site (linux.org) > >> and has no effect on mpi runs. > >> In you case the ip point to _real_ machine, it very strange. > >> > >> Scott Weitzenkamp (sweitzen) wrote: > >> > >>> Aha, I found something in /etc/hosts, thanks for the hint. > >>> > >>> 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3 > >>> > >>> If I comment this line out, MVAPICH works fine. Does > >>> > >> Mellanox have this > >> > >>> entry in /etc/hosts? 
> >>>>>>>> > >>>>>>>>> MVAPICH with OSU benchmarks just hang. This same > >>>>>>>>> > >>>>>> hardware works > >>>>>> > >>>>>>>>> fine with OFED and RHEL4 U3. > >>>>>>>>> > >>>>>>>>> Has anyone else seen this? > >>>>>>>>> > >>>>>>>>> Scott Weitzenkamp > >>>>>>>>> SQA and Release Manager > >>>>>>>>> Server Virtualization Business Unit > >>>>>>>>> Cisco Systems > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > -------------------------------------------------------------- > >>>>>>>> ---------- > >>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> openfabrics-ewg mailing list > >>>>>>>>> openfabrics-ewg at openib.org > >>>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg > >>>>>>>>> > >>>>>>>>> > >>>>>> -- > >>>>>> Pavel Shamis (Pasha) > >>>>>> Software Engineer > >>>>>> Mellanox Technologies LTD. > >>>>>> pasha at mellanox.co.il > >>>>>> > >>>>>> > >>>> -- > >>>> Pavel Shamis (Pasha) > >>>> Software Engineer > >>>> Mellanox Technologies LTD. > >>>> pasha at mellanox.co.il > >>>> > >>>> > >> -- > >> Pavel Shamis (Pasha) > >> Software Engineer > >> Mellanox Technologies LTD. > >> pasha at mellanox.co.il > >> > >> > > > > > > > -- > Pavel Shamis (Pasha) > Software Engineer > Mellanox Technologies LTD. > pasha at mellanox.co.il > From mshefty at ichips.intel.com Wed Oct 11 08:57:10 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 08:57:10 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CFD0C.2050709@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> Message-ID: <452D1456.1050009@ichips.intel.com> Eitan Zahavi wrote: > Maybe I did not explain myself right. 
> The idea is not to implement it in the mad.c code but rather to > implement it at the lowest level: > The problem with a new API is that a single ULP/applications which does > direct umad or QP1 access will break the reference count. > > Implementing at the lowest level - I.e. by sniffing QP1 packets - would > be enforced for all applications/ULPs. Implementing this in the MAD layer is not the right solution. Something, somewhere must send MADs and control access to QP0/1. Something, somewhere must send MADs to the SA, and must be layered over the previous something. Something, somewhere needs to perform reference counting on multicast groups and issue join/leave requests to the SA, and must be layered over the previous something. Pushing all of this functionality down into a single module does not remove the need for the different layers of functionality. It just produces an overly complex module. - Sean From pasha at dev.mellanox.co.il Wed Oct 11 09:02:46 2006 From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha)) Date: Wed, 11 Oct 2006 18:02:46 +0200 Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64 In-Reply-To: References: Message-ID: <452D15A6.4070000@dev.mellanox.co.il> Here is a link about a SuSE bug related to 127.0.0.2: https://bugzilla.novell.com/show_bug.cgi?id=165269 Check your SuSE auto-install setup. It is possible that it contains some broken configuration. Scott Weitzenkamp (sweitzen) wrote: > We've installed four SLES10 machines so far, and they all have the > "127.0.0.2 " entry.
> > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > >> -----Original Message----- >> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] >> Sent: Wednesday, October 11, 2006 8:49 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >> OFED 1.1 rc6 with SLES10 x86_64 >> >> I mean SLES10. >> (yes it's different distros) >> >> Scott Weitzenkamp (sweitzen) wrote: >>> You checked SUSE 10 or SLES 10, aren't those different distros? >>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Server Virtualization Business Unit >>> Cisco Systems >>> >>> >>> >>>> -----Original Message----- >>>> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] >>>> Sent: Wednesday, October 11, 2006 3:09 AM >>>> To: Scott Weitzenkamp (sweitzen) >>>> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on >>>> OFED 1.1 rc6 with SLES10 x86_64 >>>> >>>> On some of our SUSE 10 machines i found the 127.0.0.2 ip, >>>> but it was pointing to some random Linux site (linux.org) >>>> and has no effect on mpi runs. >>>> In you case the ip point to _real_ machine, it very strange. >>>> >>>> Scott Weitzenkamp (sweitzen) wrote: >>>> >>>>> Aha, I found something in /etc/hosts, thanks for the hint. >>>>> >>>>> 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3 >>>>> >>>>> If I comment this line out, MVAPICH works fine. Does >>>>> >>>> Mellanox have this >>>> >>>>> entry in /etc/hosts? 
>>>>>>>>>> >>>>>>>>>>> MVAPICH with OSU benchmarks just hang. This same >>>>>>>>>>> >>>>>>>> hardware works >>>>>>>> >>>>>>>>>>> fine with OFED and RHEL4 U3. >>>>>>>>>>> >>>>>>>>>>> Has anyone else seen this? >>>>>>>>>>> >>>>>>>>>>> Scott Weitzenkamp >>>>>>>>>>> SQA and Release Manager >>>>>>>>>>> Server Virtualization Business Unit >>>>>>>>>>> Cisco Systems >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >> -------------------------------------------------------------- >>>>>>>>>> ---------- >>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> openfabrics-ewg mailing list >>>>>>>>>>> openfabrics-ewg at openib.org >>>>>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> -- >>>>>>>> Pavel Shamis (Pasha) >>>>>>>> Software Engineer >>>>>>>> Mellanox Technologies LTD. >>>>>>>> pasha at mellanox.co.il >>>>>>>> >>>>>>>> >>>>>> -- >>>>>> Pavel Shamis (Pasha) >>>>>> Software Engineer >>>>>> Mellanox Technologies LTD. >>>>>> pasha at mellanox.co.il >>>>>> >>>>>> >>>> -- >>>> Pavel Shamis (Pasha) >>>> Software Engineer >>>> Mellanox Technologies LTD. >>>> pasha at mellanox.co.il >>>> >>>> >>> >> >> -- >> Pavel Shamis (Pasha) >> Software Engineer >> Mellanox Technologies LTD. >> pasha at mellanox.co.il >> > -- Pavel Shamis (Pasha) Software Engineer Mellanox Technologies LTD. pasha at mellanox.co.il From mshefty at ichips.intel.com Wed Oct 11 09:08:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 09:08:15 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061011125248.GA5181@mellanox.co.il> References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> Message-ID: <452D16EF.9040200@ichips.intel.com> Michael S. Tsirkin wrote: > Why is this even a good idea? 
> If you are looking for reasons using mutlicast module in ipoib is good, I would > say blocking unpriviledged userspace from joining IPoIB GID and snoopig on all > mcast traffic sounds like a better idea. BTW, Sean, I think this is something > we need for the ucma multicast part to go in. I would imagine kernel components > could set some kind of flag on mcast join to make them exclusive. > API currently does not allow for that. http://openib.org/pipermail/openib-general/2006-March/019147.html Why is it a bad idea? The architecture allows this. However, none of the proposed patches allows a userspace app to join an ipoib multicast group. And an application that talks directly to the SA via MADs puts us no worse off than before. > And why the rush? Is the new module used at all yet? > Let's see it get some use before switching a basic component over. This module was added to svn in April. The request was to begin the process for queuing it to 2.6.20, which is likely 2-3 more months out. I hardly call that a rush. > Finally, the patch in question also seems to introduce more cleanups and > such. It would be less controversial if it was just an API change. The cleanups are part of the change. Once ib_join_multicast() has been invoked, ib_free_multicast() must be called exactly once. Proper state tracking for this is required. - Sean From mst at mellanox.co.il Wed Oct 11 09:17:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 18:17:18 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D16EF.9040200@ichips.intel.com> References: <452D16EF.9040200@ichips.intel.com> Message-ID: <20061011161718.GC13872@mellanox.co.il> Quoting r. Sean Hefty : > > Finally, the patch in question also seems to introduce more cleanups and > > such. It would be less controversial if it was just an API change. > > The cleanups are part of the change. 
Once ib_join_multicast() has been invoked, > ib_free_multicast() must be called exactly once. Proper state tracking for this > is required. We already do state tracking to join/leave groups properly. It would be easier to review this change if you left killing off mcast->done and other such improvements for a separate patch. -- MST From vlad at mellanox.co.il Wed Oct 11 09:12:50 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 11 Oct 2006 18:12:50 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 In-Reply-To: References: Message-ID: <1160583170.7035.7.camel@vladsk-laptop> Hi Scott, You can check OFED compilation log file to see if this patch was applied and compiled. To get the relevant log file: ls -ltr /tmp/OFED*log | tail -2 One of them will be the compilation log. Search for sean_cm_drep_on_not_found.patch inside... Regards, Vladimir On Wed, 2006-10-11 at 08:37 -0700, Scott Weitzenkamp (sweitzen) wrote: > Vlad, do you have symbol cm_issue_drep in any .ko files, because I > don't. Looks like the patch is not getting compiled in for some reason. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: Vladimir Sokolovsky [mailto:vlad at mellanox.co.il] > > Sent: Wednesday, October 11, 2006 2:00 AM > > To: Arlin Davis > > Cc: Aviram Gutman; Scott Weitzenkamp (sweitzen); Supalov, > > Alexander; Magro, Bill; EWG; Openib-General at Openib.Org > > Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1 RC7 > > > > Hi Arlin, > > This patch is in OFED-1.1-rc7 and applied during installation. 
> > > > > > Regards, > > Vladimir > > > > On Tue, 2006-10-10 at 22:50 -0700, Arlin Davis wrote: > > > Aviram Gutman wrote: > > > > > > >OFED-1.1-rc7 is available on > > > >https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > > > >File: OFED-1.1-rc7.tgz > > > >Please report any issues in bugzilla http://openib.org/bugzilla/ > > > > > > > > > > > > > > > Aviram, > > > > > > Can you verify that the sean_cm_drep_on_not_found.patch is actually > > > applied in RC7? Our delayed disconnect problems still exist. > > > > > > I don't see the new symbol "cm_issue_drep" in ib_cm.ko on our RC7 > > > installed systems so I don't think the patch applied. > > > > > > Thanks, > > > > > > -arlin > > > > > > _______________________________________________ > > > openfabrics-ewg mailing list > > > openfabrics-ewg at openib.org > > > http://openib.org/mailman/listinfo/openfabrics-ewg > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg From mshefty at ichips.intel.com Wed Oct 11 09:20:10 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 09:20:10 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061011120949.GA4927@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061011120949.GA4927@mellanox.co.il> Message-ID: <452D19BA.2000600@ichips.intel.com> Michael S. Tsirkin wrote: >>+struct ib_multicast { >>+ struct ib_sa_mcmember_rec rec; >>+ ib_sa_comp_mask comp_mask; >>+ int (*callback)(int status, >>+ struct ib_multicast *multicast); >>+ void *context; >>+}; > > > Why is ib_sa_mcmember_rec exposed? Because the user needs to get the actual mcmember record somewhere. > So need to either specify which values are valid, > or use a different structure here. 
It's the mcmember record. All of the values are valid. > Is callback invoked only once? If no should document. The callback may be invoked multiple times. >>+struct ib_multicast *ib_join_multicast(struct ib_device *device, u8 port_num, >>+ struct ib_sa_mcmember_rec *rec, >>+ ib_sa_comp_mask comp_mask, gfp_t gfp_mask, >>+ int (*callback)(int status, >>+ struct ib_multicast >>+ *multicast), > > What are the values for status? It depends on the failure. Typically it matches the value returned by ib_sa. This can be documented. > I see that it's not just 0 or non-zero - IPoIB needs to treat some > of them magically for some reason. I'm not sure what you mean by magically. It traps for ENETRESET, with a comment that it traps for port events itself, and ETIMEOUT. > We used to have a timeout value - I think this is something ULP might want > control over. Of course if we are already in the group you can return > immediately. Multiple join requests may be funneled into a single MAD. A timeout value makes no sense with multiple users. - Sean From ogerlitz at voltaire.com Wed Oct 11 09:22:45 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 11 Oct 2006 18:22:45 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D16EF.9040200@ichips.intel.com> References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452D16EF.9040200@ichips.intel.com> Message-ID: <452D1A55.7050703@voltaire.com> Sean Hefty wrote: > Why is it a bad idea? The architecture allows this. However, none of > the proposed patches allows a userspace app to join an ipoib multicast > group. Just to make sure, by "ipoib multicast group" you mean an MGID which is derives from an IP Multicast address and the routing info on the device this address/subnets is routed to (ie the ipoib device pkey which you can get from the broadcast MAC of this device)? 
How and why does the proposed code enforce that?

Or.

From eitan at mellanox.co.il Wed Oct 11 09:20:31 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 11 Oct 2006 18:20:31 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <452D1456.1050009@ichips.intel.com>
References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com>
 <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com>
 <20061010234933.GA29632@mellanox.co.il>
 <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il>
 <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il>
 <452D1456.1050009@ichips.intel.com>
Message-ID: <452D19CF.30006@mellanox.co.il>

The point is not that you need to layer. But you cannot enforce ref
counting by adding a top layer that everybody can bypass.
It simply breaks on the first client that goes directly to the lower level.
Do you have a solution for this problem?
The layer you need to add is BELOW the current interface, not ABOVE it.

EZ

Sean Hefty wrote:
> Eitan Zahavi wrote:
>
>> Maybe I did not explain myself right.
>> The idea is not to implement it in the mad.c code but rather to
>> implement it at the lowest level:
>> The problem with a new API is that a single ULP/applications which does
>> direct umad or QP1 access will break the reference count.
>>
>> Implementing at the lowest level - I.e. by sniffing QP1 packets - would
>> be enforced for all applications/ULPs.
>>
>
> Implementing this in the MAD layer is not the right solution. Something,
> somewhere must send MADs and control access to QP0/1. Something, somewhere must
> send MADs to the SA, and must be layered over the previous something.
> Something, somewhere needs to perform reference counting on multicast groups and
> request join/leave requests to the SA, and must be layered over the previous
> something.
> > Pushing all of this functionality down into a single module does not remove the > need for the different layers of functionality. It just produces an overly > complex module. > > - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From eitan at mellanox.co.il Wed Oct 11 09:24:26 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 18:24:26 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D16EF.9040200@ichips.intel.com> References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452D16EF.9040200@ichips.intel.com> Message-ID: <452D1ABA.8050706@mellanox.co.il> Sean Hefty wrote: > Michael S. Tsirkin wrote: > >> Why is this even a good idea? >> If you are looking for reasons using mutlicast module in ipoib is good, I would >> say blocking unpriviledged userspace from joining IPoIB GID and snoopig on all >> mcast traffic sounds like a better idea. BTW, Sean, I think this is something >> we need for the ucma multicast part to go in. I would imagine kernel components >> could set some kind of flag on mcast join to make them exclusive. >> API currently does not allow for that. >> > > http://openib.org/pipermail/openib-general/2006-March/019147.html > > Why is it a bad idea? The architecture allows this. However, none of the > proposed patches allows a userspace app to join an ipoib multicast group. And > an application that talks directly to the SA via MADs puts us no worse off than > before. > > This is incorrect. The SA can track requests from multiple different QPs. It could actually also enforce using QP1 for the requests. But it can not differentiate requests from same port different applications comming on QP1... 
So if you caught all QP1 traffic and refcounted at that level, you would be
100% clean.

>> And why the rush? Is the new module used at all yet?
>> Let's see it get some use before switching a basic component over.
>>
>
> This module was added to svn in April. The request was to begin the process for
> queuing it to 2.6.20, which is likely 2-3 more months out. I hardly call that a
> rush.
>
>
>> Finally, the patch in question also seems to introduce more cleanups and
>> such. It would be less controversial if it was just an API change.
>>
>
> The cleanups are part of the change. Once ib_join_multicast() has been invoked,
> ib_free_multicast() must be called exactly once. Proper state tracking for this
> is required.
>
> - Sean

From mshefty at ichips.intel.com Wed Oct 11 09:32:31 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 11 Oct 2006 09:32:31 -0700
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <20061011161718.GC13872@mellanox.co.il>
References: <452D16EF.9040200@ichips.intel.com>
 <20061011161718.GC13872@mellanox.co.il>
Message-ID: <452D1C9F.8050106@ichips.intel.com>

Michael S. Tsirkin wrote:
> We already do state tracking to join/leave groups properly.
> It would be easier to review this change if you left killing off
> mcast->done and other such improvements for a separate patch.

The multicast module does reference counting and completion handling for us.
Migrating ipoib to use ib_multicast while keeping unnecessary duplicated
functionality around in ipoib doesn't make sense. Also, the callback can be
invoked multiple times, so a 'done' flag isn't useful for tracking whether the
callback has been invoked.
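
A minimal userspace sketch of the behaviour described above, not the actual
ib_multicast code: joins from several users of the same group on the same port
are reference counted, so only the first join and the last free would reach the
SA, and each user's callback may run more than once (here, once when the join
completes and again on a later event). Every name below (mc_group, mc_member,
mc_join, mc_event, mc_free, mc_count_cb) is hypothetical and only models the
idea; the callback signature loosely mirrors the one in the proposed patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MEMBERS 8

struct mc_member;
typedef int (*mc_callback)(int status, struct mc_member *member);

/* One user of the group (hypothetical stand-in for a join handle). */
struct mc_member {
	mc_callback callback;
	void *context;
	int active;
};

/* One multicast group on one port, shared by all local users. */
struct mc_group {
	struct mc_member members[MAX_MEMBERS];
	int refcount;	/* number of active local users */
	int sa_joins;	/* simulated join requests sent to the SA */
	int sa_leaves;	/* simulated leave requests sent to the SA */
};

/* Join: only the first user triggers a (simulated) request to the SA. */
struct mc_member *mc_join(struct mc_group *g, mc_callback cb, void *context)
{
	assert(g != NULL && cb != NULL);
	for (int i = 0; i < MAX_MEMBERS; i++) {
		struct mc_member *m = &g->members[i];

		if (m->active)
			continue;
		m->callback = cb;
		m->context = context;
		m->active = 1;
		if (g->refcount++ == 0)
			g->sa_joins++;
		cb(0, m);	/* first invocation: join completed */
		return m;
	}
	return NULL;		/* no free slot */
}

/* A later event (e.g. a port error) reaches every active user again. */
void mc_event(struct mc_group *g, int status)
{
	assert(g != NULL);
	for (int i = 0; i < MAX_MEMBERS; i++)
		if (g->members[i].active)
			g->members[i].callback(status, &g->members[i]);
}

/* Free: must happen exactly once per join; the last user leaves the SA group. */
int mc_free(struct mc_group *g, struct mc_member *m)
{
	assert(g != NULL && m != NULL);
	if (!m->active)
		return -1;	/* double free detected */
	m->active = 0;
	if (--g->refcount == 0)
		g->sa_leaves++;
	return 0;
}

/* Example callback: counts invocations in the int that context points to. */
int mc_count_cb(int status, struct mc_member *member)
{
	(void)status;
	++*(int *)member->context;
	return 0;
}
```

With two users joined, the sketch records one SA join, and an event delivered
through mc_event() bumps both counters a second time, which is why a single
'done' flag cannot say whether "the" callback has run.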
- Sean From mshefty at ichips.intel.com Wed Oct 11 09:35:14 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 09:35:14 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D1A55.7050703@voltaire.com> References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452D16EF.9040200@ichips.intel.com> <452D1A55.7050703@voltaire.com> Message-ID: <452D1D42.9050908@ichips.intel.com> Or Gerlitz wrote: > Just to make sure, by "ipoib multicast group" you mean an MGID which is > derives from an IP Multicast address and the routing info on the device > this address/subnets is routed to (ie the ipoib device pkey which you > can get from the broadcast MAC of this device)? I mean a group that ipoib has joined. > How and Why does the proposed code enforces that? The ib_multicast module does not enforce this. The rdma_cm joins different groups, but still derives an MGID from an IP multicast address. A new kernel module could join the same groups as ipoib. - Sean From halr at voltaire.com Wed Oct 11 09:39:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 Oct 2006 12:39:56 -0400 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D1ABA.8050706@mellanox.co.il> References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452D16EF.9040200@ichips.intel.com> <452D1ABA.8050706@mellanox.co.il> Message-ID: <1160584795.32093.58721.camel@hal.voltaire.com> On Wed, 2006-10-11 at 12:24, Eitan Zahavi wrote: > Sean Hefty wrote: > > Michael S. Tsirkin wrote: > > > >> Why is this even a good idea? > >> If you are looking for reasons using mutlicast module in ipoib is good, I would > >> say blocking unpriviledged userspace from joining IPoIB GID and snoopig on all > >> mcast traffic sounds like a better idea. 
BTW, Sean, I think this is something > >> we need for the ucma multicast part to go in. I would imagine kernel components > >> could set some kind of flag on mcast join to make them exclusive. > >> API currently does not allow for that. > >> > > > > http://openib.org/pipermail/openib-general/2006-March/019147.html > > > > Why is it a bad idea? The architecture allows this. However, none of the > > proposed patches allows a userspace app to join an ipoib multicast group. And > > an application that talks directly to the SA via MADs puts us no worse off than > > before. > > > > > This is incorrect. The SA can track requests from multiple different > QPs. It could actually also enforce using QP1 for the requests. But it > can not differentiate requests from same port different applications > comming on QP1... > So if you did catch all QP1 traffic and refcount at that level you will > be 100% clean. What about redirection ? -- Hal > > > >> And why the rush? Is the new module used at all yet? > >> Let's see it get some use before switching a basic component over. > >> > > > > This module was added to svn in April. The request was to begin the process for > > queuing it to 2.6.20, which is likely 2-3 more months out. I hardly call that a > > rush. > > > > > >> Finally, the patch in question also seems to introduce more cleanups and > >> such. It would be less controversial if it was just an API change. > >> > > > > The cleanups are part of the change. Once ib_join_multicast() has been invoked, > > ib_free_multicast() must be called exactly once. Proper state tracking for this > > is required. 
> > > > - Sean > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mshefty at ichips.intel.com Wed Oct 11 09:47:01 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 09:47:01 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D19CF.30006@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> Message-ID: <452D2005.60905@ichips.intel.com> Eitan Zahavi wrote: > The point is not the fact you need to layer. But you can not enforce ref > counting by adding a top layer everybody can bypass. > It simply breaks on the first client that goes directly to the lower level. > Do you have a solution for this problem? > The layer you need to add is BELOW the current interface not ABOVE it. I disagree that there's an issue here. Should we push the snooping down into ib_post_send()? Into each driver's post_send()? We're trusting kernel clients to call the right interfaces. 
If we want to start worrying about kernel clients trying to bypass layers, there's nothing that prevents some client from allocating QP0 or 1 for itself (provided that it gets loaded first), snooping MADs and modifying their contents, redirecting requests, registering with the MAD layer to receive MADs of all types, sending MADs out other QPs, etc. We should focus on coming up with the right architecture. And the functionality that needs to reference count when to send join/leave requests to the SA is above whatever functionality is needed to actually send the MAD. - Sean From eitan at mellanox.co.il Wed Oct 11 09:43:43 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 18:43:43 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <1160584795.32093.58721.camel@hal.voltaire.com> References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452D16EF.9040200@ichips.intel.com> <452D1ABA.8050706@mellanox.co.il> <1160584795.32093.58721.camel@hal.voltaire.com> Message-ID: <452D1F3F.9010904@mellanox.co.il> Hal Rosenstock wrote: > On Wed, 2006-10-11 at 12:24, Eitan Zahavi wrote: > >> Sean Hefty wrote: >> >>> Michael S. Tsirkin wrote: >>> >>> >>>> Why is this even a good idea? >>>> If you are looking for reasons using mutlicast module in ipoib is good, I would >>>> say blocking unpriviledged userspace from joining IPoIB GID and snoopig on all >>>> mcast traffic sounds like a better idea. BTW, Sean, I think this is something >>>> we need for the ucma multicast part to go in. I would imagine kernel components >>>> could set some kind of flag on mcast join to make them exclusive. >>>> API currently does not allow for that. >>>> >>>> >>> http://openib.org/pipermail/openib-general/2006-March/019147.html >>> >>> Why is it a bad idea? The architecture allows this. 
However, none of the >>> proposed patches allows a userspace app to join an ipoib multicast group. And >>> an application that talks directly to the SA via MADs puts us no worse off than >>> before. >>> >>> >>> >> This is incorrect. The SA can track requests from multiple different >> QPs. It could actually also enforce using QP1 for the requests. But it >> can not differentiate requests from same port different applications >> comming on QP1... >> So if you did catch all QP1 traffic and refcount at that level you will >> be 100% clean. >> > > What about redirection ? > > The SA can redirect itself to any QP. But it can enforce Join/Leave to always use QP1. I do not see how redirection apply here. Also we could follow the InformInfo scheme were if a QP has registered the SA will track it and not Leave the group unless the last QP registered from a given port has left. So the hole we have today can be closed IFF we are able to catch and ref count all QP1 Join/Leave. EZ > -- Hal > > >> >>>> And why the rush? Is the new module used at all yet? >>>> Let's see it get some use before switching a basic component over. >>>> >>>> >>> This module was added to svn in April. The request was to begin the process for >>> queuing it to 2.6.20, which is likely 2-3 more months out. I hardly call that a >>> rush. >>> >>> >>> >>>> Finally, the patch in question also seems to introduce more cleanups and >>>> such. It would be less controversial if it was just an API change. >>>> >>>> >>> The cleanups are part of the change. Once ib_join_multicast() has been invoked, >>> ib_free_multicast() must be called exactly once. Proper state tracking for this >>> is required. 
>>>
>>> - Sean

From caitlinb at broadcom.com Wed Oct 11 10:01:01 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Wed, 11 Oct 2006 10:01:01 -0700
Subject: [openib-general] [RFC] [PATCH 6/7] rdma_cm 2.6.20: add support for RDMA_PS_UDP
In-Reply-To: <000701c6ecc4$2200a2d0$c0d4180a@amr.corp.intel.com>
References: <000101c6ecbd$ace60e50$c0d4180a@amr.corp.intel.com>
 <000701c6ecc4$2200a2d0$c0d4180a@amr.corp.intel.com>
Message-ID: <469958e00610111001m4dca996bl6d48e53b4d35569c@mail.gmail.com>

On 10/10/06, Sean Hefty wrote:
> Add missing support for RDMA_PS_UDP. This allows the use of UD QPs
> through the rdma_cm, which provides address translation services
> over IB, even if not all RDMA transports support UD.
>

To the best of my knowledge every single iWARP RNIC fully supports
unreliable datagrams using an existing API (SOCK_DGRAM).
From bugzilla-daemon at openib.org Wed Oct 11 10:10:09 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 11 Oct 2006 10:10:09 -0700 (PDT) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop Message-ID: <20061011171009.8C6AA2283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=263 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |rolandd at cisco.com OS/Version|SLES 10 |All ------- Comment #1 from sweitzen at cisco.com 2006-10-11 10:10 ------- I tried OFED 1.1 rc7 on RHEL4 U3 x86_64, using two hosts each with dual port HCAs. I am looping a script that turns off and back on IB ports on a Cisco IB switchsuch that there will be IPoIB failover every 20 seconds on one of the hosts. I ran ping and netserver on host 1, and netperf on host2. After a few hours, host 1 gets an Oops ib1: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet ib0: dev_queue_xmit failed to requeue packet general protection fault: 0000 [1] SMP CPU 1 Modules linked in: ib_sdp(U) rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_ipoib<7>Losing some ticks... checking if CPU frequency changed. (U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib _core(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_a cl sunrpc ds yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod button batte ry ac uhci_hcd ehci_hcd hw_random shpchp e1000 floppy sg ext3 jbd aic79xx sd_mod scsi_mod Pid: 7155, comm: ib_mad1 Not tainted 2.6.9-34.ELsmp RIP: 0010:[] {_spin_lock_irqsave+12}<4>warni ng: many lost ticks. 
Your time source seems to be instable or some driver is hogging interupts rip mwait_idle+0x56/0x7c RSP: 0018:00000101bccd1c58 EFLAGS: 00010086 RAX: 00000101bccd1cb8 RBX: 0000ffff1b60167f RCX: ffffffffa00e547d RDX: dead4ead00000001 RSI: 0000000000000000 RDI: 0000ffff1b60167f RBP: 00000101b9c0f480 R08: 0000000000000003 R09: 00000101b9c0f4a0 R10: ffffffff8040a900 R11: ffffffff8040a900 R12: 0000ffff1b60167f R13: 00000000fffffffc R14: 0000000000000000 R15: 0000ffff1b6012ff FS: 0000000000000000(0000) GS:ffffffff804d7b80(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000003e2678f4b0 CR3: 00000000bff28000 CR4: 00000000000006e0 Process ib_mad1 (pid: 7155, threadinfo 00000101bccd0000, task 00000101b94cd030) Stack: 0000000000000000 0000000000000286 0000000000000000 ffffffffa011195b 00000101beca7000 0000000000000002 00000101beca7380 0000000000000246 0000000000000246 ffffffff802ab017 Call Trace:{:ib_ipoib:path_rec_completion+450} {dev_queue_xmit+525} {:ib_sa:ib_sa_pa th_rec_callback+64} {:ib_sa:send_handler+74} {:ib_mad:ib_ mad_complete_send_wr+418} {:ib_mad:ib_mad_completion_handler+979} {:ib_mad:ib_mad_completion_handler+0} {worker_thread+419} {default_wake_fun ction+0} {default_wake_function+0} {keventd_cr eate_kthread+0} {worker_thread+0} {keventd_create_kth read+0} {kthread+200} {child_rip+8} {keventd_create_kthread+0} {kthread+0 } {child_rip+0} Code: 81 7f 04 ad 4e ad de 74 1f 48 8b 74 24 18 48 c7 c7 ed f2 31 RIP {_spin_lock_irqsave+12} RSP <00000101bccd1c58> <0>Kernel panic - not syncing: Oops ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From eitan at mellanox.co.il Wed Oct 11 10:12:23 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 19:12:23 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D2005.60905@ichips.intel.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> Message-ID: <452D25F7.1010902@mellanox.co.il> User level code can be run by root. It can access QP1 and bypass your nice API. Also you ignore current kernel implementations that exist and already perform QP1 access via the SA client code in the kernel. Sean Hefty wrote: > Eitan Zahavi wrote: > >> The point is not the fact you need to layer. But you can not enforce ref >> counting by adding a top layer everybody can bypass. >> It simply breaks on the first client that goes directly to the lower level. >> Do you have a solution for this problem? >> The layer you need to add is BELOW the current interface not ABOVE it. >> > > I disagree that there's an issue here. Should we push the snooping down into > ib_post_send()? Into each driver's post_send()? > No you do not need to sniff post_send. Just QP1 calls. > We're trusting kernel clients to call the right interfaces. If we want to start > worrying about kernel clients trying to bypass layers, there's nothing that > prevents some client from allocating QP0 or 1 for itself (provided that it gets > loaded first), snooping MADs and modifying their contents, redirecting requests, > registering with the MAD layer to receive MADs of all types, sending MADs out > other QPs, etc. 
>
No, you are forcing every kernel client out there to do another unneeded
migration to the new API, and it still does not protect them correctly against
multicast disconnects.
> We should focus on coming up with the right architecture. And the functionality
> that needs to reference count when to send join/leave requests to the SA is
> above whatever functionality is needed to actually send the MAD.
>
> - Sean

From mshefty at ichips.intel.com Wed Oct 11 10:47:57 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 11 Oct 2006 10:47:57 -0700
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <452D25F7.1010902@mellanox.co.il>
References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com>
 <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com>
 <20061010234933.GA29632@mellanox.co.il>
 <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il>
 <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il>
 <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il>
 <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il>
Message-ID: <452D2E4D.8000902@ichips.intel.com>

Eitan Zahavi wrote:
> User level code can be run by root. It can access QP1 and bypass your nice
> API. Also you ignore current kernel implementations that exist and already
> perform QP1 access via the SA client code in the kernel.

Yep - and that same app can, today, delete the multicast groups used by ipoib.
As long as a MAD can actually be sent out of QP1, we have this problem.

The existing APIs were not ignored. The ib_multicast module makes direct calls
to ib_sa.
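
To make the disagreement concrete, here is a small userspace model. It assumes,
for illustration only, that the SA keeps a single membership flag per port and
group, which is why it cannot tell same-port users apart. A refcounting layer
sits on top of a direct SA call, in the way ib_multicast is layered over ib_sa,
and a client that calls the lower level directly can still drop the membership
while layered users hold references. All names here (sa_port_state, sa_join,
sa_leave, mc_layer, layer_join, layer_leave, layer_is_stale) are hypothetical
and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lower layer: what the SA records for one port and one group. */
struct sa_port_state {
	bool member;
};

void sa_join(struct sa_port_state *sa)
{
	sa->member = true;
}

void sa_leave(struct sa_port_state *sa)
{
	sa->member = false;
}

/* Upper layer: counts local users and talks to the SA only at the edges. */
struct mc_layer {
	struct sa_port_state *sa;
	int users;
};

void layer_join(struct mc_layer *l)
{
	assert(l != NULL && l->sa != NULL);
	if (l->users++ == 0)
		sa_join(l->sa);		/* first user: really join */
}

void layer_leave(struct mc_layer *l)
{
	assert(l != NULL && l->sa != NULL);
	if (l->users > 0 && --l->users == 0)
		sa_leave(l->sa);	/* last user: really leave */
}

/* True when layered users believe they are joined but the SA disagrees. */
bool layer_is_stale(const struct mc_layer *l)
{
	return l->users > 0 && !l->sa->member;
}
```

In this model, two layer_join() calls followed by one layer_leave() keep the
port in the group, while a single direct sa_leave() from a bypassing client
leaves the remaining layered user stale. That is the case Eitan describes, and
it is equally reachable without the upper layer, which is Sean's point.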
> No you are forcing every kernel client out there to do another unneeded > migration to the new API and still not protect it correctly against multicast > disconnects. There's exactly 1 kernel client that uses multicast groups, and a patch was provided to migrate that client. - Sean From bugzilla-daemon at openib.org Wed Oct 11 11:24:00 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 11 Oct 2006 11:24:00 -0700 (PDT) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop Message-ID: <20061011182400.9B99B2283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=263 mst at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Version|1.1rc6 |1.1rc7 ------- Comment #2 from mst at mellanox.co.il 2006-10-11 11:23 ------- Roland, can you look into this please? I dont think we want to take the risk to change ipoib at this point, but still relevant for upstream. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From krause at cup.hp.com Wed Oct 11 11:21:43 2006 From: krause at cup.hp.com (Michael Krause) Date: Wed, 11 Oct 2006 11:21:43 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011094649.GD2701@mellanox.co.il> References: <20061011.022015.63051509.davem@davemloft.net> <20061011094649.GD2701@mellanox.co.il> Message-ID: <6.2.0.14.2.20061011111146.03292b50@esmail.cup.hp.com> At 02:46 AM 10/11/2006, Michael S. Tsirkin wrote: >Quoting r. David Miller : > > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > > > From: "Michael S. Tsirkin" > > Date: Wed, 11 Oct 2006 11:05:04 +0200 > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > data will be copied over rather than sent directly. > > > So why does dev.c have to force set NETIF_F_SG to off then? 
> >
> > Because it's more efficient to copy into a linear destination
> > buffer of an SKB than page sub-chunks when doing checksum+copy.
> >
>
>Thanks for the explanation.
>Obviously its true as long as you can allocate the skb that big.
>I think you won't realistically be able to get 64K in a
>linear SKB on a busy system, though, is not that right?
>
>OTOH, having large MTU (e.g. 64K) helps performance a lot since it reduces
>receive side processing overhead.

One thing to keep in mind is that while it may help performance in a
micro-benchmark, overall system performance or the QoS of other flows can be
negatively affected, depending upon the implementation. For example, consider
multiple messages interleaving (heaven help implementations that are not
able to interleave multiple messages) on either the transmit or receive HCA
/ RNIC, and how the time-to-completion of any message is extended as a result
of the interleave. The effective throughput in terms of useful units of work
can be lower as a result. The same effect can be observed when a significant
number of connections in a device are being processed simultaneously.

Also, if the copy-checksum is not performed on the processor where the
application resides, then performance can also be negatively impacted
(you want to have the right cache hot when the operation is initiated or
concluded). While the aggregate computational performance of systems may be
increasing at a significant rate (set aside the per core vs. aggregate core
debate), the memory performance gains are much smaller. If you examine the
longer term trends, there may be a flattening out of memory performance
improvements by 2009/10 without some major changes in the way controllers
and subsystems are designed.
Mike

From sweitzen at cisco.com Wed Oct 11 12:56:26 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 11 Oct 2006 12:56:26 -0700
Subject: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
Message-ID:

We aren't using SLES auto-install.

But I did google for "SLES 127.0.0.2" and found this at
http://www.novell.com/documentation/novellaudit20/readme/novellaudit20_readme.html:

2.8 SLES 10 hosts File

SLES 10 includes two localhost entries in the /etc/hosts file: 127.0.0.1
and 127.0.0.2.

The steps for installing Oracle10g on SLES10 at
http://wiki.novell.com/index.php/Oracle10g_R2_Database_on_SLES10_for_i386_Step-by-Step_1
also reference commenting out the 127.0.0.2 line.

Please add this step to the OFED 1.1 MVAPICH release notes.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il]
> Sent: Wednesday, October 11, 2006 9:03 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
> OFED 1.1 rc6 with SLES10 x86_64
>
> Here is a link about SuSE's bugs related to 127.0.0.2
> https://bugzilla.novell.com/show_bug.cgi?id=165269
>
> Check your SuSE auto-install stuff. It is possible that you have some
> broken configuration in it.
>
>
> Scott Weitzenkamp (sweitzen) wrote:
> > We've installed four SLES10 machines so far, and they all have the
> > "127.0.0.2 " entry.
> > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > >> -----Original Message----- > >> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] > >> Sent: Wednesday, October 11, 2006 8:49 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib > >> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >> OFED 1.1 rc6 with SLES10 x86_64 > >> > >> I mean SLES10. > >> (yes it's different distros) > >> > >> Scott Weitzenkamp (sweitzen) wrote: > >>> You checked SUSE 10 or SLES 10, aren't those different distros? > >>> > >>> Scott Weitzenkamp > >>> SQA and Release Manager > >>> Server Virtualization Business Unit > >>> Cisco Systems > >>> > >>> > >>> > >>>> -----Original Message----- > >>>> From: Pavel Shamis (Pasha) [mailto:pasha at dev.mellanox.co.il] > >>>> Sent: Wednesday, October 11, 2006 3:09 AM > >>>> To: Scott Weitzenkamp (sweitzen) > >>>> Cc: Pavel Shamis (Pasha); Aviram Gutman; OpenFabricsEWG; openib > >>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>> > >>>> On some of our SUSE 10 machines i found the 127.0.0.2 ip, > >>>> but it was pointing to some random Linux site (linux.org) > >>>> and has no effect on mpi runs. > >>>> In you case the ip point to _real_ machine, it very strange. > >>>> > >>>> Scott Weitzenkamp (sweitzen) wrote: > >>>> > >>>>> Aha, I found something in /etc/hosts, thanks for the hint. > >>>>> > >>>>> 127.0.0.2 svbu-qa1850-3.cisco.com svbu-qa1850-3 > >>>>> > >>>>> If I comment this line out, MVAPICH works fine. Does > >>>>> > >>>> Mellanox have this > >>>> > >>>>> entry in /etc/hosts? 
> >>>>> > >>>>> Scott Weitzenkamp > >>>>> SQA and Release Manager > >>>>> Server Virtualization Business Unit > >>>>> Cisco Systems > >>>>> > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > >>>>>> Sent: Thursday, October 05, 2006 5:59 AM > >>>>>> To: Scott Weitzenkamp (sweitzen) > >>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib > >>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>>>> > >>>>>> > >>>>>>> I see it for all MVAPICH tests, it's 100% consistent. > >>>>>>> > >>>>>> MVAPICH tests are osu_benchmarks (bw/lt/etc..) or all test > >>>>>> over mvapich > >>>>>> on SUSE10 platform ? > >>>>>> Please check /etc/hosts file on your machines, it should be > >>>>>> exactly the > >>>>>> same on all nodes. > >>>>>> > >>>>>> Regards, > >>>>>> Pasha > >>>>>> > >>>>>> > >>>>>>> Scott Weitzenkamp > >>>>>>> SQA and Release Manager > >>>>>>> Server Virtualization Business Unit > >>>>>>> Cisco Systems > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] > >>>>>>>> Sent: Tuesday, October 03, 2006 3:37 AM > >>>>>>>> To: Scott Weitzenkamp (sweitzen) > >>>>>>>> Cc: Aviram Gutman; OpenFabricsEWG; openib > >>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>>>>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>>>>>> > >>>>>>>> Hi Scott, > >>>>>>>> Unfortunately was not able to reproduce the failure on our > >>>>>>>> > >>>>>> platforms. > >>>>>> > >>>>>>>> Do you see the problem with all tests or with the > >> specific only ? > >>>>>>>> Is it consistent problem ? 
> >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> Pasha > >>>>>>>> > >>>>>>>> Scott Weitzenkamp (sweitzen) wrote: > >>>>>>>> > >>>>>>>>> $ uname -a > >>>>>>>>> Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 > >>>>>>>>> > >>>>>>>> 18:25:39 UTC 2006 > >>>>>>>> > >>>>>>>>> x86_64 > >>>>>>>>> x86_64 x86_64 GNU/Linux > >>>>>>>>> $ > >>>>>>>>> > >>>> > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > >>>> > >>>>>>>>> 192.168.2.46 192.168.2.49 hostname > >>>>>>>>> svbu-qa1850-4 > >>>>>>>>> svbu-qa1850-3 > >>>>>>>>> $ > >>>>>>>>> > >>>> > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 > >>>> > >>>>>>>>> 192.168.2.46 192.168.2.49 > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_bench > >>>>>>>> > >>>>>>> marks-2.2/ > >>>>>>> > >>>>>>>>> osu_latency > >>>>>>>>> > >>>>>>>>> The last command just hangs. Can I try your binary RPMs? > >>>>>>>>> > >>>>>>>>> Scott Weitzenkamp > >>>>>>>>> SQA and Release Manager > >>>>>>>>> Server Virtualization Business Unit > >>>>>>>>> Cisco Systems > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> -----Original Message----- > >>>>>>>>>> From: Aviram Gutman [mailto:aviram at dev.mellanox.co.il] > >>>>>>>>>> Sent: Sunday, October 01, 2006 2:29 AM > >>>>>>>>>> To: Scott Weitzenkamp (sweitzen) > >>>>>>>>>> Cc: OpenFabricsEWG; openib; pasha at mellanox.co.il > >>>>>>>>>> Subject: Re: [openfabrics-ewg] problems running MVAPICH on > >>>>>>>>>> OFED 1.1 rc6 with SLES10 x86_64 > >>>>>>>>>> > >>>>>>>>>> Can you please elaborate on MVAPICH issues, can you send > >>>>>>>>>> command line? > >>>>>>>>>> We ran it here on 32 Opteron nodes each quad core and also > >>>>>>>>>> > >>>>>>>> rigorous > >>>>>>>> > >>>>>>>>>> tests on the many other nodes? 
> >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Scott Weitzenkamp (sweitzen) wrote: > >>>>>>>>>> > >>>>>>>>>>> We are just getting started with OFED testing on > >>>>>>>>>>> > >>>> SLES10, first > >>>> > >>>>>>>>>>> platform is x86_64. > >>>>>>>>>>> > >>>>>>>>>>> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are > >>>>>>>>>>> > >>>>>>>>>> working so far. > >>>>>>>>>> > >>>>>>>>>>> MVAPICH with OSU benchmarks just hang. This same > >>>>>>>>>>> > >>>>>>>> hardware works > >>>>>>>> > >>>>>>>>>>> fine with OFED and RHEL4 U3. > >>>>>>>>>>> > >>>>>>>>>>> Has anyone else seen this? > >>>>>>>>>>> > >>>>>>>>>>> Scott Weitzenkamp > >>>>>>>>>>> SQA and Release Manager > >>>>>>>>>>> Server Virtualization Business Unit > >>>>>>>>>>> Cisco Systems > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >> -------------------------------------------------------------- > >>>>>>>>>> ---------- > >>>>>>>>>> > >>>>>>>>>>> _______________________________________________ > >>>>>>>>>>> openfabrics-ewg mailing list > >>>>>>>>>>> openfabrics-ewg at openib.org > >>>>>>>>>>> http://openib.org/mailman/listinfo/openfabrics-ewg > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> -- > >>>>>>>> Pavel Shamis (Pasha) > >>>>>>>> Software Engineer > >>>>>>>> Mellanox Technologies LTD. > >>>>>>>> pasha at mellanox.co.il > >>>>>>>> > >>>>>>>> > >>>>>> -- > >>>>>> Pavel Shamis (Pasha) > >>>>>> Software Engineer > >>>>>> Mellanox Technologies LTD. > >>>>>> pasha at mellanox.co.il > >>>>>> > >>>>>> > >>>> -- > >>>> Pavel Shamis (Pasha) > >>>> Software Engineer > >>>> Mellanox Technologies LTD. > >>>> pasha at mellanox.co.il > >>>> > >>>> > >>> > >> > >> -- > >> Pavel Shamis (Pasha) > >> Software Engineer > >> Mellanox Technologies LTD. > >> pasha at mellanox.co.il > >> > > > > > -- > Pavel Shamis (Pasha) > Software Engineer > Mellanox Technologies LTD. 
> pasha at mellanox.co.il > From kevin.ball at qlogic.com Wed Oct 11 13:12:04 2006 From: kevin.ball at qlogic.com (Kevin Ball) Date: Wed, 11 Oct 2006 13:12:04 -0700 Subject: [openib-general] Architectural differences between new SDP and old Message-ID: <1160597524.1106.20.camel@ammonite> Hi Michael, I'm investigating some of the performance differences between the new implementation of SDP and the old. The profiles are substantially different, with the new spending much more time polling than the old did. I'm not sure if this is a direct effect or secondary due to something else. Could you (or anyone else who has already looked at this) give or point me to a high-level overview of the architectural differences between the two implementations? This would save a lot of time. Thanks! -Kevin From eitan at mellanox.co.il Wed Oct 11 13:11:46 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 11 Oct 2006 22:11:46 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D2E4D.8000902@ichips.intel.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> Message-ID: <452D5002.1060901@mellanox.co.il> Sean Hefty wrote: > Eitan Zahavi wrote: > >> User level code can be run by root. It can access QP1 and bypass your nice >> API. Also you ignore current kernel implementations that exist and already >> perform QP1 access via the SA client code in the kernel. >> > > Yep - and that same app can, today, delete the multicast groups used by ipoib. 
> As long as a MAD can actually be sent out of QP1, we have this problem. > How is the MAD sent by QP1 ? QP1 is owned by the mad.c isn't it??? So if it is then there is no problem sniffing it and refcounting. > The existing APIs were not ignored. The ib_multicast module makes direct calls > to ib_sa. > And there are other kernel level ULPs that use that IB_SA code and bypass ib_multicast > >> No you are forcing every kernel client out there to do another unneeded >> migration to the new API and still not protect it correctly against multicast >> disconnects. >> > > There's exactly 1 kernel client that uses multicast groups, and a patch was > provided to migrate that client. > 1 that you know about. Others did not make it into the kernel but are quite productive to those running them. Changing top API for ULPs and Clients is simpler to implement but provide wrong tradeoff for functionality that can be implemented under the hood - not burdening the rest of the world with a constant flux of API changes. > - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From steve at chygwyn.com Wed Oct 11 13:11:38 2006 From: steve at chygwyn.com (Steven Whitehouse) Date: Wed, 11 Oct 2006 21:11:38 +0100 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011150103.GF4888@mellanox.co.il> References: <20061011090926.GA15393@fogou.chygwyn.com> <20061011150103.GF4888@mellanox.co.il> Message-ID: <20061011201138.GA21657@fogou.chygwyn.com> Hi, On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. 
Tsirkin wrote: > Quoting Steven Whitehouse : > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > > size_t size, int flags) > > > { > > > ssize_t res; > > > struct sock *sk = sock->sk; > > > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > data will be copied over rather than sent directly. > > > So why does dev.c have to force set NETIF_F_SG to off then? > > > > > I agree with that analysis, > > So, would you Ack something like the following then? > In so far as I'm able to ack it, then yes, but with the following caveats: that you also need to look at the tcp code's checks for NETIF_F_SG (aside from the interface to tcp_sendpage which I think we've agreed is ok) and ensure that this patch will not change their behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size() in particular - there may be others but thats the only one I can think of off the top of my head. I think this is what davem was getting at with his comment about copy & sum for smaller packets. Also all subject to approval by davem and shemminger of course :-) My general feeling is that devices should advertise the features that they actually have and that the protocols should make the decision as to which ones to use or not depending on the combinations available (which I think is pretty much your argument). Steve. From davem at davemloft.net Wed Oct 11 13:52:01 2006 From: davem at davemloft.net (David Miller) Date: Wed, 11 Oct 2006 13:52:01 -0700 (PDT) Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011150103.GF4888@mellanox.co.il> References: <20061011090926.GA15393@fogou.chygwyn.com> <20061011150103.GF4888@mellanox.co.il> Message-ID: <20061011.135201.15405152.davem@davemloft.net> From: "Michael S. 
Tsirkin" Date: Wed, 11 Oct 2006 17:01:03 +0200 > Quoting Steven Whitehouse : > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > > size_t size, int flags) > > > { > > > ssize_t res; > > > struct sock *sk = sock->sk; > > > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > data will be copied over rather than sent directly. > > > So why does dev.c have to force set NETIF_F_SG to off then? > > > > > I agree with that analysis, > > So, would you Ack something like the following then? I certainly don't. From mst at mellanox.co.il Wed Oct 11 13:52:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 22:52:14 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011201138.GA21657@fogou.chygwyn.com> References: <20061011090926.GA15393@fogou.chygwyn.com> <20061011150103.GF4888@mellanox.co.il> <20061011201138.GA21657@fogou.chygwyn.com> Message-ID: <20061011205214.GC15468@mellanox.co.il> Quoting r. Steven Whitehouse : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > Hi, > > On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote: > > Quoting Steven Whitehouse : > > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > > > size_t size, int flags) > > > > { > > > > ssize_t res; > > > > struct sock *sk = sock->sk; > > > > > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > > data will be copied over rather than sent directly. > > > > So why does dev.c have to force set NETIF_F_SG to off then? 
> > > > > > > I agree with that analysis, > > > > So, would you Ack something like the following then? > > > > In so far as I'm able to ack it, then yes, but with the following > caveats: that you also need to look at the tcp code's checks for > NETIF_F_SG (aside from the interface to tcp_sendpage which I think > we've agreed is ok) and ensure that this patch will not change their > behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size() > in particular - there may be others but thats the only one I can think > of off the top of my head. I think this is what davem was getting at > with his comment about copy & sum for smaller packets. Will do - thanks for the tips. > Also all subject to approval by davem and shemminger of course :-) Goes without saying :) > My general feeling is that devices should advertise the features that > they actually have and that the protocols should make the decision > as to which ones to use or not depending on the combinations available > (which I think is pretty much your argument). > > Steve. -- MST From shemminger at osdl.org Wed Oct 11 13:57:20 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Wed, 11 Oct 2006 13:57:20 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011201138.GA21657@fogou.chygwyn.com> References: <20061011090926.GA15393@fogou.chygwyn.com> <20061011150103.GF4888@mellanox.co.il> <20061011201138.GA21657@fogou.chygwyn.com> Message-ID: <20061011135720.303f166b@freekitty> On Wed, 11 Oct 2006 21:11:38 +0100 Steven Whitehouse wrote: > Hi, > > On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. 
Tsirkin wrote: > > Quoting Steven Whitehouse : > > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > > > size_t size, int flags) > > > > { > > > > ssize_t res; > > > > struct sock *sk = sock->sk; > > > > > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > > data will be copied over rather than sent directly. > > > > So why does dev.c have to force set NETIF_F_SG to off then? > > > > > > > I agree with that analysis, > > > > So, would you Ack something like the following then? > > > > In so far as I'm able to ack it, then yes, but with the following > caveats: that you also need to look at the tcp code's checks for > NETIF_F_SG (aside from the interface to tcp_sendpage which I think > we've agreed is ok) and ensure that this patch will not change their > behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size() > in particular - there may be others but thats the only one I can think > of off the top of my head. I think this is what davem was getting at > with his comment about copy & sum for smaller packets. > > Also all subject to approval by davem and shemminger of course :-) > > My general feeling is that devices should advertise the features that > they actually have and that the protocols should make the decision > as to which ones to use or not depending on the combinations available > (which I think is pretty much your argument). > > Steve. > You might want to try ignoring the check in dev.c and testing to see if there is a performance gain. It wouldn't be hard to test a modified version and validate the performance change. You could even do what I suggested and use skb_checksum_help() to do inplace checksumming, as a performance test. 
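For readers following the capability discussion, the tcp_sendpage() test quoted above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: the bit values are illustrative placeholders (the real definitions are in the kernel's netdevice header and may differ), NETIF_F_ALL_CSUM is reduced to a single placeholder mask, and sendpage_path() is a hypothetical helper, not a kernel function.

```c
/* Placeholder values for illustration only; not the kernel's definitions. */
#define NETIF_F_SG        0x1u   /* device accepts scatter/gather buffers */
#define NETIF_F_ALL_CSUM  0xEu   /* stands in for all checksum-capability bits */

enum send_path { PATH_COPY, PATH_ZEROCOPY };

/*
 * Hypothetical helper mirroring the quoted condition: pages are sent
 * directly only when the route advertises BOTH scatter/gather and some
 * form of checksum capability; otherwise the data is copied.
 */
enum send_path sendpage_path(unsigned int route_caps)
{
    if (!(route_caps & NETIF_F_SG) || !(route_caps & NETIF_F_ALL_CSUM))
        return PATH_COPY;
    return PATH_ZEROCOPY;
}
```

With these placeholder values, a device that sets NETIF_F_SG alone still takes the copy path, which is the observation behind the question of why dev.c needs to clear NETIF_F_SG at all.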
-- Stephen Hemminger From rdreier at cisco.com Wed Oct 11 14:00:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 14:00:02 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CE6FC.5090303@mellanox.co.il> (Eitan Zahavi's message of "Wed, 11 Oct 2006 14:43:40 +0200") References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> Message-ID: Eitan> If the tracking (ref counting) was done at the MAD level - Eitan> no change to IPoIB would have been required ... It doesn't seem very feasible to implement a complete local copy of the SA (in the kernel no less) so that we can allow unprivileged processes to send on QP1. - R. From rdreier at cisco.com Wed Oct 11 14:01:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 14:01:40 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CFD0C.2050709@mellanox.co.il> (Eitan Zahavi's message of "Wed, 11 Oct 2006 16:17:48 +0200") References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> Message-ID: Eitan> Hi Or, Maybe I did not explain myself right. The idea is Eitan> not to implement it in the mad.c code but rather to Eitan> implement it at the lowest level: The problem with a new Eitan> API is that a single ULP/applications which does direct Eitan> umad or QP1 access will break the reference count. Eitan> Implementing at the lowest level - I.e. 
by sniffing QP1 Eitan> packets - would be enforced for all applications/ULPs. Does not seem very feasible -- for one thing you would have to attach a privilege level to every MAD somehow, and keep that all the way through the stack. And then as far as I can tell you would essentially need a complete local copy of all GS handling to decide what to do with each QP1 message. - R. From rdreier at cisco.com Wed Oct 11 14:03:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 14:03:56 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061011125248.GA5181@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 11 Oct 2006 14:52:48 +0200") References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> Message-ID: Michael> Why is this even a good idea? If you are looking for Michael> reasons using multicast module in ipoib is good, I would Michael> say blocking unprivileged userspace from joining IPoIB Michael> GID and snooping on all mcast traffic sounds like a better Michael> idea. BTW, Sean, I think this is something we need for Michael> the ucma multicast part to go in. I would imagine kernel Michael> components could set some kind of flag on mcast join to Michael> make them exclusive. API currently does not allow for Michael> that. I don't think this is a real concern. Already userspace can just attach a QP to any MGID it wants -- how does it hurt to let it join an arbitrary MGID? Michael> And why the rush? Is the new module used at all yet? Michael> Let's see it get some use before switching a basic Michael> component over. Yes, I would like to see another consumer, just to get a sanity check that you have implemented the right thing. Michael> Finally, the patch in question also seems to introduce Michael> more cleanups and such. It would be less controversial if Michael> it was just an API change.
Agree -- please do cleanups as separate patches, preferably earlier in the series (so that it's easier to back out the real change for testing) - R. From rdreier at cisco.com Wed Oct 11 14:05:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 14:05:14 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452CF63B.9030003@voltaire.com> (Or Gerlitz's message of "Wed, 11 Oct 2006 15:48:43 +0200") References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452CF63B.9030003@voltaire.com> Message-ID: Or> Its not a rush its a move for enabling user space code that Or> can offload IP Multicast. We have a library doing that which Or> is coded over the gen1 stack and is now in porting for the Or> gen2 stack. OK -- I would like to hear your experiences porting on top of this. - R. From rdreier at cisco.com Wed Oct 11 14:07:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 14:07:46 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D25F7.1010902@mellanox.co.il> (Eitan Zahavi's message of "Wed, 11 Oct 2006 19:12:23 +0200") References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> Message-ID: Eitan> User level code can be run by root. It can access QP1 and Eitan> bypass your nice API. Also you ignore current kernel Eitan> implementations that exist and already perform QP1 access Eitan> via the SA client code in the kernel. 
root can already do anything at all so I don't think that's an issue. We want a way for unprivileged userspace to be able to use multicast. Usually I say "just use a privileged daemon in userspace" but I think in this case we actually need coordination between the kernel and userspace to track _all_ multicast joins, so it does make sense for this to be in the kernel. kernel code can always be ported as part of the merge, and in fact that's exactly what Sean did for the only user of multicast stuff, namely IPoIB. - R. From rdreier at cisco.com Wed Oct 11 14:08:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 11 Oct 2006 14:08:37 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D5002.1060901@mellanox.co.il> (Eitan Zahavi's message of "Wed, 11 Oct 2006 22:11:46 +0200") References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> Message-ID: Eitan> 1 that you know about. Others did not make it into the Eitan> kernel but are quite productive to those running them. What are those others? Eitan> Changing top API for ULPs and Clients is simpler to Eitan> implement but provide wrong tradeoff for functionality that Eitan> can be implemented under the hood - not burdening the rest Eitan> of the world with a constant flux of API changes. Documentation/stable_api_nonsense.txt is an interesting read. - R. From mst at mellanox.co.il Wed Oct 11 14:11:53 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 11 Oct 2006 23:11:53 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011.135201.15405152.davem@davemloft.net> References: <20061011.135201.15405152.davem@davemloft.net> Message-ID: <20061011211153.GE15468@mellanox.co.il> Quoting r. David Miller : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > From: "Michael S. Tsirkin" > Date: Wed, 11 Oct 2006 17:01:03 +0200 > > > Quoting Steven Whitehouse : > > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > > > size_t size, int flags) > > > > { > > > > ssize_t res; > > > > struct sock *sk = sock->sk; > > > > > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > > data will be copied over rather than sent directly. > > > > So why does dev.c have to force set NETIF_F_SG to off then? > > > > > > > I agree with that analysis, > > > > So, would you Ack something like the following then? > > I certainly don't. > Dave, sorry, you lost me. Would a new flag NETIF_F_SW_CSUM be better, to tell network core that device computes checksums in software, so we should piggyback the checksum computation with the copy process if possible? Or is there some other issue? -- MST From mshefty at ichips.intel.com Wed Oct 11 14:20:57 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 14:20:57 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> Message-ID: <452D6039.6000208@ichips.intel.com> Roland Dreier wrote: > Yes, I would like to see another consumer, just to get a sanity check > that you have implemented the right thing. 
I think aiming for 2.6.20 is reasonable, but I'm fine if this doesn't end up making it. I'd like to get it into a branch targeting some kernel release (even if in the -mm branch), and cease maintaining the differences in svn. > Agree -- please do cleanups as separate patches, preferably earlier in > the series (so that it's easier to back out the real change for > testing) There shouldn't be any separate cleanups. Some changes to the synchronization was needed to ensure that ib_free_multicast() was called exactly once. I went back over the changes based on some of Michael's comments, but I'll check again looking for cleanup only portions of the patch. - Sean From mst at mellanox.co.il Wed Oct 11 14:23:39 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 11 Oct 2006 23:23:39 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011135720.303f166b@freekitty> References: <20061011135720.303f166b@freekitty> Message-ID: <20061011212339.GH15468@mellanox.co.il> Quoting r. Stephen Hemminger : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > On Wed, 11 Oct 2006 21:11:38 +0100 > Steven Whitehouse wrote: > > > Hi, > > > > On Wed, Oct 11, 2006 at 05:01:03PM +0200, Michael S. Tsirkin wrote: > > > Quoting Steven Whitehouse : > > > > > ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, > > > > > size_t size, int flags) > > > > > { > > > > > ssize_t res; > > > > > struct sock *sk = sock->sk; > > > > > > > > > > if (!(sk->sk_route_caps & NETIF_F_SG) || > > > > > !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) > > > > > return sock_no_sendpage(sock, page, offset, size, flags); > > > > > > > > > > > > > > > So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, > > > > > data will be copied over rather than sent directly. > > > > > So why does dev.c have to force set NETIF_F_SG to off then? 
> > > > > > > > > I agree with that analysis, > > > > > > So, would you Ack something like the following then? > > > > > > > In so far as I'm able to ack it, then yes, but with the following > > caveats: that you also need to look at the tcp code's checks for > > NETIF_F_SG (aside from the interface to tcp_sendpage which I think > > we've agreed is ok) and ensure that this patch will not change their > > behaviour, and here I'm thinking of the test in net/ipv4/tcp.c:select_size() > > in particular - there may be others but that's the only one I can think > > of off the top of my head. I think this is what davem was getting at > > with his comment about copy & sum for smaller packets. > > > > Also all subject to approval by davem and shemminger of course :-) > > > > My general feeling is that devices should advertise the features that > > they actually have and that the protocols should make the decision > > as to which ones to use or not depending on the combinations available > > (which I think is pretty much your argument). > > > > Steve. > > > > You might want to try ignoring the check in dev.c and testing > to see if there is a performance gain. It wouldn't be hard to test > a modified version and validate the performance change. Yes. With my patch, there is a huge performance gain by increasing MTU to 64K. And it seems the only way to do this is by S/G. > You could even do what I suggested and use skb_checksum_help() > to do in-place checksumming, as a performance test. I can. But as Network Algorithmics says (chapter 5), "Since such bus reads are expensive, the CPU might as well piggyback the checksum computation with the copy process". It is speaking about buffers onboard the adapter, but memory bus reads are also much slower than the CPU nowadays. So I think that even if this works well in a benchmark, in real life a single copy should be better.
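As a rough illustration of "piggybacking" the checksum on the copy, the userspace sketch below copies a buffer and accumulates the 16-bit one's-complement Internet checksum (RFC 1071) in the same loop, so each source byte is read from memory once. The function name copy_and_checksum() is made up for this example; it is not the kernel's optimized copy-and-checksum code and makes no attempt at word-at-a-time or architecture-specific tricks.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the RFC 1071 Internet
 * checksum of the data, computed during the same pass over src.
 * Bytes are paired as big-endian 16-bit words; an odd trailing byte
 * is treated as if padded with a zero byte.
 */
uint16_t copy_and_checksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;   /* wide accumulator, folded at the end */
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {      /* odd length: last byte is the high half of a word */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The alternative being argued against is two passes: one memcpy-style copy, then a separate walk over the destination to compute the checksum, which touches the data a second time.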
-- MST From mshefty at ichips.intel.com Wed Oct 11 14:24:39 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 14:24:39 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D5002.1060901@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> Message-ID: <452D6117.6040400@ichips.intel.com> Eitan Zahavi wrote: > So if it is then there is no problem sniffing it and refcounting. The MADs cannot simply be sniffed and counted. MADs which affect the same multicast group should not always be sent. Join operations must be serialized against leave operations, especially when the join/leave parameters differ. A join operation to an existing group may not result in a MAD being sent, so no response from the SA is available. The act of joining or leaving a multicast group is distinct from sending a MAD. The appearance of a MAD on the wire is not always necessary. Consider that pushing this functionality down into the MAD layer also results in pushing the related ib_sa functionality into the ib_mad module as well. > And there are other kernel level ULPs that use that IB_SA code and > bypass ib_multicast There are no in-tree users, which is my primary concern at the moment. The ib_sa API still exists for out-of-tree users, but they will be as broken as they are today.
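To make the point about joins not mapping one-to-one onto MADs concrete, here is a minimal userspace sketch of per-port reference counting keyed by MGID. Everything in it is hypothetical (the names mcast_join() and mcast_leave(), the fixed-size table, the return codes); it is not the ib_multicast API, and it deliberately ignores the harder parts mentioned above, namely differing join parameters and serializing joins against leaves.

```c
#include <string.h>

#define MAX_GROUPS 8
#define MGID_LEN   16

struct mcast_group {
    unsigned char mgid[MGID_LEN];
    int refcount;               /* 0 means the slot is free */
};

static struct mcast_group groups[MAX_GROUPS];

/* Find the tracked group with this MGID, or NULL if none is tracked. */
static struct mcast_group *find_group(const unsigned char *mgid)
{
    int i;

    for (i = 0; i < MAX_GROUPS; i++)
        if (groups[i].refcount > 0 &&
            memcmp(groups[i].mgid, mgid, MGID_LEN) == 0)
            return &groups[i];
    return NULL;
}

/*
 * Returns 1 if this is the first local join (a request would have to go
 * to the SA), 0 if the join is absorbed locally and no MAD is needed,
 * -1 if the table is full.
 */
int mcast_join(const unsigned char *mgid)
{
    struct mcast_group *g = find_group(mgid);
    int i;

    if (g) {
        g->refcount++;
        return 0;
    }
    for (i = 0; i < MAX_GROUPS; i++) {
        if (groups[i].refcount == 0) {
            memcpy(groups[i].mgid, mgid, MGID_LEN);
            groups[i].refcount = 1;
            return 1;
        }
    }
    return -1;
}

/*
 * Returns 1 if the last local reference was dropped (a leave would have
 * to go to the SA), 0 if other local members remain, -1 if not joined.
 */
int mcast_leave(const unsigned char *mgid)
{
    struct mcast_group *g = find_group(mgid);

    if (!g)
        return -1;
    return --g->refcount == 0 ? 1 : 0;
}
```

In this model the second join and the first leave never reach the wire, so a layer that only watches QP1 traffic has nothing to count for them.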

> Changing top API for ULPs and Clients is simpler to implement but
> provide wrong tradeoff for functionality that can be implemented under
> the hood - not burdening the rest of the world with a constant flux of
> API changes.

I'm more concerned about getting the right API than trying to fit something into an existing API just because it's there. My proposal is to have the following layers:

ib_mad - sends and receives MADs on QP0/1
ib_sa - sends and receives MADs to the SA
ib_multicast - manages multicast joins

The alternative proposal I keep hearing is to combine these 3 layers under the existing ib_mad API. However, the behavior of that API will change.

- Sean

From rdreier at cisco.com Wed Oct 11 14:27:10 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 11 Oct 2006 14:27:10 -0700
Subject: [openib-general] IB/ipath - initialize diagpkt file on device init only
In-Reply-To: <452C16E1.60005@pathscale.com> (Robert Walsh's message of "Tue, 10 Oct 2006 14:55:45 -0700")
References: <452C16E1.60005@pathscale.com>
Message-ID:

Thanks, queued for 2.6.19

From sweitzen at cisco.com Wed Oct 11 14:26:55 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 11 Oct 2006 14:26:55 -0700
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7
Message-ID:

Yes, the patch is being applied. Not sure why cm_issue_drep is not there though...

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Vladimir Sokolovsky [mailto:vlad at mellanox.co.il]
> Sent: Wednesday, October 11, 2006 9:13 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Arlin Davis; Magro, Bill; Supalov, Alexander;
> Openib-General at Openib.Org; EWG
> Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1 RC7
>
> Hi Scott,
> You can check OFED compilation log file to see if this patch
> was applied
> and compiled.
> > To get the relevant log file: > ls -ltr /tmp/OFED*log | tail -2 > > One of them will be the compilation log. > > Search for sean_cm_drep_on_not_found.patch inside... > > Regards, > Vladimir > > On Wed, 2006-10-11 at 08:37 -0700, Scott Weitzenkamp (sweitzen) wrote: > > Vlad, do you have symbol cm_issue_drep in any .ko files, because I > > don't. Looks like the patch is not getting compiled in for > some reason. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > > > -----Original Message----- > > > From: Vladimir Sokolovsky [mailto:vlad at mellanox.co.il] > > > Sent: Wednesday, October 11, 2006 2:00 AM > > > To: Arlin Davis > > > Cc: Aviram Gutman; Scott Weitzenkamp (sweitzen); Supalov, > > > Alexander; Magro, Bill; EWG; Openib-General at Openib.Org > > > Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1 RC7 > > > > > > Hi Arlin, > > > This patch is in OFED-1.1-rc7 and applied during installation. > > > > > > > > > Regards, > > > Vladimir > > > > > > On Tue, 2006-10-10 at 22:50 -0700, Arlin Davis wrote: > > > > Aviram Gutman wrote: > > > > > > > > >OFED-1.1-rc7 is available on > > > > >https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > > > > >File: OFED-1.1-rc7.tgz > > > > >Please report any issues in bugzilla > http://openib.org/bugzilla/ > > > > > > > > > > > > > > > > > > > Aviram, > > > > > > > > Can you verify that the sean_cm_drep_on_not_found.patch > is actually > > > > applied in RC7? Our delayed disconnect problems still exist. > > > > > > > > I don't see the new symbol "cm_issue_drep" in ib_cm.ko > on our RC7 > > > > installed systems so I don't think the patch applied. 
> > > > > > > > Thanks, > > > > > > > > -arlin > > > > > > > > _______________________________________________ > > > > openfabrics-ewg mailing list > > > > openfabrics-ewg at openib.org > > > > http://openib.org/mailman/listinfo/openfabrics-ewg > > > > > > > _______________________________________________ > > openfabrics-ewg mailing list > > openfabrics-ewg at openib.org > > http://openib.org/mailman/listinfo/openfabrics-ewg > From bugzilla-daemon at openib.org Wed Oct 11 14:32:19 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 11 Oct 2006 14:32:19 -0700 (PDT) Subject: [openib-general] [Bug 279] New: OFED: warning when building uverbs_main.c on RHEL4 U3 Message-ID: <20061011213219.CBE6E2283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=279 Summary: OFED: warning when building uverbs_main.c on RHEL4 U3 Product: OpenFabrics Linux Version: 1.1rc7 Platform: Other OS/Version: RHEL 4 Status: NEW Severity: normal Priority: P3 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: rjwalsh at pathscale.com I get this warning when building OFED-1.1 RC7 on RHEL4 U3: uverbs_main.c:127: warning: static declaration of 'get_sb_pseudo' follows non-static declaration include/linux/fs.h:1179: warning: previous declaration of 'get_sb_pseudo' was here Looks like the backport is backporting a function that isn't really needed. I managed to cause this on 2.6.9-34.ELsmp. Doesn't happen on U4, as that doesn't backport this function. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From shemminger at osdl.org Wed Oct 11 14:29:57 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Wed, 11 Oct 2006 14:29:57 -0700 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. 
In-Reply-To: <20061011212339.GH15468@mellanox.co.il> References: <20061011135720.303f166b@freekitty> <20061011212339.GH15468@mellanox.co.il> Message-ID: <20061011142957.5bd42784@freekitty> O > > > > You might want to try ignoring the check in dev.c and testing > > to see if there is a performance gain. It wouldn't be hard to test > > a modified version and validate the performance change. > > Yes. With my patch, there is a huge performance gain by increasing MTU to 64K. > And it seems the only way to do this is by S/G. > > > You could even do what I suggested and use skb_checksum_help() > > to do inplace checksumming, as a performance test. > > I can. But as network algorithmics says (chapter 5) > "Since such bus reads are expensive, the CPU might as well piggyback > the checksum computation with the copy process". > > It speaks about onboard the adapter buffers, but memory bus reads are also much slower > than CPU nowdays. So I think even if this works well in benchmark in real life > single copy should better. > The other alternative might be to make copy/checksum code smarter about using fragments rather than allocating a large buffer. It should avoid second order allocations (effective size > PAGESIZE). -- Stephen Hemminger From davem at davemloft.net Wed Oct 11 14:41:37 2006 From: davem at davemloft.net (David Miller) Date: Wed, 11 Oct 2006 14:41:37 -0700 (PDT) Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011212339.GH15468@mellanox.co.il> References: <20061011135720.303f166b@freekitty> <20061011212339.GH15468@mellanox.co.il> Message-ID: <20061011.144137.18281355.davem@davemloft.net> From: "Michael S. Tsirkin" Date: Wed, 11 Oct 2006 23:23:39 +0200 > With my patch, there is a huge performance gain by increasing MTU to 64K. > And it seems the only way to do this is by S/G. Numbers? From mst at mellanox.co.il Wed Oct 11 14:42:37 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 11 Oct 2006 23:42:37 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011142957.5bd42784@freekitty> References: <20061011142957.5bd42784@freekitty> Message-ID: <20061011214237.GK15468@mellanox.co.il> Quoting r. Stephen Hemminger : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > O > > > > > > You might want to try ignoring the check in dev.c and testing > > > to see if there is a performance gain. It wouldn't be hard to test > > > a modified version and validate the performance change. > > > > Yes. With my patch, there is a huge performance gain by increasing MTU to 64K. > > And it seems the only way to do this is by S/G. > > > > > You could even do what I suggested and use skb_checksum_help() > > > to do inplace checksumming, as a performance test. > > > > I can. But as network algorithmics says (chapter 5) > > "Since such bus reads are expensive, the CPU might as well piggyback > > the checksum computation with the copy process". > > > > It speaks about onboard the adapter buffers, but memory bus reads are also much slower > > than CPU nowdays. So I think even if this works well in benchmark in real life > > single copy should better. > > > > The other alternative might be to make copy/checksum code smarter about using > fragments rather than allocating a large buffer. It should avoid second order > allocations (effective size > PAGESIZE). In my testing, it seems quite smart already - once I declare F_SG and clear F_...CSUM I start getting properly checksummed packets with lots of s/g fragments. But I'm not catching the drift - an alternative to what this would be? 
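
As an illustration of what "properly checksummed packets with lots of s/g fragments" involves, here is a minimal user-space sketch of summing a payload that is split across several fragments. It is not the kernel's implementation; `struct frag` and `csum_over_frags` are hypothetical names. The only subtlety it shows is that a fragment may end on an odd byte, so the byte parity has to be carried from one fragment to the next.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment descriptor: one piece of a scattered payload. */
struct frag {
    const uint8_t *data;
    size_t len;
};

/*
 * 16-bit one's-complement sum over a list of fragments, treating the
 * concatenated bytes as big-endian 16-bit words. The parity flag is
 * carried across fragment boundaries, so a fragment of odd length does
 * not shift the byte lanes of the fragments that follow it.
 */
static uint16_t csum_over_frags(const struct frag *frags, size_t nfrags)
{
    uint32_t sum = 0;
    int odd = 0;            /* 1 when an odd number of bytes was consumed */
    size_t f, i;

    for (f = 0; f < nfrags; f++) {
        for (i = 0; i < frags[f].len; i++) {
            uint8_t b = frags[f].data[i];

            sum += odd ? (uint32_t)b : (uint32_t)b << 8;
            odd ^= 1;
            /* Partial fold keeps the accumulator from overflowing. */
            if (sum & 0x80000000u)
                sum = (sum & 0xffff) + (sum >> 16);
        }
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

The result does not depend on where the fragment boundaries fall, which is the property a scatter/gather transmit path relies on. The byte-at-a-time loop is chosen for clarity, not speed.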
-- MST

From bugzilla-daemon at openib.org Wed Oct 11 15:13:10 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Wed, 11 Oct 2006 15:13:10 -0700 (PDT)
Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Message-ID: <20061011221310.703F32283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=263

------- Comment #3 from rolandd at cisco.com 2006-10-11 15:13 -------
In both cases the final crash seems to be in the call

    spin_lock_irqsave(&priv->lock, flags);

in path_rec_completion(). This would seem to indicate some sort of memory corruption I guess. I don't know yet why dev_queue_xmit() would fail...

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at openib.org Wed Oct 11 15:25:33 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Wed, 11 Oct 2006 15:25:33 -0700 (PDT)
Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Message-ID: <20061011222533.1A7C22283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=263

------- Comment #4 from rolandd at cisco.com 2006-10-11 15:25 -------
OK, most likely dev_queue_xmit() is returning an error because the device is down, but that should be OK. I guess we have a race somewhere with up/downing the device at the same time as handling traffic.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mshefty at ichips.intel.com Wed Oct 11 16:06:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 11 Oct 2006 16:06:15 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000a01c6eccb$d51e0b30$c0d4180a@amr.corp.intel.com> References: <000a01c6eccb$d51e0b30$c0d4180a@amr.corp.intel.com> Message-ID: <452D78E7.5040906@ichips.intel.com> Sean Hefty wrote: >>>+ int (*callback)(int status, >>>+ struct ib_multicast >>>+ *multicast), >>>+ void *context); >>>+ >> >>Is this re-introducing module unload races we had with sa all over again? > > > The call returns a structure that must be freed. If the structure is freed by > returning a non-zero call to the callback, then we have the same problem that > ib_cm and rdma_cm have. Not allowing a return value from the callback is an > easy fix for that though. I looked at this in more detail. While it's easy on the ib_multicast module to not allow returning a value from the callback, it makes it more difficult on the users to handle failures. (The ipoib changes use this feature.) So, I think we want registration with the ib_multicast module similar to what was done with the ib_sa. - Sean From bugzilla-daemon at openib.org Wed Oct 11 16:35:12 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 11 Oct 2006 16:35:12 -0700 (PDT) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop Message-ID: <20061011233512.8F7182283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=263 ------- Comment #5 from rolandd at cisco.com 2006-10-11 16:35 ------- Scott, could you add "debug_level=1" to the ib_ipoib module flags and rerun one of these tests? That will generate a boatload of logging output, but I'd just like to see the last part before a crash -- say the final 1000 lines or so. Thanks... 
(unfortunately I don't have an appropriate setup to reproduce this at the moment but I'd like to try and make progress...) ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ardavis at ichips.intel.com Wed Oct 11 16:36:35 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 11 Oct 2006 16:36:35 -0700 Subject: [openib-general] two OFED uDAPL issues In-Reply-To: <452D0FB0.1030206@voltaire.com> References: <452D0FB0.1030206@voltaire.com> Message-ID: <452D8003.8050901@ichips.intel.com> Or Gerlitz wrote: > Arlin, > > I see now that the uDAPL CMA provider code uses the MTU 1:1 as > returned by the SM in the path, so if the env is made of the Mellanox > PCI-X HCA there can be big BW drop, etc... we have discussed that. > > I wonder how are you overcoming this when running Intel MPI w. OFED 1.0? We are not. The problem exists with the CMA provider and OFED 1.0. > > I understand in OFED 1.1 there is this tavor_quirk in both the cma and > the opensm, but i am not aware to any such hack in OFED 1.0. > > Also, i understand that OFED includes the uDAPL **SCM** provider, is > it really tested/supported? if yes, i don't think it needs to be. It > adds the overhead of one TCP connection per IB connection, creates two > codes bases to maintain, makes the CMA less tested, you named it. > > If its not tested/supported sure we must not provide it. > > If you agree would you approach the OFED maintainers to remove the SCM > provider from the udapl OFED 1.1 RPM? You are correct. There is no need to support SCM moving forward. -arlin From krkumar2 at in.ibm.com Wed Oct 11 21:53:08 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:08 +0530 Subject: [openib-general] [PATCH 0/13] Re-write error cases in CMA routines to simplify code Message-ID: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Re-write all cma error cases to simplify code. 
Splitting it as multiple patches (one per routine) in case some are found not required (in which case, later ones may apply with a fuzz). Signed-off-by: Krishna Kumar -------- From krkumar2 at in.ibm.com Wed Oct 11 21:53:15 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:15 +0530 Subject: [openib-general] [PATCH 1/13] Re-write rdma_create_qp error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045315.21952.72646.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -389,16 +389,16 @@ int rdma_create_qp(struct rdma_cm_id *id break; } - if (ret) - goto err; + if (ret) { + ib_destroy_qp(qp); + goto out; + } id->qp = qp; id_priv->qp_num = qp->qp_num; id_priv->qp_type = qp->qp_type; id_priv->srq = (qp->srq != NULL); - return 0; -err: - ib_destroy_qp(qp); +out: return ret; } EXPORT_SYMBOL(rdma_create_qp); From krkumar2 at in.ibm.com Wed Oct 11 21:53:35 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:35 +0530 Subject: [openib-general] [PATCH 4/13] Re-write cma_work_handler error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045335.21952.25699.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1276,14 +1276,12 @@ static void cma_work_handler(void *data) int destroy = 0; atomic_inc(&id_priv->dev_remove); - if 
(!cma_comp_exch(id_priv, work->old_state, work->new_state)) - goto out; - - if (id_priv->id.event_handler(&id_priv->id, &work->event)) { - cma_exch(id_priv, CMA_DESTROYING); - destroy = 1; + if (cma_comp_exch(id_priv, work->old_state, work->new_state)) { + if (id_priv->id.event_handler(&id_priv->id, &work->event)) { + cma_exch(id_priv, CMA_DESTROYING); + destroy = 1; + } } -out: cma_release_remove(id_priv); cma_deref_id(id_priv); if (destroy) From krkumar2 at in.ibm.com Wed Oct 11 21:53:48 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:48 +0530 Subject: [openib-general] [PATCH 6/13] Re-write rdma_resolve_route error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045348.21952.83864.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1435,13 +1435,10 @@ int rdma_resolve_route(struct rdma_cm_id ret = -ENOSYS; break; } - if (ret) - goto err; - - return 0; -err: - cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED); - cma_deref_id(id_priv); + if (ret) { + cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED); + cma_deref_id(id_priv); + } return ret; } EXPORT_SYMBOL(rdma_resolve_route); From krkumar2 at in.ibm.com Wed Oct 11 21:53:41 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:41 +0530 Subject: [openib-general] [PATCH 5/13] Re-write rdma_set_ib_paths error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045341.21952.43971.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c 
new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1348,13 +1348,12 @@ int rdma_set_ib_paths(struct rdma_cm_id id->route.path_rec = kmalloc(sizeof *path_rec * num_paths, GFP_KERNEL); if (!id->route.path_rec) { ret = -ENOMEM; - goto err; + cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_ADDR_RESOLVED); + } else { + ret = 0; + memcpy(id->route.path_rec, path_rec, + sizeof *path_rec * num_paths); } - - memcpy(id->route.path_rec, path_rec, sizeof *path_rec * num_paths); - return 0; -err: - cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_ADDR_RESOLVED); return ret; } EXPORT_SYMBOL(rdma_set_ib_paths); From krkumar2 at in.ibm.com Wed Oct 11 21:54:08 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:24:08 +0530 Subject: [openib-general] [PATCH 9/13] Re-write rdma_resolve_addr error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045408.21952.44068.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1600,13 +1600,10 @@ int rdma_resolve_addr(struct rdma_cm_id ret = rdma_resolve_ip(&id->route.addr.src_addr, dst_addr, &id->route.addr.dev_addr, timeout_ms, addr_handler, id_priv); - if (ret) - goto err; - - return 0; -err: - cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND); - cma_deref_id(id_priv); + if (ret) { + cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND); + cma_deref_id(id_priv); + } return ret; } EXPORT_SYMBOL(rdma_resolve_addr); From krkumar2 at in.ibm.com Wed Oct 11 21:53:28 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:28 +0530 
Subject: [openib-general] [PATCH 3/13] Re-write rdma_listen error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045328.21952.36307.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1208,13 +1208,13 @@ int rdma_listen(struct rdma_cm_id *id, i switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: ret = cma_ib_listen(id_priv); - if (ret) - goto err; break; default: ret = -ENOSYS; - goto err; + break; } + if (ret) + goto err; } else cma_listen_on_all(id_priv); From krkumar2 at in.ibm.com Wed Oct 11 21:54:21 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:24:21 +0530 Subject: [openib-general] [PATCH 11/13] Re-write rdma_accept error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045421.21952.71711.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -2054,13 +2054,10 @@ int rdma_accept(struct rdma_cm_id *id, s break; } - if (ret) - goto reject; - - return 0; -reject: - cma_modify_qp_err(id); - rdma_reject(id, NULL, 0); + if (ret) { + cma_modify_qp_err(id); + rdma_reject(id, NULL, 0); + } return ret; } EXPORT_SYMBOL(rdma_accept); From krkumar2 at in.ibm.com Wed Oct 11 21:54:01 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:24:01 +0530 Subject: [openib-general] [PATCH 8/13] Re-write cma_resolve_loopback 
error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045401.21952.72637.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1542,8 +1542,10 @@ static int cma_resolve_loopback(struct r if (!id_priv->cma_dev) { ret = cma_bind_loopback(id_priv); - if (ret) - goto err; + if (ret) { + kfree(work); + return ret; + } } ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); @@ -1563,9 +1565,6 @@ static int cma_resolve_loopback(struct r work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED; queue_work(cma_wq, &work->work); return 0; -err: - kfree(work); - return ret; } static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, From krkumar2 at in.ibm.com Wed Oct 11 21:54:35 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:24:35 +0530 Subject: [openib-general] [PATCH 13/13] Re-write cma_init error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045435.21952.2638.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -2389,13 +2389,10 @@ static int cma_init(void) ib_sa_register_client(&sa_client); ret = ib_register_client(&cma_client); - if (ret) - goto err; - return 0; - -err: - ib_sa_unregister_client(&sa_client); - destroy_workqueue(cma_wq); + if (ret) { + ib_sa_unregister_client(&sa_client); + destroy_workqueue(cma_wq); + } return ret; } From krkumar2 at 
in.ibm.com Wed Oct 11 21:53:54 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:54 +0530 Subject: [openib-general] [PATCH 7/13] Re-write cma_bind_loopback error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045354.21952.77579.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1453,19 +1453,19 @@ static int cma_bind_loopback(struct rdma u8 p; mutex_lock(&lock); + if (list_empty(&dev_list)) { + ret = -ENODEV; + goto out; + } + list_for_each_entry(cma_dev, &dev_list, list) for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) if (!ib_query_port(cma_dev->device, p, &port_attr) && port_attr.state == IB_PORT_ACTIVE) goto port_found; - if (!list_empty(&dev_list)) { - p = 1; - cma_dev = list_entry(dev_list.next, struct cma_device, list); - } else { - ret = -ENODEV; - goto out; - } + p = 1; + cma_dev = list_entry(dev_list.next, struct cma_device, list); port_found: ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid); From krkumar2 at in.ibm.com Wed Oct 11 21:54:14 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:24:14 +0530 Subject: [openib-general] [PATCH 10/13] Re-write rdma_connect error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045414.21952.85540.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1963,11 +1963,7 @@ int 
rdma_connect(struct rdma_cm_id *id, break; } if (ret) - goto err; - - return 0; -err: - cma_comp_exch(id_priv, CMA_CONNECT, CMA_ROUTE_RESOLVED); + cma_comp_exch(id_priv, CMA_CONNECT, CMA_ROUTE_RESOLVED); return ret; } EXPORT_SYMBOL(rdma_connect); From krkumar2 at in.ibm.com Wed Oct 11 21:54:28 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:24:28 +0530 Subject: [openib-general] [PATCH 12/13] Re-write cma_add_one error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <20061012045428.21952.93637.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -2288,14 +2288,15 @@ static void cma_add_one(struct ib_device struct cma_device *cma_dev; struct rdma_id_private *id_priv; + if (!device->node_guid) + return; + cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL); if (!cma_dev) return; cma_dev->device = device; cma_dev->node_guid = device->node_guid; - if (!cma_dev->node_guid) - goto err; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); @@ -2307,9 +2308,6 @@ static void cma_add_one(struct ib_device list_for_each_entry(id_priv, &listen_any_list, list) cma_listen_on_dev(id_priv, cma_dev); mutex_unlock(&lock); - return; -err: - kfree(cma_dev); } static int cma_remove_id_dev(struct rdma_id_private *id_priv) From krkumar2 at in.ibm.com Wed Oct 11 21:53:21 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 12 Oct 2006 10:23:21 +0530 Subject: [openib-general] [PATCH 2/13] Re-write cma_listen_on_dev error cases In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: 
<20061012045321.21952.15743.sendpatchset@localhost.localdomain> diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1165,11 +1165,7 @@ static void cma_listen_on_dev(struct rdm ret = rdma_listen(id, id_priv->backlog); if (ret) - goto err; - - return; -err: - cma_destroy_listen(dev_id_priv); + cma_destroy_listen(dev_id_priv); } static void cma_listen_on_all(struct rdma_id_private *id_priv) From mst at mellanox.co.il Wed Oct 11 23:02:53 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Oct 2006 08:02:53 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452D78E7.5040906@ichips.intel.com> References: <452D78E7.5040906@ichips.intel.com> Message-ID: <20061012060253.GA13181@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port > > Sean Hefty wrote: > >>>+ int (*callback)(int status, > >>>+ struct ib_multicast > >>>+ *multicast), > >>>+ void *context); > >>>+ > >> > >>Is this re-introducing module unload races we had with sa all over again? > > > > > > The call returns a structure that must be freed. If the structure is freed by > > returning a non-zero call to the callback, then we have the same problem that > > ib_cm and rdma_cm have. Hmm, sorry, I forgot. Could you restate what the ib_cm/rdma_cm problem is, please? Shouldn't we solve that, too? 
-- MST

From ogerlitz at voltaire.com Wed Oct 11 23:48:47 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 12 Oct 2006 08:48:47 +0200
Subject: [openib-general] two OFED uDAPL issues
In-Reply-To: <452D8003.8050901@ichips.intel.com>
References: <452D0FB0.1030206@voltaire.com> <452D8003.8050901@ichips.intel.com>
Message-ID: <452DE54F.7000709@voltaire.com>

Arlin Davis wrote:
> Or Gerlitz wrote:
>> I see now that the uDAPL CMA provider code uses the MTU 1:1 as
>> returned by the SM in the path, so if the env is made of the Mellanox
>> PCI-X HCA there can be big BW drop, etc... we have discussed that.
>> I wonder how are you overcoming this when running Intel MPI w. OFED 1.0?
> We are not. The problem exists with the CMA provider and OFED 1.0.

Hmm, I see.

>> Also, i understand that OFED includes the uDAPL **SCM** provider, is
>> it really tested/supported? if yes, i don't think it needs to be. It
>> adds the overhead of one TCP connection per IB connection, creates two
>> codes bases to maintain, makes the CMA less tested, you named it.
>> If its not tested/supported sure we must not provide it.
>> If you agree would you approach the OFED maintainers to remove the SCM
>> provider from the udapl OFED 1.1 RPM?
> You are correct. There is no need to support SCM moving forward.

Let's move now. I have got a disconnect problem/bug report from a customer using the OFED/uDAPL/SCM provider, and I realize now that it is an unsupported library! Please act to remove it from OFED 1.1 onward.

Thanks,

Or.

From ogerlitz at voltaire.com Wed Oct 11 23:54:47 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 12 Oct 2006 08:54:47 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To:
References: <452CE68D.8040709@voltaire.com> <20061011125248.GA5181@mellanox.co.il> <452CF63B.9030003@voltaire.com>
Message-ID: <452DE6B7.2060902@voltaire.com>

Roland Dreier wrote:
> Or> Its not a rush its a move for enabling user space code that
> Or> can offload IP Multicast. We have a library doing that which
> Or> is coded over the gen1 stack and is now in porting for the
> Or> gen2 stack.
>
> OK -- I would like to hear your experiences porting on top of this.

Will let you (and everybody) know.

Or.

From eitan at mellanox.co.il Wed Oct 11 23:56:22 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 12 Oct 2006 08:56:22 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To:
References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il>
Message-ID: <452DE716.7010109@mellanox.co.il>

Roland Dreier wrote:
> Eitan> If the tracking (ref counting) was done at the MAD level -
> Eitan> no change to IPoIB would have been required ...
>
> It doesn't seem very feasible to implement a complete local copy of
> the SA (in the kernel no less) so that we can allow unprivileged
> processes to send on QP1.
>

I was not proposing a copy of the SA. I was suggesting tracking specific Class/Method/Attribute combinations (along with the tracking we already do for TID). So the refcount just tracks those. I do not see why it is more complicated than the code already in ib_multicast. It is just further down the stack.

> - R.
> > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From eitan at mellanox.co.il Wed Oct 11 23:57:13 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 12 Oct 2006 08:57:13 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> Message-ID: <452DE749.8050101@mellanox.co.il> Roland Dreier wrote: > Eitan> 1 that you know about. Others did not make it into the > Eitan> kernel but are quite productive to those running them. > > What are those others? > CFS is an example. > Eitan> Changing top API for ULPs and Clients is simpler to > Eitan> implement but provide wrong tradeoff for functionality that > Eitan> can be implemented under the hood - not burdening the rest > Eitan> of the world with a constant flux of API changes. > > Documentation/stable_api_nonsense.txt is an interesting read. > > - R. 
From eitan at mellanox.co.il Thu Oct 12 00:03:32 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 12 Oct 2006 09:03:32 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <452D6117.6040400@ichips.intel.com>
References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> <452D6117.6040400@ichips.intel.com>
Message-ID: <452DE8C4.2040804@mellanox.co.il>

Sean Hefty wrote:
> Eitan Zahavi wrote:
>
>> So if it is then there is no problem sniffing it and refcounting.
>>
>
> The MADs cannot simply be sniffed and counted. MADs which affect the same
> multicast group should not always be sent. Join operations must be serialized
> against leave operations, especially when the join/leave parameters differ.
>
But you did all that extra work in ib_multicast. You just need to attach it to EVERY SubnAdmin Set or Delete of MCMemberRecord. So the argument that it is complicated does not mean you need to build it with a hole in it. The hole being the fact that anybody can bypass it and the ref-counting will be broken. What is the point in providing the client the illusion of safety if eventually it does not work?
> A join operation to an existing group may not result in a MAD being sent, so no > response from the SA is available. The act of joining or leaving a multicast > group is distinct from sending a MAD. The appearance of a MAD on the wire is > not always necessary. > > Consider that pushing this functionality down into the MAD layer also results in > pushing the related ib_sa functionality into the ib_mad module as well. > I disagree. If you sniff at the MAD level you can simply react to the lower level messages. > >> And there are other kernel level ULPs that use that IB_SA code and >> bypass ib_multicast >> > > There are no in tree users, which is my primary concern at the moment. The > ib_sa API still exists for out of tree users, but they will as broken as they > are today. > Consider gcc was asking to support only the code that is in some distribution. You build a broken architecture (one that fails to provide the functionality under all conditions) and hide behind not knowing the code that will break it. > >> Changing top API for ULPs and Clients is simpler to implement but >> provide wrong tradeoff for functionality that can be implemented under >> the hood - not burdening the rest of the world with a constant flux of >> API changes. >> > > I'm more concerned about getting the right API than trying to fit something into > an existing API just because its there. My proposal is to have the following > layers: > > ib_mad - sends and receives MADs on QP0/1 > ib_sa - sends and receives MADs to the SA > ib_multicast - manages multicast joins > > The alternative proposal I keep hearing is to combine these 3 layers under the > existing ib_mad API. However, the behavior of that API will change. 
> My proposal is of building a pluggable module that is below the ib_mad that tracks SubnAdmin.Set/Delete.MCMemberRecord and reacts accordingly: ib_mad <-> ib_multicast ib_sa > - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Thu Oct 12 03:57:11 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 12 Oct 2006 12:57:11 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452DE749.8050101@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> <452DE749.8050101@mellanox.co.il> Message-ID: <452E1F87.4030705@voltaire.com> Eitan Zahavi wrote: > Roland Dreier wrote: >> Eitan> 1 that you know about. Others did not make it into the >> Eitan> kernel but are quite productive to those running them. >> >> What are those others? > CFS is an example. Lustre o2ibnld is using RC only and from what i know is coded over the CMA from day one of its gen2 porting. So there is no direct ib_sa usage in it. Or. 
From halr at voltaire.com Thu Oct 12 04:24:29 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Oct 2006 07:24:29 -0400
Subject: [openib-general] [PATCH] OpenSM/osm_link_mgr.c: Use leaf HOQLife for IB router ports
Message-ID: <1160652267.32093.106609.camel@hal.voltaire.com>

OpenSM/osm_link_mgr.c: Use leaf HOQLife for IB router ports

Signed-off-by: Hal Rosenstock

---
Index: opensm/osm_link_mgr.c
===================================================================
--- opensm/osm_link_mgr.c (revision 9771)
+++ opensm/osm_link_mgr.c (working copy)
@@ -264,7 +264,7 @@ __osm_link_mgr_set_physp_pi(
       IB_NODE_TYPE_ROUTER)
   {
     ib_port_info_set_hoq_lifetime(
-      p_pi, p_mgr->p_subn->opt.head_of_queue_lifetime);
+      p_pi, p_mgr->p_subn->opt.leaf_head_of_queue_lifetime);
   }
   else if (osm_node_get_type(osm_physp_get_node_ptr(p_physp)) ==
       IB_NODE_TYPE_SWITCH)

From johnt1johnt2 at gmail.com Thu Oct 12 04:43:20 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Thu, 12 Oct 2006 17:13:20 +0530
Subject: [openib-general] bypassing MPT and MTT lookups
Message-ID:

Hi

This is with respect to Mellanox HCAs. Is there a way to bypass/disable MPT (Memory Protection Table) and MTT (Memory Translation Table) lookups done by the HCA? If yes, which function should be called from a module to achieve this?

Regards,
John T.

From bugzilla-daemon at openib.org Thu Oct 12 06:00:17 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Thu, 12 Oct 2006 06:00:17 -0700 (PDT)
Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Message-ID: <20061012130017.12BD72283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=263

------- Comment #6 from chas at cmf.nrl.navy.mil 2006-10-12 06:00 -------
calling netif_stop_queue() doesn't immediately stop the transmit queue.
it might be necessary to take priv->tx_lock when calling netif_stop_queue() from ipoib_stop() to ensure that ipoib_start_xmit() isn't in the middle of some work.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From halr at voltaire.com Thu Oct 12 05:55:25 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Oct 2006 08:55:25 -0400
Subject: [openib-general] [PATCH] OpenSM/osmtest.c: Fix float calculation in osmtest_stress_large_rmpp_pr
Message-ID: <1160657724.32093.110260.camel@hal.voltaire.com>

OpenSM/osmtest.c: Fix float calculation in osmtest_stress_large_rmpp_pr

Signed-off-by: Hal Rosenstock

---
Index: osmtest/osmtest.c
===================================================================
--- osmtest/osmtest.c (revision 9795)
+++ osmtest/osmtest.c (working copy)
@@ -2868,7 +2868,7 @@ osmtest_stress_large_rmpp_pr( IN osmtest
   if (num_recs == 0)
     ratio = 0;
   else
-    ratio = (float)(num_queries / num_recs);
+    ratio = ((float)num_queries / (float)num_recs);
   printf( "-I- Queries to Record Ratio is %" PRIu64 " records, %" PRIu64 " queries : %.2f \n",
     num_recs, num_queries, ratio);
   print_freq = 0;

From bugzilla-daemon at openib.org Thu Oct 12 06:07:06 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Thu, 12 Oct 2006 06:07:06 -0700 (PDT)
Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Message-ID: <20061012130706.691132283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=263

------- Comment #7 from mst at mellanox.co.il 2006-10-12 06:07 -------
why is it necessary to ensure that ipoib_start_xmit() isn't in the middle of some work?

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
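
[Editorial illustration, not part of the original thread.] The osmtest patch in this batch of messages changes where the cast to float is applied. A minimal sketch of why that matters follows; the function names ratio_truncated and ratio_exact are hypothetical and exist only for this example, while the two return expressions mirror the removed and added lines of the patch.

```c
#include <stdint.h>

/*
 * Pre-patch form: the 64-bit integer division runs first, so the
 * fractional part is already discarded when the cast is applied.
 * (Hypothetical helper name, used only for illustration.)
 */
float ratio_truncated(uint64_t num_queries, uint64_t num_recs)
{
	if (num_recs == 0)
		return 0.0f;
	return (float)(num_queries / num_recs);
}

/*
 * Patched form: both operands are converted to float before the
 * division, so the quotient keeps its fractional part.
 * (Hypothetical helper name, used only for illustration.)
 */
float ratio_exact(uint64_t num_queries, uint64_t num_recs)
{
	if (num_recs == 0)
		return 0.0f;
	return (float)num_queries / (float)num_recs;
}
```

With 3 queries and 2 records, the first form yields 1.00 and the second 1.50, which is the difference that would show up in the "%.2f" field of the osmtest report line.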
From bugzilla-daemon at openib.org Thu Oct 12 07:17:07 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 12 Oct 2006 07:17:07 -0700 (PDT) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop Message-ID: <20061012141707.47F502283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=263 ------- Comment #8 from chas at cmf.nrl.navy.mil 2006-10-12 07:17 ------- ipoib_start_xmit() only checks at entry to see if the queue is stopped. ipoib_start_xmit() could still unicast_arp_send() after a netif_stop_queue(). in ipoib_stop(), i guess this will be synchronized somewhat by the ipoib_flush_paths() in ipoib_ib_dev_down() which also takes priv->tx_lock but this doesnt seem intentional. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tziporet at dev.mellanox.co.il Thu Oct 12 07:44:05 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 12 Oct 2006 16:44:05 +0200 Subject: [openib-general] We wish to do the 1.1 release next week Message-ID: <452E54B5.7030500@dev.mellanox.co.il> Hi all, I am back from vacation and found you waited with the release for me :-) From a quick look at status mails I think we can do the official release next week. Please reply if there are still any blocking issues you have. Also - please update all documents till end of Monday next week. 
Tziporet From bugzilla-daemon at openib.org Thu Oct 12 07:54:44 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 12 Oct 2006 07:54:44 -0700 (PDT) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop Message-ID: <20061012145444.348AF2283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=263 ------- Comment #9 from mst at mellanox.co.il 2006-10-12 07:54 ------- Created an attachment (id=62) --> (http://openib.org/bugzilla/attachment.cgi?id=62&action=view) Please test this patch - does the crash happen with it? Interesting. As a test, Scott, could you pls check what happens with the attached patch applied? ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rdreier at cisco.com Thu Oct 12 08:25:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 12 Oct 2006 08:25:33 -0700 Subject: [openib-general] bypassing MPT and MTT lookups In-Reply-To: (john t.'s message of "Thu, 12 Oct 2006 17:13:20 +0530") References: Message-ID: john> Hi This is with respect to Mellanox HCAs. Is there a way to john> bypass/disable MPT (Memory Protection Table) and MTT (Memory john> Translation Table) lookups done by HCA? If yes which john> function should be called from a module to achieve this. ib_get_dma_mr() gives you an L_Key that does not perform any translation. From dotanb at dev.mellanox.co.il Thu Oct 12 08:53:10 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 12 Oct 2006 17:53:10 +0200 Subject: [openib-general] what happens if one close the device in user level without releasing the resources? Message-ID: <452E64E6.1010608@dev.mellanox.co.il> Hi. What should happen if one opens the IB device, allocate resources and close the device? for example, if a user do the following operations in a loop: ibv_get_device_list in a loop: ibv_open_device ibv_alloc_pd ibv_create_cq ibv_close_device? 
should the ibv_close_device clean all of the allocated resources or it is up to the user to take care of this? thanks Dotan From mshefty at ichips.intel.com Thu Oct 12 09:45:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 09:45:24 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061012060253.GA13181@mellanox.co.il> References: <452D78E7.5040906@ichips.intel.com> <20061012060253.GA13181@mellanox.co.il> Message-ID: <452E7124.4010407@ichips.intel.com> Michael S. Tsirkin wrote: > Hmm, sorry, I forgot. > Could you restate what the ib_cm/rdma_cm problem is, please? > Shouldn't we solve that, too? The ib_multicast API needs register/unregister calls to prevent module unload races. The ib_cm and rdma_cm have the issue if a client uses the return value from the callback to destroy their cm_id's. Passive side users are fine, since listening id's cannot be destroyed using this method. Solutions are to add register/unregister calls, or to limit which id's can be destroyed from the callback. - Sean From mst at mellanox.co.il Thu Oct 12 09:56:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Oct 2006 18:56:42 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452E7124.4010407@ichips.intel.com> References: <452E7124.4010407@ichips.intel.com> Message-ID: <20061012165641.GC15578@mellanox.co.il> Quoting r. Sean Hefty : > The ib_cm and rdma_cm have the issue if a client uses the return value from > the callback to destroy their cm_id's. But what is the issue? some kind of race? 
-- MST From mshefty at ichips.intel.com Thu Oct 12 10:11:47 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 10:11:47 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452DE8C4.2040804@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> <452D6117.6040400@ichips.intel.com> <452DE8C4.2040804@mellanox.co.il> Message-ID: <452E7753.80402@ichips.intel.com> Eitan Zahavi wrote: > I disagree. If you sniff at the MAD level you can simply react to the > lower level messages. First, when designing this, I did consider using the MAD snooping ability, and changing what could be done with snooping. However, the multicast handling is not simply sniffing MADs going out on the wire and incrementing / decrementing some count. It can change or prevent a MAD from being sent. This is a fundamental change to the behavior of the ib_mad APIs. MADs are sent and tracked by their respective registered ib_mad clients. Trying to push this down into the MAD layer means that the send request from one client may now occur on some other client's registration. If that client decides to unregister in the middle of their send, the operation is canceled, and now needs to be restarted on some other registration. And even though the operation was canceled, we still need to know whether it was seen by the SA. This requires sniffing all MADs, and quickly gets extremely complex. 
In order to avoid these issues with which registered client is actually performing the operation, the solution is to filter multicast requests through a single registration.

The ib_mad layer is complex enough as it is. (Have you tried tracing a MAD through the send path?) We don't need to push even more functionality down into it.

- Sean

From sean.hefty at intel.com Thu Oct 12 10:52:11 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 12 Oct 2006 10:52:11 -0700
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <20061012165641.GC15578@mellanox.co.il>
Message-ID: <000101c6ee27$25b2fc20$21258686@amr.corp.intel.com>

>But what is the issue? some kind of race?

If we look at just the ib_multicast patches as an example...

Calling ib_join_multicast allocates a struct ib_multicast that must be freed.
Here's the relevant portion of ipoib's join callback:

@@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s
 		/* Clear the busy flag so we try again */
+		status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY,
+					    &mcast->flags);
 	}
+	return status;
 }

The callback clears the busy flag, and frees the structure by returning a
non-zero value from the callback. (This is convenient for error handling.) Let
the callback thread hang around right at the return statement for a while.

When ipoib is unloaded, one of the calls it makes during cleanup is
ipoib_mcast_leave(), which does:

	if(test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
		ib_free_multicast(mcast->mc);

If ipoib_mcast_leave() is called at the same time that an error is reported
through the callback, it's possible that the struct ib_multicast will be freed
by the callback thread. But there's nothing to prevent the callback thread from
executing in the ipoib code after unload has occurred.

Similar issues can apply to ib_cm and rdma_cm.
- Sean From mst at mellanox.co.il Thu Oct 12 11:13:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Oct 2006 20:13:32 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000101c6ee27$25b2fc20$21258686@amr.corp.intel.com> References: <000101c6ee27$25b2fc20$21258686@amr.corp.intel.com> Message-ID: <20061012181332.GA15881@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port > > >But what is the issue? some kind of race? > > If we look at just the ib_multicast patches as an example... > > Calling ib_join_multicast allocates a struct ib_multicast that must be freed. > Here's the relevant portion of ipoib's join callback: > > @@ -325,11 +328,10 @@ ipoib_mcast_sendonly_join_complete(int s > /* Clear the busy flag so we try again */ > + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, > + &mcast->flags); > } > + return status; > } > > The callback clears the busy flag, and frees the structure by returning a > non-zero value from the callback. (This is convenient for error handling.) Let > the callback thread hang around right at the return statement for a while. > > When ipoib is unloaded, one of the calls it makes during cleanup is > ipoib_mcast_leave(), which does: > > if(test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) > ib_free_multicast(mcast->mc); > > If ipoib_mcast_leave() is called at the same time that an error is reported > through the callback, it's possible that the struct ib_multicast will be freed > by the callback thread. But there's nothing to prevent the callback thread from > executing in the ipoib code after unload has occurred. > > Similar issues can apply to ib_cm and rdma_cm. 
>
> - Sean
>
But unlike the sa races which were unfixable without API changes,
here users can synchronize the removal of the mc object.
So I think what you describe is a user error.

For example SDP does

lock
cm_id = ssk->cm_id
ssk->cm_id = NULL
unlock

if (cm_id)
	destroy(cm_id)

and in callback

ssk = context
lock
if (!ssk->cm_id) {
	unlock
	return 0;
}
...

-- 
MST

From mshefty at ichips.intel.com Thu Oct 12 11:13:34 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 12 Oct 2006 11:13:34 -0700
Subject: [openib-general] [PATCH 12/13] Re-write cma_add_one error cases
In-Reply-To: <20061012045428.21952.93637.sendpatchset@localhost.localdomain>
References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> <20061012045428.21952.93637.sendpatchset@localhost.localdomain>
Message-ID: <452E85CE.80505@ichips.intel.com>

Krishna Kumar wrote:
> @@ -2288,14 +2288,15 @@ static void cma_add_one(struct ib_device
>  	struct cma_device *cma_dev;
>  	struct rdma_id_private *id_priv;
>
> +	if (!device->node_guid)
> +		return;

I'm not sure that we even need this check anymore. All devices should have a node_guid set. Maybe we can just remove it, rather than moving it up.

Btw, what version of the rdma_cm were these patches created against?

- Sean

From mst at mellanox.co.il Thu Oct 12 11:35:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 12 Oct 2006 20:35:30 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <000201c6ee2b$2dbe79e0$21258686@amr.corp.intel.com>
References: <000201c6ee2b$2dbe79e0$21258686@amr.corp.intel.com>
Message-ID: <20061012183530.GD15881@mellanox.co.il>

Quoting r.
Sean Hefty : > Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port > > >But unlike the sa races which were unfixable without API changes, > >here users can synchronize the removal of the mc object. > >So I think what you describe is a user error. > > The user can ensure that an id is only destroyed once. What they cannot ensure > is whether their callback is still running. Ah. I get it. If my callback will return error, I must make sure I won't destroy the cm_id. But this means that I don't get the protection on destroy that was checking that callbacks have all gone. So, let's solve it in the same way we did for sa? -- MST From rdreier at cisco.com Thu Oct 12 11:38:25 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 12 Oct 2006 11:38:25 -0700 Subject: [openib-general] [PATCH 4/13] Re-write cma_work_handler error cases In-Reply-To: <20061012045335.21952.25699.sendpatchset@localhost.localdomain> (Krishna Kumar's message of "Thu, 12 Oct 2006 10:23:35 +0530") References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> <20061012045335.21952.25699.sendpatchset@localhost.localdomain> Message-ID: What's the motivation here? 
Lots of these patches look like a step backwards, for example this: > - if (!cma_comp_exch(id_priv, work->old_state, work->new_state)) > - goto out; > - > - if (id_priv->id.event_handler(&id_priv->id, &work->event)) { > - cma_exch(id_priv, CMA_DESTROYING); > - destroy = 1; > + if (cma_comp_exch(id_priv, work->old_state, work->new_state)) { > + if (id_priv->id.event_handler(&id_priv->id, &work->event)) { > + cma_exch(id_priv, CMA_DESTROYING); > + destroy = 1; > + } > } I find it much easier to read code like: if (error_happens()) goto err; if (another_error_happens()) goto err; do_stuff(); rather than if (!error_happens()) if (!another_error_happens()) do_stuff(); I really think it's preferable to use gotos to move error paths out-of-line and keep the common case as the main flow. - R. From mshefty at ichips.intel.com Thu Oct 12 11:40:43 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 11:40:43 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061012183530.GD15881@mellanox.co.il> References: <000201c6ee2b$2dbe79e0$21258686@amr.corp.intel.com> <20061012183530.GD15881@mellanox.co.il> Message-ID: <452E8C2B.9000902@ichips.intel.com> Michael S. Tsirkin wrote: > Ah. I get it. > If my callback will return error, I must make sure I won't destroy the cm_id. > But this means that I don't get the protection on destroy that was > checking that callbacks have all gone. > > > So, let's solve it in the same way we did for sa? Yes - I agree. I will update ib_multicast before resubmitting (I want to continue to collect other feedback first), then provide separate patches for ib_cm and rdma_cm. 
- Sean From mshefty at ichips.intel.com Thu Oct 12 11:11:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 11:11:18 -0700 Subject: [openib-general] [PATCH 0/13] Re-write error cases in CMA routines to simplify code In-Reply-To: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> References: <20061012045308.21952.90622.sendpatchset@localhost.localdomain> Message-ID: <452E8546.8060906@ichips.intel.com> Krishna Kumar wrote: > Re-write all cma error cases to simplify code. Splitting > it as multiple patches (one per routine) in case some are > found not required (in which case, later ones may apply > with a fuzz). Most of these seem to be a style issue. Should error handling be placed at the end of the function, or within an if (error) type check? Keeping it at the end of the function tends to make maintenance a little easier, otherwise, we end up either duplicating the error handling, or moving it back to the end of the function. I think the following patches make sense to apply: 3, 4, and 7. I'm not sure about 12. - Sean From sean.hefty at intel.com Thu Oct 12 11:21:03 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 11:21:03 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <20061012181332.GA15881@mellanox.co.il> Message-ID: <000201c6ee2b$2dbe79e0$21258686@amr.corp.intel.com> >But unlike the sa races which were unfixable without API changes, >here users can synchronize the removal of the mc object. >So I think what you describe is a user error. The user can ensure that an id is only destroyed once. What they cannot ensure is whether their callback is still running. >For example SDP does >lock >cm_id = ssk->cm_id >ssk->cm_id = NULL >unlock > >if (cm_id) >destroy(cm_id) If cm_id is NULL, there's no way to know if the callback is still running. 
>and in callback > >ssk = context >lock >if (!ssk->cm_id) { > unlock > return 0; >} I'm assuming that the rest of the code looks something like: ssk->cm_id = NULL; unlock; return 1; - Sean From mst at mellanox.co.il Thu Oct 12 12:12:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 12 Oct 2006 21:12:06 +0200 Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature. In-Reply-To: <20061011.144137.18281355.davem@davemloft.net> References: <20061011.144137.18281355.davem@davemloft.net> Message-ID: <20061012191206.GA16516@mellanox.co.il> Quoting r. David Miller : > Subject: Re: Dropping NETIF_F_SG since no checksum feature. > > From: "Michael S. Tsirkin" > Date: Wed, 11 Oct 2006 23:23:39 +0200 > > > With my patch, there is a huge performance gain by increasing MTU to 64K. > > And it seems the only way to do this is by S/G. > > Numbers? > I created two subnets on top of the same pair infiniband HCAs: root at sw069 ~]# ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 inet addr:12.4.3.69 Bcast:12.255.255.255 Mask:255.0.0.0 inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:1382531 errors:0 dropped:0 overruns:0 frame:0 TX packets:2725206 errors:0 dropped:5 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:71892772 (68.5 MiB) TX bytes:5290011992 (4.9 GiB) [root at sw069 ~]# ifconfig ibc0 ibc0 Link encap:UNSPEC HWaddr 00-03-04-06-FE-80-00-00-00-00-00-00-00-00-00-00 inet addr:11.4.3.69 Bcast:11.255.255.255 Mask:255.0.0.0 inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65484 Metric:1 RX packets:115647 errors:0 dropped:0 overruns:0 frame:0 TX packets:253403 errors:0 dropped:4 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:6014720 (5.7 MiB) TX bytes:16589589008 (15.4 GiB) The other side was configured with 12.4.3.68 for MTU 65484 and 11.4.3.68 for MTU 2044. 
And then I just ran netperf:

[root at sw069 ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 12.4.3.68 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 12.4.3.68 (12.4.3.68) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       286.45   40.20    25.28    5.482   3.448

[root at sw069 ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 11.4.3.68
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    MBytes/sec

 87380  16384  16384    10.01     782.55

This is all very preliminary - but I hope you get the idea - increasing MTU is very helpful for infiniband, and infiniband adapters handle large S/G lists without problems, but the verbs do not include support for IP checksums, so these must be done in software.

So what we would like is for the infiniband network device to say "I don't support checksums, I only support S/G" and then for the network layer to do the checksumming for us, piggybacking on the data copy at least for cases where it does perform the copy.

Does this make sense now?

-- 
MST

From mst at mellanox.co.il Thu Oct 12 12:13:25 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 12 Oct 2006 21:13:25 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <452E8C2B.9000902@ichips.intel.com>
References: <452E8C2B.9000902@ichips.intel.com>
Message-ID: <20061012191325.GB16516@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
>
> Michael S. Tsirkin wrote:
> > Ah. I get it.
> > If my callback will return error, I must make sure I won't destroy the cm_id. > > But this means that I don't get the protection on destroy that was > > checking that callbacks have all gone. > > > > > > So, let's solve it in the same way we did for sa? > > Yes - I agree. I will update ib_multicast before resubmitting (I want to > continue to collect other feedback first), then provide separate patches for > ib_cm and rdma_cm. Another comment that I'd like not to get in the noise is that we need to handle the full set of SA queries, not just EQ. -- MST From mshefty at ichips.intel.com Thu Oct 12 12:25:45 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 12:25:45 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> Message-ID: <452E96B9.4030604@ichips.intel.com> Roland Dreier wrote: > We want a way for unprivileged userspace to be able to use multicast. > Usually I say "just use a privileged daemon in userspace" but I think > in this case we actually need coordination between the kernel and > userspace to track _all_ multicast joins, so it does make sense for > this to be in the kernel. On this same thought, do you have an idea of an interface that you'd accept to export raw IB multicast support up to userspace? 
- Sean

From sean.hefty at intel.com Thu Oct 12 12:49:49 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 12 Oct 2006 12:49:49 -0700
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <20061012191325.GB16516@mellanox.co.il>
Message-ID: <000301c6ee37$94688e90$21258686@amr.corp.intel.com>

>Another comment that I'd like not to get in the noise is that we need
>to handle the full set of SA queries, not just EQ.

I think that functionality can be added separately, but ib_multicast is only
intended to handle Set / Delete methods. Get and GetTable methods would still
go through the ib_sa. Are you referring to more complex Set (join) operations
or actual Get queries?

- Sean

From mst at mellanox.co.il Thu Oct 12 13:07:26 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 12 Oct 2006 22:07:26 +0200
Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
In-Reply-To: <000301c6ee37$94688e90$21258686@amr.corp.intel.com>
References: <000301c6ee37$94688e90$21258686@amr.corp.intel.com>
Message-ID: <20061012200726.GD16516@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: RE: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port
>
> >Another comment that I'd like not to get in the noise is that we need
> >to handle the full set of SA queries, not just EQ.
>
> I think that functionality can be added separately, but ib_multicast is only
> intended to handle Set / Delete methods. Get and GetTable methods would still
> go through the ib_sa. Are you referring to more complex Set (join) operations
> or actual Get queries?

AFAIK both Set/Delete and Get queries have selectors; the multicast module seems
to support only EQ selectors. The reason to do this is for heterogeneous networks.
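
[Editor's illustration] The selector point above can be sketched as a small matching function. This is only a sketch under stated assumptions: the enum and function names are hypothetical and are not taken from ib_multicast or ib_sa, the numeric values follow my reading of the IBA component-selector encoding and should be checked against the spec, and the comparison is only meaningful for attributes whose encoding is monotonic (such as the encoded MTU, where 1 = 256 bytes up to 5 = 4096 bytes).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical names. The numeric values are an assumption based on the
 * IBA component-selector encoding and should be verified against the spec.
 */
enum sa_selector {
	SA_SEL_GT   = 0,	/* attribute must be greater than the given value */
	SA_SEL_LT   = 1,	/* attribute must be less than the given value */
	SA_SEL_EQ   = 2,	/* attribute must equal the given value */
	SA_SEL_BEST = 3		/* "largest available": any value is acceptable */
};

/*
 * Decide whether a group's encoded MTU satisfies a request that carries
 * a selector plus a value. Only valid for monotonic encodings.
 */
static bool sa_selector_matches(enum sa_selector sel, uint8_t requested,
				uint8_t actual)
{
	switch (sel) {
	case SA_SEL_GT:
		return actual > requested;
	case SA_SEL_LT:
		return actual < requested;
	case SA_SEL_EQ:
		return actual == requested;
	case SA_SEL_BEST:
		return true;
	}
	return false;	/* unknown selector: treat as no match */
}
```

A module that handles only the EQ case would reject a join that a GT or LT selector would have allowed, which is where mixed-capability fabrics run into trouble.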
-- MST From rdreier at cisco.com Thu Oct 12 13:09:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 12 Oct 2006 13:09:59 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452E96B9.4030604@ichips.intel.com> (Sean Hefty's message of "Thu, 12 Oct 2006 12:25:45 -0700") References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452E96B9.4030604@ichips.intel.com> Message-ID: Sean> On this same thought, do you have an idea of an interface Sean> that you'd accept to export raw IB multicast support up to Sean> userspace? I'm not sure -- who are the customers for this? What do they want to do really? - R. From sean.hefty at intel.com Thu Oct 12 13:34:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 12 Oct 2006 13:34:04 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: Message-ID: <000401c6ee3d$c2ce8d10$21258686@amr.corp.intel.com> >I'm not sure -- who are the customers for this? What do they want to >do really? The main customers I'm aware of are MPI developers. For direct IB multicast usage, the national labs are converting some of their MPI algorithms to use IB multicast. Their current approach is to create a multicast group without specifying an MGID. Once the group is created, the group information will be given to other ranks out of band (using TCP or something similar), who then join the group. Matt can probably provide more details, but a student is doing the actual work. 
I believe that they're trying to use the userspace SA patches that I posted to the list, but how everything gets implemented under the proposed libibusa interface probably doesn't matter much. - Sean From arlin.r.davis at intel.com Thu Oct 12 15:37:25 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Thu, 12 Oct 2006 15:37:25 -0700 Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. Message-ID: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> Here is a patch to remove uDAPL scm provider from the build since it is no longer needed nor supported. This provider was merely a stop gap until uCMA was pushed into kernel. Tziporet, can you get this change into OFED 1.1? Signed-off by: Arlin Davis ardavis at ichips.intel.com Index: doc/dat.conf =================================================================== --- doc/dat.conf (revision 9781) +++ doc/dat.conf (working copy) @@ -6,19 +6,10 @@ # \ # # -# Example for openib_cma and openib_scm -# -# For cma version you specify as: +# For the uDAPL cma provder, specify as one of the following: # network address, network hostname, or netdev name and 0 for port # -# For scm version you specify as actual device name and port -# # Simple (OpenIB-cma) default with netdev name provided first on list # to enable use of same dat.conf version on all nodes # OpenIB-cma u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "ib0 0" "" -OpenIB-cma-ip u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "192.168.0.22 0" "" -OpenIB-cma-name u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "svr1-ib0 0" "" -OpenIB-cma-netdev u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "ib0 0" "" -OpenIB-scm1 u1.2 nonthreadsafe default /usr/lib/libdaplscm.so mv_dapl.1.2 "mthca0 1" "" -OpenIB-scm2 u1.2 nonthreadsafe default /usr/lib/libdaplscm.so mv_dapl.1.2 "mthca0 2" "" Index: Makefile.am =================================================================== --- Makefile.am (revision 
9781) +++ Makefile.am (working copy) @@ -18,11 +18,9 @@ datlibdir = $(libdir) dapllibcmadir = $(libdir) -dapllibscmdir = $(libdir) datlib_LTLIBRARIES = dat/udat/libdat.la dapllibcma_LTLIBRARIES = dapl/udapl/libdaplcma.la -dapllibscm_LTLIBRARIES = dapl/udapl/libdaplscm.la dat_udat_libdat_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) \ -I$(srcdir)/dat/include/ -I$(srcdir)/dat/udat/ \ @@ -34,21 +32,13 @@ -I$(srcdir)/dapl/common -I$(srcdir)/dapl/udapl/linux \ -I$(srcdir)/dapl/openib_cma -dapl_udapl_libdaplscm_la_CFLAGS = -Wall $(DBGFLAGS) -D_GNU_SOURCE $(OSFLAGS) \ - -DOPENIB -DCQ_WAIT_OBJECT \ - -I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \ - -I$(srcdir)/dapl/common -I$(srcdir)/dapl/udapl/linux \ - -I$(srcdir)/dapl/openib_scm - if HAVE_LD_VERSION_SCRIPT dat_version_script = -Wl,--version-script=$(srcdir)/dat/udat/libdat.map daplcma_version_script = -Wl,--version-script=$(srcdir)/dapl/udapl/libdaplcma.map - daplscm_version_script = -Wl,--version-script=$(srcdir)/dapl/udapl/libdaplscm.map else dat_version_script = daplcma_version_script = - daplscm_version_script = endif @@ -177,116 +167,6 @@ -Wl,-init,dapl_init -Wl,-fini,dapl_fini \ -lpthread -libverbs -lrdmacm - -# -# uDAPL OpenIB Socket CM version: libdaplscm.so -# -dapl_udapl_libdaplscm_la_SOURCES = dapl/udapl/dapl_init.c \ - dapl/udapl/dapl_evd_create.c \ - dapl/udapl/dapl_evd_query.c \ - dapl/udapl/dapl_cno_create.c \ - dapl/udapl/dapl_cno_modify_agent.c \ - dapl/udapl/dapl_cno_free.c \ - dapl/udapl/dapl_cno_wait.c \ - dapl/udapl/dapl_cno_query.c \ - dapl/udapl/dapl_lmr_create.c \ - dapl/udapl/dapl_evd_wait.c \ - dapl/udapl/dapl_evd_disable.c \ - dapl/udapl/dapl_evd_enable.c \ - dapl/udapl/dapl_evd_modify_cno.c \ - dapl/udapl/dapl_evd_set_unwaitable.c \ - dapl/udapl/dapl_evd_clear_unwaitable.c \ - dapl/udapl/linux/dapl_osd.c \ - dapl/common/dapl_cookie.c \ - dapl/common/dapl_cr_accept.c \ - dapl/common/dapl_cr_query.c \ - dapl/common/dapl_cr_reject.c \ - dapl/common/dapl_cr_util.c \ - 
dapl/common/dapl_cr_callback.c \ - dapl/common/dapl_cr_handoff.c \ - dapl/common/dapl_ep_connect.c \ - dapl/common/dapl_ep_create.c \ - dapl/common/dapl_ep_disconnect.c \ - dapl/common/dapl_ep_dup_connect.c \ - dapl/common/dapl_ep_free.c \ - dapl/common/dapl_ep_reset.c \ - dapl/common/dapl_ep_get_status.c \ - dapl/common/dapl_ep_modify.c \ - dapl/common/dapl_ep_post_rdma_read.c \ - dapl/common/dapl_ep_post_rdma_write.c \ - dapl/common/dapl_ep_post_recv.c \ - dapl/common/dapl_ep_post_send.c \ - dapl/common/dapl_ep_query.c \ - dapl/common/dapl_ep_util.c \ - dapl/common/dapl_evd_dequeue.c \ - dapl/common/dapl_evd_free.c \ - dapl/common/dapl_evd_post_se.c \ - dapl/common/dapl_evd_resize.c \ - dapl/common/dapl_evd_util.c \ - dapl/common/dapl_evd_cq_async_error_callb.c \ - dapl/common/dapl_evd_qp_async_error_callb.c \ - dapl/common/dapl_evd_un_async_error_callb.c \ - dapl/common/dapl_evd_connection_callb.c \ - dapl/common/dapl_evd_dto_callb.c \ - dapl/common/dapl_get_consumer_context.c \ - dapl/common/dapl_get_handle_type.c \ - dapl/common/dapl_hash.c \ - dapl/common/dapl_hca_util.c \ - dapl/common/dapl_ia_close.c \ - dapl/common/dapl_ia_open.c \ - dapl/common/dapl_ia_query.c \ - dapl/common/dapl_ia_util.c \ - dapl/common/dapl_llist.c \ - dapl/common/dapl_lmr_free.c \ - dapl/common/dapl_lmr_query.c \ - dapl/common/dapl_lmr_util.c \ - dapl/common/dapl_lmr_sync_rdma_read.c \ - dapl/common/dapl_lmr_sync_rdma_write.c \ - dapl/common/dapl_mr_util.c \ - dapl/common/dapl_provider.c \ - dapl/common/dapl_sp_util.c \ - dapl/common/dapl_psp_create.c \ - dapl/common/dapl_psp_create_any.c \ - dapl/common/dapl_psp_free.c \ - dapl/common/dapl_psp_query.c \ - dapl/common/dapl_pz_create.c \ - dapl/common/dapl_pz_free.c \ - dapl/common/dapl_pz_query.c \ - dapl/common/dapl_pz_util.c \ - dapl/common/dapl_rmr_create.c \ - dapl/common/dapl_rmr_free.c \ - dapl/common/dapl_rmr_bind.c \ - dapl/common/dapl_rmr_query.c \ - dapl/common/dapl_rmr_util.c \ - dapl/common/dapl_rsp_create.c \ - 
dapl/common/dapl_rsp_free.c \ - dapl/common/dapl_rsp_query.c \ - dapl/common/dapl_cno_util.c \ - dapl/common/dapl_set_consumer_context.c \ - dapl/common/dapl_ring_buffer_util.c \ - dapl/common/dapl_name_service.c \ - dapl/common/dapl_timer_util.c \ - dapl/common/dapl_ep_create_with_srq.c \ - dapl/common/dapl_ep_recv_query.c \ - dapl/common/dapl_ep_set_watermark.c \ - dapl/common/dapl_srq_create.c \ - dapl/common/dapl_srq_free.c \ - dapl/common/dapl_srq_query.c \ - dapl/common/dapl_srq_resize.c \ - dapl/common/dapl_srq_post_recv.c \ - dapl/common/dapl_srq_set_lw.c \ - dapl/common/dapl_srq_util.c \ - dapl/common/dapl_debug.c \ - dapl/openib_scm/dapl_ib_util.c \ - dapl/openib_scm/dapl_ib_cq.c \ - dapl/openib_scm/dapl_ib_qp.c \ - dapl/openib_scm/dapl_ib_cm.c \ - dapl/openib_scm/dapl_ib_mem.c - -dapl_udapl_libdaplscm_la_LDFLAGS = -version-info 1:2:0 $(daplscm_version_script) \ - -Wl,-init,dapl_init -Wl,-fini,dapl_fini \ - -lpthread -libverbs - libdatincludedir = $(includedir)/dat libdatinclude_HEADERS = dat/include/dat/dat.h \ From sashak at voltaire.com Thu Oct 12 17:35:17 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 13 Oct 2006 02:35:17 +0200 Subject: [openib-general] [PATCH] opensm: mcast tables dump improvement Message-ID: <20061013003517.GA20139@sashak.voltaire.com> This improves switch's mcast tables dumping and eliminates multiple file open/seek/close sequences. In one word - cleanup. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_mcast_mgr.c | 108 +++++++++++++++++++++----------------------- 1 files changed, 52 insertions(+), 56 deletions(-) diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index cb0ffb1..f4d6954 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -53,6 +53,7 @@ #endif /* HAVE_CONFIG_H */ #include #include #include +#include #include #include #include @@ -1377,10 +1378,12 @@ osm_mcast_mgr_process_tree( /********************************************************************** **********************************************************************/ + static void -osm_mcast_mgr_dump_mcast_routes( +mcast_mgr_dump_sw_routes( IN const osm_mcast_mgr_t* const p_mgr, - IN const osm_switch_t* const p_sw ) + IN const osm_switch_t* const p_sw, + IN FILE *p_mcfdbFile) { osm_mcast_tbl_t* p_tbl; int16_t mlid_ho = 0; @@ -1390,35 +1393,14 @@ osm_mcast_mgr_dump_mcast_routes( char line[OSM_REPORT_LINE_SIZE]; boolean_t print_lid; const osm_node_t* p_node; - FILE * p_mcfdbFile; uint16_t i, j; uint16_t mask_entry; - char *file_name = NULL; - OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_dump_mcast_routes ); + OSM_LOG_ENTER( p_mgr->p_log, mcast_mgr_dump_sw_routes ); if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) goto Exit; - file_name = - (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); - - CL_ASSERT(file_name); - - strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); - strcat(file_name, "/osm.mcfdbs"); - - /* Open the file or error */ - p_mcfdbFile = fopen(file_name, "a"); - if (! 
p_mcfdbFile) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "osm_mcast_mgr_dump_mcast_routes: ERR 0A23: " - "Failed to open mcfdb file (%s)\n", - file_name ); - goto Exit; - } - p_node = osm_switch_get_node_ptr( p_sw ); p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); @@ -1459,30 +1441,56 @@ osm_mcast_mgr_dump_mcast_routes( block_num++; } - fclose(p_mcfdbFile); - Exit: - if (file_name) - free(file_name); OSM_LOG_EXIT( p_mgr->p_log ); } +/********************************************************************** + **********************************************************************/ + +struct mcast_mgr_dump_context { + osm_mcast_mgr_t *p_mgr; + FILE *file; +}; + static void -__unlink_mcast_fdb(IN osm_mcast_mgr_t* const p_mgr) +mcast_mgr_dump_table(cl_map_item_t *p_map_item, void *context) { - char *file_name = NULL; + osm_switch_t *p_sw = (osm_switch_t *)p_map_item; + struct mcast_mgr_dump_context *cxt = context; - /* remove the old fdb dump file: */ - file_name = - (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); + mcast_mgr_dump_sw_routes(cxt->p_mgr, p_sw, cxt->file); +} - if( file_name ) - { - strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); - strcat(file_name, "/osm.mcfdbs"); - unlink(file_name); - free(file_name); - } +static void +mcast_mgr_dump_mcast_routes(osm_mcast_mgr_t *p_mgr) +{ + char file_name[1024]; + struct mcast_mgr_dump_context dump_context; + FILE *file; + + if (!osm_log_is_active(p_mgr->p_log, OSM_LOG_ROUTING)) + return; + + snprintf(file_name, sizeof(file_name), "%s/%s", + p_mgr->p_subn->opt.dump_files_dir, "osm.mcfdbs"); + + file = fopen(file_name, "w"); + if (!file) { + osm_log(p_mgr->p_log, OSM_LOG_ERROR, + "mcast_dump_mcast_routes: ERR 0A18: " + "cannot create mcfdb file \'%s\': %s\n", + file_name, strerror(errno)); + return; + } + + dump_context.p_mgr = p_mgr; + dump_context.file = file; + + cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, + mcast_mgr_dump_table, &dump_context); + + fclose(file); } 
/********************************************************************** @@ -1518,12 +1526,6 @@ osm_mcast_mgr_process_mgrp( goto Exit; } - /* initialize the mc fdb dump file: */ - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) - { - __unlink_mcast_fdb( p_mgr ); - } - /* Walk the switches and download the tables for each. */ @@ -1534,11 +1536,11 @@ osm_mcast_mgr_process_mgrp( if( signal == OSM_SIGNAL_DONE_PENDING ) pending_transactions = TRUE; - osm_mcast_mgr_dump_mcast_routes( p_mgr, p_sw ); - p_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); } + mcast_mgr_dump_mcast_routes( p_mgr ); + Exit: OSM_LOG_EXIT( p_mgr->p_log ); @@ -1594,12 +1596,6 @@ osm_mcast_mgr_process( p_mgrp = (osm_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); } - /* initialize the mc fdb dump file: */ - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) - { - __unlink_mcast_fdb( p_mgr ); - } - /* Walk the switches and download the tables for each. */ @@ -1610,11 +1606,11 @@ osm_mcast_mgr_process( if( signal == OSM_SIGNAL_DONE_PENDING ) pending_transactions = TRUE; - osm_mcast_mgr_dump_mcast_routes( p_mgr, p_sw ); - p_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); } + mcast_mgr_dump_mcast_routes( p_mgr ); + CL_PLOCK_RELEASE( p_mgr->p_lock ); OSM_LOG_EXIT( p_mgr->p_log ); -- 1.4.2.3.g128e From somenath at veritas.com Wed Oct 11 18:48:58 2006 From: somenath at veritas.com (somenath) Date: Wed, 11 Oct 2006 18:48:58 -0700 Subject: [openib-general] APM support in openib stack Message-ID: <452D9F0A.1000000@veritas.com> hi, I am trying to use the APM support in openib kernel stack and facing some problems. here are the steps I follow: 1. first resolve both the path, primary and alternate path. 2. send REQ using: active_param.primary_path = path; active_param.alternate_path = alt_path; ib_send_cm_req( cm_id, &active_param); this call doesn't return any error 3. in req_handler() we follow the same steps as we have done without APM.. i.e. 
create qpairs, change qp state to RTR and then send REP.

however, when trying to change state to RTR using ib_modify_qp() I get
an error (-22).

two more data points: the same code works if I pass alt_path as NULL, or if I
set alt_path to the primary path.

I must be missing something here; I assume this basic APM feature works
in the RHEL4 update 4 distribution of the openib stack.

thanks, som.

From davem at davemloft.net Thu Oct 12 21:22:40 2006
From: davem at davemloft.net (David Miller)
Date: Thu, 12 Oct 2006 21:22:40 -0700 (PDT)
Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <20061012191206.GA16516@mellanox.co.il>
References: <20061011.144137.18281355.davem@davemloft.net> <20061012191206.GA16516@mellanox.co.il>
Message-ID: <20061012.212240.26278742.davem@davemloft.net>

From: "Michael S. Tsirkin"
Date: Thu, 12 Oct 2006 21:12:06 +0200

> Quoting r. David Miller :
> > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> >
> > Numbers?
>
> I created two subnets on top of the same pair infiniband HCAs:

I was asking for SG vs. non-SG numbers so I could see proof
that it really does help like you say it will.

From krkumar2 at in.ibm.com Thu Oct 12 21:30:17 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Fri, 13 Oct 2006 10:00:17 +0530
Subject: [openib-general] [PATCH 4/13] Re-write cma_work_handler error cases
In-Reply-To:
Message-ID:

Roland Dreier wrote on 10/13/2006 12:08:25 AM:

> What's the motivation here?

There's a lot of code doing :

	ret = fn()
	if (ret)
		goto err;
	return 0;
err:
	one_line_cleanup;
	return ret;

which could easily be made simpler to code/understand as :

	ret = fn()
	if (ret)
		one_line_cleanup;
	return ret;

I guess we could even change that to unlikely(err) to prevent error
paths from getting loaded into the instruction pipeline. Also, almost
30 lines of code were removed (though I didn't check the obj size
change which may not be much).
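
[Editor's illustration] For readers following the thread, the two shapes being compared above can be sketched in user-space C roughly as follows. This is only an illustration: `struct widget`, `widget_register()` and `widget_release()` are hypothetical names that do not come from the rdma_cm code, and `calloc()`/`free()` stand in for kernel allocation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object, invented for this illustration only. */
struct widget {
	int *table;
};

/* Hypothetical stand-in for fn(): fails with -EINVAL on request. */
static int widget_register(struct widget *w, int should_fail)
{
	(void)w;
	return should_fail ? -EINVAL : 0;
}

/* The "one_line_cleanup" of the example: release what init allocated. */
static void widget_release(struct widget *w)
{
	free(w->table);
	w->table = NULL;
}

/* Shape 1: jump to a label for the cleanup. */
static int widget_init_goto(struct widget *w, int should_fail)
{
	int ret;

	w->table = calloc(16, sizeof(*w->table));
	if (!w->table)
		return -ENOMEM;

	ret = widget_register(w, should_fail);
	if (ret)
		goto err;
	return 0;
err:
	widget_release(w);
	return ret;
}

/* Shape 2: same behaviour, cleanup done inline, single return. */
static int widget_init_inline(struct widget *w, int should_fail)
{
	int ret;

	w->table = calloc(16, sizeof(*w->table));
	if (!w->table)
		return -ENOMEM;

	ret = widget_register(w, should_fail);
	if (ret)
		widget_release(w);
	return ret;
}
```

Both functions return the same value and leave the object in the same state; the difference is purely where the cleanup is written. The unlikely() annotation is kernel-specific and is left out of this sketch.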
> I find it much easier to read code like: For this particular case you pointed out, that is true. But a lot of other places do the code that I showed above - where there are no multiple goto's to error paths, and hence would make sense to do this. That is also the reason I split the patch into multiple patches so that ones that are accepted could be merged rather than one big patch which incorporates everything. Thanks, - KK From krkumar2 at in.ibm.com Thu Oct 12 21:40:39 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Fri, 13 Oct 2006 10:10:39 +0530 Subject: [openib-general] [PATCH 4/13] Re-write cma_work_handler error cases In-Reply-To: Message-ID: > There's lot of code doing : To give examples on the above statement, the patches : #2, #8, #10, #12 (all are one line, and one "go to" case patches) #6, #9, #11, #13 (these are 2 line error handling, not 1 line) #3 cleans up error handling to remove multiple goto err's. #7 is an optimization in case dev_list is empty. Thanks, - KK Krishna Kumar2/India/IBM wrote on 10/13/2006 10:00:17 AM: > Roland Dreier wrote on 10/13/2006 12:08:25 AM: > > > What's the motivation here? > > There's lot of code doing : > > ret = fn() > if (ret) > goto err; > return 0; > err: > one_line_cleanup; > return ret; > > which could be easily made easier to code/understand as : > ret = fn() > if (ret) > one_line_cleanup; > return ret; > > I guess we could even change that to unlikely(err) to prevent error > paths from getting loaded into the instruction pipeline. Also, almost > 30 lines of code were removed (though I didn't check the obj size > change which may not be much). > > > I find it much easier to read code like: > For this particular case you pointed out, that is true. But a lot of other > places do the code that I showed above - where there are no multiple > goto's to error paths, and hence would make sense to do this. 
That is > also the reason I split the patch into multiple patches so that ones that > are accepted could be merged rather than one big patch which > incorporates everything. > > Thanks, > > - KK From krkumar2 at in.ibm.com Thu Oct 12 21:46:38 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Fri, 13 Oct 2006 10:16:38 +0530 Subject: [openib-general] [PATCH 0/13] Re-write error cases in CMA routines to simplify code In-Reply-To: <452E8546.8060906@ichips.intel.com> Message-ID: Hi Sean, > Most of these seem to be a style issue. Should error handling be placed at the > end of the function, or within an if (error) type check? Keeping it at the end > of the function tends to make maintenance a little easier, otherwise, we end up > either duplicating the error handling, or moving it back to the end of the function. Correct. But many patches have just one "go to" case and the error handling is also one line. So an "if (unlikely(err))" (since unlikely is still in favour with kernel community, last I saw) could be used to handle these cases. But if there is a case of multiple allocation or other failure to be handled, then it makes sense to put those cases at the bottom, like : if (kmalloc() fails) goto out1; if (another kmalloc fails) goto out2; if (fn_fails) goto out3; out3: out2: out1: etc... I have avoided changing those codes in this patchset since I believe that is the correct way to do error handling. Thanks, - KK From krkumar2 at in.ibm.com Thu Oct 12 21:55:06 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Fri, 13 Oct 2006 10:25:06 +0530 Subject: [openib-general] [PATCH 12/13] Re-write cma_add_one error cases In-Reply-To: <452E85CE.80505@ichips.intel.com> Message-ID: Sean Hefty wrote on 10/12/2006 11:43:34 PM: > > + if (!device->node_guid) > > + return; > > I'm not sure that we even need this check anymore. All devices should have a > node_guid set. Maybe we can just remove it, rather than moving it up. OK. 
> Btw, which version of the rdma_cm were these patches created against?

Oops, I made it against older bits. Should I recreate this patchset against
the latest tree (and change #12 as suggested above)?

Thanks,

- KK

From mst at mellanox.co.il Thu Oct 12 23:17:25 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 13 Oct 2006 08:17:25 +0200
Subject: [openib-general] Dropping NETIF_F_SG since no checksum feature.
In-Reply-To: <20061012.212240.26278742.davem@davemloft.net>
References: <20061012.212240.26278742.davem@davemloft.net>
Message-ID: <20061013061725.GB12571@mellanox.co.il>

Quoting r. David Miller :
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
>
> From: "Michael S. Tsirkin"
> Date: Thu, 12 Oct 2006 21:12:06 +0200
>
> > Quoting r. David Miller :
> > > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> > >
> > > Numbers?
> >
> > I created two subnets on top of the same pair infiniband HCAs:
>
> I was asking for SG vs. non-SG numbers so I could see proof
> that it really does help like you say it will.
>

Dave, thanks for the clarification. Please note that ib0 is a non-SG device
with MTU 2K, sorry that I forgot to mention that.

So, to summarize my previous mail:

interface   flags       mtu     bandwidth (MBytes/s)
ib0         linear(0)   2044    286.45
ibc0        _F_SG       65484   782.55

If I set both ib0 and ibc0 to 64K MTU, then in benchmark mode with the same
MTU, SG is somewhat slower than non-SG (by some 10%; I tested this at some
point but don't have the numbers at the moment - do you want to see them?).

I did not claim that SG is faster with the same MTU, and I think it is clear
why linear should be faster for copy *with the same MTU*.

But do you really think that we will be able to allocate even a single 64K
linear skb after the machine has been active for a while? My assumption is
that if I want to reliably get MTU > PAGE_SIZE I must support SG. Is it the
wrong one?
If this assumption is correct, then below is my line of thinking: - with infiniband we provably get a 2.5x speedup with MTU of 64K vs to 2K. - to get packets of that size reliably we must declare S/G support - infiniband verbs do not support IP checksumming - per network algorithmics, it is better to piggyback checksum calculation on copying if copying takes place For this reason, I would like to define the meaning of S/G set when checksum bits are all clear as "we support S/G but not checksum, please checksum for us if you copy data anyway". Alternatively, add a new NETIF_F_??_CSUM bit to mean this capability. Does this make sense? Thanks, -- MST From swise at opengridcomputing.com Fri Oct 13 07:23:34 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 13 Oct 2006 09:23:34 -0500 Subject: [openib-general] iwarp stable tag Message-ID: <1160749414.4406.11.camel@stevo-desktop> Hello, I've tagged the current version of the iwarp branch. This tag (or copy really) marks the 2.6.17 stable version of the iwarp branch including the chelsio rnic LLD and RDMA drivers. This tag should be used if you intend to run iwarp on 2.6.17 kernels. To check out this branch: svn co -q https://openib.org/svn/gen2/branches/iwarp/tags/iwarp-2.6.17-stable The main iwarp branch will now move forward with some major changes to the chelsio driver, plus 2.6.18 support. Consider this experimental code and use at your own risk. NOTE: The branch will eventually go away as I move toward git for the chelsio kernel modules, and I put the iwarp provider libs in the main trunk now that iwarp support is in libibverbs and librdmacm... Steve. From tom at opengridcomputing.com Fri Oct 13 08:36:01 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 13 Oct 2006 10:36:01 -0500 Subject: [openib-general] Usermode cm_id definition Message-ID: Sean: The definition of rdma_cm_id in usermode defines the src and dst addresses as sockaddr_in6. 
I understand that the reason you did this is because this guarantees that
enough space is allocated. The problem is that the user will assume that a
structure defined as sockaddr_in6 is in fact an in6 address, and as such that
if it is an IPv4 compatibility address the IPv4 address is in
sin6_addr.s6_addr32[3]. In our case, however, you're really treating the
address as a sockaddr, that is, a generic address structure that needs to be
cast based on the sa_family.

There is a structure, sockaddr_storage, that is guaranteed to be big enough.
The screwed up thing about this (IMO) is that the family field is defined to
be ss_family, NOT sa_family, so the user can't treat it like a generic
sockaddr. This is not particularly clever in my opinion.

So we're left with

	if (sin6.sin6_family == AF_INET)
		sin4 = (struct sockaddr_in*)&sin6; /* yuck */

or if we change it to struct sockaddr_storage ss;

	if (ss.ss_family == AF_INET) /* erf */
		sin4 = (struct sockaddr_in*)&ss;
	else if (ss.ss_family == AF_INET6)
		sin6 = (struct sockaddr_in6*)&ss;

The only "correct" way would be to declare a sockaddr structure * to a
struct sockaddr_storage allocated buffer and waste 4 bytes in the rdma_cm_id
structure.

Anyway, it's buggin' me...Maybe I'm being anal...

Thoughts?

Tom

From mshefty at ichips.intel.com Fri Oct 13 09:29:22 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 13 Oct 2006 09:29:22 -0700
Subject: [openib-general] Usermode cm_id definition
In-Reply-To:
References:
Message-ID: <452FBEE2.9050401@ichips.intel.com>

Tom Tucker wrote:
> The only "correct" way would be to declare a sockaddr structure * to a
> struct sockaddr_storage allocated buffer and waste 4 bytes in the rdma_cm_id
> structure.
>
> Anyway, it's buggin' me...Maybe I'm being anal...
>
> Thoughts?
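
[Editor's illustration] A minimal user-space sketch of the family-based dispatch discussed in Tom's message, assuming a POSIX system. Two points it relies on: the IPv4 family constant is spelled AF_INET, and POSIX specifies that ss_family maps onto sa_family when a sockaddr_storage pointer is cast to a sockaddr pointer. `addr_set_ipv4()` and `addr_port()` are hypothetical helpers written for this example; they are not part of librdmacm.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Return the port (host byte order) stored in a generic address buffer,
 * or -1 if the family is neither AF_INET nor AF_INET6.
 */
static int addr_port(const struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET:
		return ntohs(((const struct sockaddr_in *)ss)->sin_port);
	case AF_INET6:
		return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
	default:
		return -1;
	}
}

/*
 * Store an IPv4 address and port (both given in host byte order) in a
 * buffer that is large enough for any address family.
 */
static void addr_set_ipv4(struct sockaddr_storage *ss, uint32_t host_addr,
			  uint16_t port)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)ss;

	memset(ss, 0, sizeof(*ss));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(port);
	sin->sin_addr.s_addr = htonl(host_addr);
}
```

The caller never needs to know which concrete structure is stored; it checks the family field once and casts accordingly.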
We could change the definition to something similar in the kernel: struct sockaddr src_addr; u8 src_pad[sizeof(struct sockaddr_storage) - sizeof(struct sockaddr)]; I'm guessing this sort of change wouldn't require any changes to any of the clients. - Sean From swise at opengridcomputing.com Fri Oct 13 10:29:29 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 13 Oct 2006 12:29:29 -0500 Subject: [openib-general] Usermode cm_id definition In-Reply-To: <452FBEE2.9050401@ichips.intel.com> References: <452FBEE2.9050401@ichips.intel.com> Message-ID: <1160760569.4406.17.camel@stevo-desktop> That seems good. On Fri, 2006-10-13 at 09:29 -0700, Sean Hefty wrote: > Tom Tucker wrote: > > The only "correct" way would be to declare a sockaddr structure * to a > > struct socket_storage allocated buffer and waste 4 bytes in the rdma_cm_id > > structure. > > > > Anyway, it's buggin' me...Maybe I'm being anal... > > > > Thoughts? > > We could change the definition to something similar in the kernel: > > struct sockaddr src_addr; > u8 src_pad[sizeof(struct sockaddr_storage) - > sizeof(struct sockaddr)]; > > I'm guessing this sort of change wouldn't require any changes to any of the clients. 
> > - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Fri Oct 13 10:38:31 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 13 Oct 2006 12:38:31 -0500 Subject: [openib-general] [RFC] [PATCH 8/8] rdma_ucm 2.6.20: add userspace support for rdma_cm In-Reply-To: <000901c6ecc6$fe1b0060$c0d4180a@amr.corp.intel.com> References: <000901c6ecc6$fe1b0060$c0d4180a@amr.corp.intel.com> Message-ID: <1160761111.4406.23.camel@stevo-desktop> FYI: Once the ucma gets into roland's git tree, I'll base the chelsio drivers against the appropriate branch (for-2.6.20?) and test this whole thing with the chelsio driver, user and kernel. Steve. On Tue, 2006-10-10 at 16:51 -0700, Sean Hefty wrote: > Export the rdma_cm capabilities to userspace. > > Signed-off-by: Sean Hefty > --- > I added in a patch to include rdma_establish with this series, since it's > going to miss 2.6.19. This threw off my patch counting. > > This patch differs from svn in a few areas. First, data is reported with > events, eliminating the rdma_get_dst_attr() call from userspace. Secondly, > get/set option implementations have been removed. There's also a bug in > the svn code that allows reporting a multicast event to a user after they've > destroyed the group. This patch includes a fix for that, which simplifies > the user/kernel interface slightly. 
> From mshefty at ichips.intel.com Fri Oct 13 10:52:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 10:52:44 -0700 Subject: [openib-general] [RFC] [PATCH 8/8] rdma_ucm 2.6.20: add userspace support for rdma_cm In-Reply-To: <1160761111.4406.23.camel@stevo-desktop> References: <000901c6ecc6$fe1b0060$c0d4180a@amr.corp.intel.com> <1160761111.4406.23.camel@stevo-desktop> Message-ID: <452FD26C.3070706@ichips.intel.com> Steve Wise wrote: > FYI: Once the ucma gets into roland's git tree, I'll base the chelsio > drivers against the appropriate branch (for-2.6.20?) and test this whole > thing with the chelsio driver, user and kernel. Ok - I also have updates to the librdmacm that matches up with these changes. I didn't post them, and probably won't until we get closer to merging things. Plus, I didn't add any compatibility code for older releases, and I'd like to at least add that for the kernel modules that shipped with the OFED releases. - Sean From sean.hefty at intel.com Fri Oct 13 11:24:11 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 11:24:11 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <452D9F0A.1000000@veritas.com> Message-ID: <000001c6eef4$c8336480$8698070a@amr.corp.intel.com> >3. in req_handler() we follow the same steps as we have done without APM.. > i.e. create qpairs, change qp state to RTR and then send REP. > >however, when trying to change state to RTR usinb ib_modify_qp() I get >an error (-22). > >two info: same code will work if I pass alt_path as NULL or change the >alt_path as primary path. > >I must be missing something here, I assume this basic APM feature works >in RHEL4 update 4 distribtion >of openib stack. I added code to the ib_cm to handle APM, but haven't ever tested it myself. I believe others have used it successfully though. What differences are there between the primary and alternate paths? I.e. are just the LIDs different, or are other values also different? 
- Sean From mshefty at ichips.intel.com Fri Oct 13 11:37:29 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 11:37:29 -0700 Subject: [openib-general] [PATCH 12/13] Re-write cma_add_one error cases In-Reply-To: References: Message-ID: <452FDCE9.7090002@ichips.intel.com> Krishna Kumar2 wrote: > Oops, I made it against older bits. Should I recreate this patchset against > latest tree (and change #12 as suggested above) ? Actually, I'd just like the patches against the upstream kernel (maybe for a for-2.6.20 branch in Roland's git tree?). Patch 7 is a good cleanup, and reworking patch 12 to just remove the node_guid check should do. I'm not sure we gain enough to justify the other changes. - Sean From tom at opengridcomputing.com Fri Oct 13 11:39:52 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 13 Oct 2006 13:39:52 -0500 Subject: [openib-general] Usermode cm_id definition In-Reply-To: <1160760569.4406.17.camel@stevo-desktop> Message-ID: Sean strikes again. That seems perfect. On 10/13/06 12:29 PM, "Steve Wise" wrote: > > That seems good. > > > > On Fri, 2006-10-13 at 09:29 -0700, Sean Hefty wrote: >> Tom Tucker wrote: >>> The only "correct" way would be to declare a sockaddr structure * to a >>> struct socket_storage allocated buffer and waste 4 bytes in the rdma_cm_id >>> structure. >>> >>> Anyway, it's buggin' me...Maybe I'm being anal... >>> >>> Thoughts? >> >> We could change the definition to something similar in the kernel: >> >> struct sockaddr src_addr; >> u8 src_pad[sizeof(struct sockaddr_storage) - >> sizeof(struct sockaddr)]; >> >> I'm guessing this sort of change wouldn't require any changes to any of the >> clients. 
>> >> - Sean >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > From somenath at veritas.com Thu Oct 12 11:46:15 2006 From: somenath at veritas.com (somenath) Date: Thu, 12 Oct 2006 11:46:15 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <000001c6eef4$c8336480$8698070a@amr.corp.intel.com> References: <000001c6eef4$c8336480$8698070a@amr.corp.intel.com> Message-ID: <452E8D77.8010600@veritas.com> Sean Hefty wrote: >>3. in req_handler() we follow the same steps as we have done without APM.. >> i.e. create qpairs, change qp state to RTR and then send REP. >> >>however, when trying to change state to RTR usinb ib_modify_qp() I get >>an error (-22). >> >>two info: same code will work if I pass alt_path as NULL or change the >>alt_path as primary path. >> >>I must be missing something here, I assume this basic APM feature works >>in RHEL4 update 4 distribtion >>of openib stack. >> >> > >I added code to the ib_cm to handle APM, but haven't ever tested it myself. I >believe others have used it successfully though. > >What differences are there between the primary and alternate paths? I.e. are >just the LIDs different, or are other values also different? > >- Sean > > Sean: thanks for ur reply. I use ib_sa_path_rec_get( device, HCA_PRM_PORT, /* first port =1, second port=2 */ &path_rec, IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_PKEY, 5000, GFP_KERNEL, func_completion, context, &query) to get the primary path, and make the same call with HCA_ALT_PORT (=2) to get the alternate path. primary path has the source and destination gid for the HCA port 1, alternate path the source and destination gid for the HCA port 2. using these two paths, I send the REQ, otherwise gets the REQ... 
(I can dump the primary and alternate path received in req handler to check everything is ok, will try that next..) do you remember when you checked in the working code? I am wondering if the RHEL4 U4 binary distrition of redhat has your changes. thanks, som. From krause at cup.hp.com Fri Oct 13 11:40:51 2006 From: krause at cup.hp.com (Michael Krause) Date: Fri, 13 Oct 2006 11:40:51 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <000001c6eef4$c8336480$8698070a@amr.corp.intel.com> References: <452D9F0A.1000000@veritas.com> <000001c6eef4$c8336480$8698070a@amr.corp.intel.com> Message-ID: <6.2.0.14.2.20061013114032.0682c958@esmail.cup.hp.com> At 11:24 AM 10/13/2006, Sean Hefty wrote: > >3. in req_handler() we follow the same steps as we have done without APM.. > > i.e. create qpairs, change qp state to RTR and then send REP. > > > >however, when trying to change state to RTR usinb ib_modify_qp() I get > >an error (-22). > > > >two info: same code will work if I pass alt_path as NULL or change the > >alt_path as primary path. > > > >I must be missing something here, I assume this basic APM feature works > >in RHEL4 update 4 distribtion > >of openib stack. > >I added code to the ib_cm to handle APM, but haven't ever tested it myself. I >believe others have used it successfully though. > >What differences are there between the primary and alternate paths? I.e. are >just the LIDs different, or are other values also different? The spec allows a full address vector to be specified not just LID. Mike From sean.hefty at intel.com Fri Oct 13 11:59:06 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 11:59:06 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <452E8D77.8010600@veritas.com> Message-ID: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> > ib_sa_path_rec_get( > device, > HCA_PRM_PORT, /* first port =1, second >port=2 */ Note that this tells the ib_sa module which port to send the request out on. 
It's separate from the actual path information being requested, which is based on the GIDs. > &path_rec, > IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | > IB_SA_PATH_REC_NUMB_PATH > | IB_SA_PATH_REC_PKEY, ... >do you remember when you checked in the working code? >I am wondering if the RHEL4 U4 binary distrition of redhat has your changes. That, I'm not sure about. The fact that it works when alt_path = primary_path is what's confusing me at the moment. - Sean From johnip at sgi.com Fri Oct 13 12:17:30 2006 From: johnip at sgi.com (John Partridge) Date: Fri, 13 Oct 2006 14:17:30 -0500 Subject: [openib-general] Race in mthca_cmd_post() Message-ID: <452FE64A.6050603@sgi.com> Roland, I have been testing OFED-1.1 on SGI Altix (ia64) and on certain of our machines we see a kernel panic because of a PIO read error coming from mthca_cmd_post() Here's the stack trace :- 0xe000006040f28000 4957 4913 0 4 R 0xe000006040f283a0 *modprobe 0xa00000010000caa0 ia64_leave_kernel 0xa00000010041a720 sn_dma_flush+0x20 args (0xc000018010780698, 0xe000016003ad8280, 0xa00000010013dc90, 0x308, 0x286) 0xa00000010040f940 ___sn_readl+0x40 args (0xc000018010780698, 0xfff7fff2, 0xa00000020628d2c0, 0x60d, 0xe000026003caa500) 0xa00000020628d2c0 [ib_mthca]mthca_cmd_post+0x520 args (0xe000026003caa000, 0x0, 0x8000000000000000, 0x0, 0x0) 0xa00000020628daa0 [ib_mthca]mthca_cmd_poll+0xa0 args (0xe000026003caa000, 0x0, 0xe000006040f2fa10, 0x1, 0x0) 0xa00000020628def0 [ib_mthca]mthca_cmd_imm+0xd0 args (0xe000026003caa000, 0x0, 0xe000006040f2fa10, 0x0, 0x0) 0xa00000020628e0b0 [ib_mthca]mthca_SYS_EN+0x50 args (0xe000026003caa408, 0xe000006040f2fa22, 0xe000026079df5880, 0xa00000020628a680, 0x40e) 0xa00000020628a680 [ib_mthca]mthca_init_hca+0x1720 args (0xe000026003caa000, 0xa00000020628b7f0, 0xe000006040f2fa22, 0xe000026003caa418, 0xa0000001008ee010) 0xa00000020628bd00 [ib_mthca]__mthca_init_one+0xe60 args (0xe00003607a234800, 0x0, 0xe000026003caa408, 0xe000026003caa000, 0x0) 0xa00000020628cd40 
[ib_mthca]mthca_init_one+0x100 args (0xe00003607a234800, 0xa0000002062cc0a8, 0xa0000002062ed450, 0xffffffffffffffed, 0xa0000001002b7520) 0xa0000001002b7520 pci_device_probe+0x260 [4]more> args (0xa000000100720e80, 0xa000000100720ea8, 0xe00003607a234800, 0xa0000002062cc340, 0x0) 0xa0000001003a9d40 driver_probe_device+0x100 args (0xa0000002062cc398, 0xe00003607a234870, 0x205, 0xe00003607a234a18, 0xa0000001003aa040) 0xa0000001003aa040 __driver_attach+0xc0 args (0xe00003607a234870, 0xa0000002062cc398, 0xe00003607a2349f0, 0xa0000001003a9020, 0x38a) 0xa0000001003a9020 bus_for_each_dev+0x80 args (0x0, 0x0, 0xa0000002062cc398, 0xa00000010061ac40, 0xa0000001003a9b60) 0xa0000001003a9b60 driver_attach+0x40 args (0xa0000002062cc398, 0xa0000001003a87a0, 0x40b, 0x40b) I put a PCI-X analyzer on the bus along with the HCA and found that we saw a Memory Read to register 698 but no evidence of the SYS_EN command making it down to the card. We were trying to read the gobit before the DOORBELL had completed. I think this could only happen on multi CPU machines with fast CPU's and an architecture which does not do PIO ordering very well. To fix this I have the following patch. 
Please can you look at it and let me know what you think :-

--- drivers/infiniband/hw/mthca/mthca_cmd.c 2006-10-05 08:07:01.000000000 -0500
+++ fix/drivers/infiniband/hw/mthca/mthca_cmd.c 2006-10-13 14:01:09.104455038 -0500
@@ -282,12 +282,15 @@
     mutex_lock(&dev->cmd.hcr_mutex);

-    if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell)
+    if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) {
+        mmiowb();
         mthca_cmd_post_dbell(dev, in_param, out_param, in_modifier,
                              op_modifier, op, token);
-    else
+    } else {
+        mmiowb();
         err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier,
                                  op_modifier, op, token, event);
+    }

     mutex_unlock(&dev->cmd.hcr_mutex);
     return err;

Thanks John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From rdreier at cisco.com Fri Oct 13 12:40:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 13 Oct 2006 12:40:15 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <452FE64A.6050603@sgi.com> (John Partridge's message of "Fri, 13 Oct 2006 14:17:30 -0500") References: <452FE64A.6050603@sgi.com> Message-ID: John> I put a PCI-X analyzer on the bus along with the HCA and John> found that we saw a Memory Read to register 698 but no John> evidence of the SYS_EN command making it down to the John> card. We were trying to read the gobit before the DOORBELL John> had completed. I think this could only happen on multi CPU John> machines with fast CPU's and an architecture which does not John> do PIO ordering very well. Forgive me for being dense, but could you describe the race a little more precisely? Like list out what's happening and point to where two things happen out of order? What confuses me is that you seem to be saying that a read of PCI MMIO space is racing with a write -- and I would have thought that a read has to flush all posted writes.
And I really don't understand how readl() could panic the kernel anyway -- the MMIO region should be set up properly long before, so what is sn_dma_flush() crashing on? Also a couple of comments on the patch itself: > - if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) > + if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) { > + mmiowb(); > mthca_cmd_post_dbell(dev, in_param, out_param, in_modifier, > op_modifier, op, token); > - else > + } else { > + mmiowb(); > err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier, > op_modifier, op, token, event); > + } First and most trivial, you indented with spaces instead of tabs. It's always obvious that a patch that looks like old_code(); + new_code(); has whitespace damage, because things don't line up. Also, why did you do if (something) mmiowb() ... else mmiowb() ... instead of just mmiowb() if (something) ... else ... Finally, if you're going to add a mmiowb() (and I don't doubt it is necessary, I'd just like to understand why exactly) then you need to add a big comment explaining what it protects -- mmiowb() is always confusing to people so it's important to have an explanation so that future changes don't break the driver again. - R. From rdreier at cisco.com Fri Oct 13 12:59:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 13 Oct 2006 12:59:46 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <452FE64A.6050603@sgi.com> (John Partridge's message of "Fri, 13 Oct 2006 14:17:30 -0500") References: <452FE64A.6050603@sgi.com> Message-ID: Oh, and no Signed-off-by: line either... 
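The structural point in the review above (one barrier before the branch instead of one copy in each arm, with a comment saying what it protects) can be sketched in plain C. This is only an illustration under stated assumptions: mmiowb_stub(), post_dbell_stub(), post_hcr_stub() and cmd_post_sketch() are hypothetical stand-ins, not the real mthca or kernel functions, and the sketch takes no position on whether a barrier at this spot actually fixes the Altix failure.

```c
/*
 * Hypothetical stand-ins for illustration only. None of these are the
 * real mthca or kernel definitions; they just count calls so the control
 * flow can be checked in user space.
 */
static int barrier_calls;

static void mmiowb_stub(void)
{
	barrier_calls++;
}

static int post_dbell_stub(void)
{
	return 0;
}

static int post_hcr_stub(void)
{
	return 0;
}

/*
 * Shape suggested in the review: a single barrier ahead of the branch.
 */
static int cmd_post_sketch(int use_doorbell)
{
	int err = 0;

	/*
	 * In a real patch this is where the "big comment" belongs: which
	 * earlier MMIO writes must be ordered, against what, and on which
	 * platforms the ordering is not implicit. The thread has not yet
	 * established that, so it is left as a placeholder here.
	 */
	mmiowb_stub();

	if (use_doorbell)
		post_dbell_stub();	/* return value unused, as in the quoted hunk */
	else
		err = post_hcr_stub();

	return err;
}
```

Calling cmd_post_sketch() once with each argument runs the barrier exactly once per call on either path, which is the behaviour the two-copy version of the patch was spelling out twice.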
From somenath at veritas.com Thu Oct 12 14:37:37 2006 From: somenath at veritas.com (somenath) Date: Thu, 12 Oct 2006 14:37:37 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> References: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> Message-ID: <452EB5A1.3020300@veritas.com> Sean Hefty wrote: >>ib_sa_path_rec_get( >> device, >> HCA_PRM_PORT, /* first port =1, second >>port=2 */ >> >> > >Note that this tells the ib_sa module which port to send the request out on. >It's separate from the actual path information being requested, which is based >on the GIDs. > > > that's right: so I just verified that by dumping dgid, sgid in primary and alternate path obtained in req_recv_handler(); they look ok to me, but just dumping here, in case u find something here: for node1: destination gid======= gid fe80:0:0:0:5:ad00:3:a801 source gid======= gid fe80:0:0:0:5:ad00:3:9abd APM destination gid======= gid fe80:0:0:0:5:ad00:3:a802 APM source gid======= gid fe80:0:0:0:5:ad00:3:9abe =================================================== for node2: destination gid======= gid fe80:0:0:0:5:ad00:3:9abd source gid======= gid fe80:0:0:0:5:ad00:3:a801 APM destination gid======= gid fe80:0:0:0:5:ad00:3:9abe APM source gid======= gid fe80:0:0:0:5:ad00:3:a802 ================================================== req_handler() { // create qpairs, connection handle, ... ib_stat = modifyqp_rtr(connection);; if (ib_stat) { goto exit; ================> i get error here } ib_stat = ib_send_cm_rep( connection->cm_id, &accept_param); if (ib_stat) { goto exit; } } if I do, modifyqp_rtr() after sending ib_send_cm_rep() then REP sending (just to try out) is successful (and other side recvs the REP), however changing state to RTR still fails after that. I must be missing something here: do I have to do anything with: 1.path_mig_state field of ib_qp_attr? 2. when do I set alt_port_num? 
I do that using ib_modify_qp() using IB_QP_PORT attrib mask. 3. how do I set alt_timeout and alt_pkey_index? with what attrib mask? 4. is there any sample working code using APM? thanks, som. >> &path_rec, >> IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | >> IB_SA_PATH_REC_NUMB_PATH >> | IB_SA_PATH_REC_PKEY, >> >> >... > > >>do you remember when you checked in the working code? >>I am wondering if the RHEL4 U4 binary distrition of redhat has your changes. >> >> > >That, I'm not sure about. The fact that it works when alt_path = primary_path >is what's confusing me at the moment. > >- Sean > > From johnip at sgi.com Fri Oct 13 14:45:55 2006 From: johnip at sgi.com (John Partridge) Date: Fri, 13 Oct 2006 16:45:55 -0500 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: References: <452FE64A.6050603@sgi.com> Message-ID: <45300913.4040003@sgi.com> Roland, Thanks for looking at this, I appologize for not giving enough detail, I'm never too sure how much to post. Roland Dreier wrote: > John> I put a PCI-X analyzer on the bus along with the HCA and > John> found that we saw a Memory Read to register 698 but no > John> evidence of the SYS_EN command making it down to the > John> card. We were trying to read the gobit before the DOORBELL > John> had completed. I think this could only happen on multi CPU > John> machines with fast CPU's and an architecture which does not > John> do PIO ordering very well. > > Forgive me for being dense, but could you describe the race a little > more precisely? Like list out what's happening and point to where two > things happen out of order? What confuses me is that you seem to be > saying that a read of PCI MMIO space is racing with a write -- and I > would have thought that a read has to flush all posted writes. I was confused by this, but I believe that the readl to reg 698 is not completing because the DDR memory is not yet available because SYS_EN never got down to the card before the readl, or did not complete before readl. 
This could be wrong I don't know. I was expecting to see a PIO write to reg 680 (which I can't find in the analyzer trace). What I did see is a lot of Config Writes to the PCI config space and then a Memory read (to 698). Here is a snapshot of the trace :-

23438: * Config Write REG = 01 TYPE = 1 Bus1 Device0 Func0
23439: BE = 0000 Requester = Bus0 Device0 Func0 Tag = 1
23440: (no DEVSEL#, no IRDY#)
23441:
23442: GAP of 2 line(s)
23443: -RETRY
23444:
23445: GAP of 24 line(s)
23446: * Config Write REG = 01 TYPE = 1 Bus1 Device0 Func0
23447: BE = 0000 Requester = Bus0 Device0 Func0 Tag = 1
23448: (no DEVSEL#, no IRDY#)
23449:
23450: GAP of 2 line(s)
23451: Data = 02300107 -SPLIT RESPONSE
23452:
23453: GAP of 3 line(s)
23454: * Memory Rd DW Addr = 00280698
23455: BE = 0000 Requester = Bus0 Device0 Func0 Tag = 0
23456: (no DEVSEL#, no IRDY#)
23457:
23458: GAP of 2 line(s)
23459: Data = ffffffff -SPLIT RESPONSE
23460:
23461: GAP of 7 line(s)
23462: * Split compl. (Requester = Bus0 Device0 Func0) Tag = 0
23463: Completer = Bus0 Device2 Func0 (Error completion)
23464:
23465: Data = 10000004
23466:

> > And I really don't understand how readl() could panic the kernel > anyway -- the MMIO region should be set up properly long before, so > what is sn_dma_flush() crashing on? The panic occurs because of the Memory read error completion, the read probably fails because the card is not ready enabled ??
I don't know about other architectures, but on ours a PIO read failing will cause a panic (maybe a more correct description would be MCA) > > Also a couple of comments on the patch itself: > > > - if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) > > + if (event && dev->cmd.flags & MTHCA_CMD_POST_DOORBELLS && fw_cmd_doorbell) { > > + mmiowb(); > > mthca_cmd_post_dbell(dev, in_param, out_param, in_modifier, > > op_modifier, op, token); > > - else > > + } else { > > + mmiowb(); > > err = mthca_cmd_post_hcr(dev, in_param, out_param, in_modifier, > > op_modifier, op, token, event); > > + } > > First and most trivial, you indented with spaces instead of tabs. > It's always obvious that a patch that looks like > > old_code(); > + new_code(); > > has whitespace damage, because things don't line up. Sorry I just cut and pasted it, just to get a discussion under way. It was in no way meant to be a final patch. I would attach a real file when it comes to it. Sorry I should have made that clear. > > Also, why did you do > > if (something) > mmiowb() > ... > else > mmiowb() > ... > > instead of just > > mmiowb() > if (something) > ... > else > ... Actually looking at the code again, I'm not sure that is the correct place for the mmiowb() I think maybe it would be better done in the mthca_cmd_post_hcr() routine before we do any work in there. But, I need to make sure that mthca_cmd_post_hcr() is always protected by a lock before we do that. What do you think ? > > Finally, if you're going to add a mmiowb() (and I don't doubt it is > necessary, I'd just like to understand why exactly) then you need to > add a big comment explaining what it protects -- mmiowb() is always > confusing to people so it's important to have an explanation so that > future changes don't break the driver again. OK thanks excellent point. 
John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From rjwalsh at pathscale.com Fri Oct 13 15:03:22 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Fri, 13 Oct 2006 15:03:22 -0700 Subject: [openib-general] Question about gen2_basic test Message-ID: <45300D2A.7090702@pathscale.com> Hi all, I've a question about one of the gen2_basic tests. The test is test 3 of the QP test collection. There's a piece of code in this test that does a modify_qp, followed by a query_qp. The query QP bit checks that the modify_qp did what was expected of it. One check looks like this: if (mask & IBV_QP_MAX_DEST_RD_ATOMIC) { CHECK_VALUE("max_dest_rd_atomic", query_attr.max_dest_rd_atomic, next_power_of_two(attr->max_dest_rd_atomic), return -1); } There's similar code for max_rd_atomic. My question is: why is there a next_power_of_two() bit in there? I don't see anywhere in the spec that says that non-power-of-two values are illegal, and we've just added support to our driver for values > 1 for these two attributes, and we support non-power-of-two values. This test is failing for us because the value returned in query_attr is what we set it to in modify_qp, not a rounded-up version. Am I missing something in the spec? Regards, Robert. From rdreier at cisco.com Fri Oct 13 15:02:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 13 Oct 2006 15:02:59 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <45300913.4040003@sgi.com> (John Partridge's message of "Fri, 13 Oct 2006 16:45:55 -0500") References: <452FE64A.6050603@sgi.com> <45300913.4040003@sgi.com> Message-ID: John> I was confused by this, but I believe that the readl to reg John> 698 is not completing because the DDR memory is not yet John> available because SYS_EN never got down to the card before John> the readl, or did not complete before readl. This could be John> wrong I don't know. 
I was expecting to see a PIO write to John> reg 680 (which I can't find in the analyzer trace). What I John> did see is a lot of Config Write's to the PCI config space John> and then a Memeory read (to 698). OK, consider me thoroughly confused. The read to reg 698 (MTHCA_HCR_BASE + HCR_STATUS_OFFSET) is coming from go_bit(), and that seems entirely correct to me. The SYS_EN command should end up in mthca_cmd_post_hcr(), and that checks the go bit before doing anything else. So this is exactly what I would expect to see. AFAIK this is working everywhere -- the device does not need to be enabled for reads to this area to work. What does the PCI trace look like after you add mmiowb()? - R. From mshefty at ichips.intel.com Fri Oct 13 15:56:40 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 15:56:40 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <452EB5A1.3020300@veritas.com> References: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> <452EB5A1.3020300@veritas.com> Message-ID: <453019A8.10904@ichips.intel.com> somenath wrote: > I must be missing something here: do I have to do anything with: > > 1.path_mig_state field of ib_qp_attr? This should be set when transitioning to RTS. The field is set by the ib_cm in ib_cm_init_qp_attr. (Actually, are you using this call to set your QP attributes?) > 2. when do I set alt_port_num? I do that using ib_modify_qp() using > IB_QP_PORT attrib mask. This is set when transitioning to RTR. > 3. how do I set alt_timeout and alt_pkey_index? with what attrib mask? > 4. is there any sample working code using APM? Not that I'm aware of. 
From somenath at veritas.com Thu Oct 12 16:31:05 2006 From: somenath at veritas.com (somenath) Date: Thu, 12 Oct 2006 16:31:05 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453019A8.10904@ichips.intel.com> References: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> <452EB5A1.3020300@veritas.com> <453019A8.10904@ichips.intel.com> Message-ID: <452ED039.9010908@veritas.com> Sean Hefty wrote: > somenath wrote: > >> I must be missing something here: do I have to do anything with: >> >> 1.path_mig_state field of ib_qp_attr? > > > This should be set when transitioning to RTS. The field is set by the > ib_cm in ib_cm_init_qp_attr. (Actually, are you using this call to > set your QP attributes?) yes, I use ib_cm_init_qp_attr() > >> 2. when do I set alt_port_num? I do that using ib_modify_qp() using >> IB_QP_PORT attrib mask. > > > This is set when transitioning to RTR. > yes, i do that. >> 3. how do I set alt_timeout and alt_pkey_index? with what attrib >> mask? 4. is there any sample working code using APM? > > > Not that I'm aware of. thanks, som. From mshefty at ichips.intel.com Fri Oct 13 17:00:14 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 17:00:14 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <452ED039.9010908@veritas.com> References: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> <452EB5A1.3020300@veritas.com> <453019A8.10904@ichips.intel.com> <452ED039.9010908@veritas.com> Message-ID: <4530288E.802@ichips.intel.com> Can you confirm that I have your test failure correctly? pri_path = path 1 alt_path = NULL works pri_path = path 2 alt_path = NULL works pri_path = alt_path = path 1 works pri_path = alt_path = path 2 works pri_path = path 1 alt_path = path 2 fails Are the only difference between path 1 and 2 the S/DGIDs and S/DLIDs? 
- Sean From somenath at veritas.com Thu Oct 12 17:20:16 2006 From: somenath at veritas.com (somenath) Date: Thu, 12 Oct 2006 17:20:16 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <4530288E.802@ichips.intel.com> References: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> <452EB5A1.3020300@veritas.com> <453019A8.10904@ichips.intel.com> <452ED039.9010908@veritas.com> <4530288E.802@ichips.intel.com> Message-ID: <452EDBC0.7020904@veritas.com> Sean Hefty wrote: > Can you confirm that I have your test failure correctly? > > pri_path = path 1 > alt_path = NULL > works yes. > > pri_path = path 2 > alt_path = NULL > works > yes. > pri_path = alt_path = path 1 > works > no, I haven't tested that. I can try that too, if u think that can provide useful info.. > pri_path = alt_path = path 2 > works no, not tried that either. > > pri_path = path 1 > alt_path = path 2 > fails > yes, that fails to change the QP state to RTR in req_recv_handler().. (if I don't change the state to RTR, REP sending will be successful) > Are the only difference between path 1 and 2 the S/DGIDs and S/DLIDs? > that's how its coded, but I haven't verified that in req_recv_handler(). I will verify that and let you know (by dumping both the paths in rec_recv_handler()). thanks, som. > - Sean From somenath at veritas.com Thu Oct 12 18:16:26 2006 From: somenath at veritas.com (somenath) Date: Thu, 12 Oct 2006 18:16:26 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <452EDBC0.7020904@veritas.com> References: <000101c6eef9$a8544440$8698070a@amr.corp.intel.com> <452EB5A1.3020300@veritas.com> <453019A8.10904@ichips.intel.com> <452ED039.9010908@veritas.com> <4530288E.802@ichips.intel.com> <452EDBC0.7020904@veritas.com> Message-ID: <452EE8EA.2070808@veritas.com> somenath wrote: > Sean Hefty wrote: > >> Can you confirm that I have your test failure correctly? >> >> pri_path = path 1 >> alt_path = NULL >> works > > > > yes. 
> >> >> pri_path = path 2 >> alt_path = NULL >> works >> > yes. > >> pri_path = alt_path = path 1 >> works >> > > no, I haven't tested that. I can try that too, if u think that can > provide useful info.. > >> pri_path = alt_path = path 2 >> works > > > > no, not tried that either. > >> >> pri_path = path 1 >> alt_path = path 2 >> fails >> > > yes, that fails to change the QP state to RTR in req_recv_handler().. > (if I don't change the state to RTR, REP sending will be successful) > >> Are the only difference between path 1 and 2 the S/DGIDs and S/DLIDs? >> > that's how its coded, but I haven't verified that in req_recv_handler(). > I will verify that and let you know (by dumping both the paths in > rec_recv_handler()). for node1: primary_path: dlid=0x1f06 slid=0x1d06 raw=0x0 flow=0x0 hop=0x0 tra=0x0 rev=0x1 numb=0x0 pkey=0xffff sl=0x0 mtus=0x2 mtu=0x4 rate=0x2 plts=0x3 plt=0x2 pref=0x2 alt_path: dlid=0x2006 slid=0x1e06 raw=0x0 flow=0x0 hop=0x0 tra=0x0 rev=0x1 numb=0x0 pkey=0xffff sl=0x0 mtus=0x2 mtu=0x4 rate=0x2 plts=0x3 plt=0x2 pref=0x2 they seem to be same, thanks, som. > thanks, som. > > > > >> - Sean > > > > From sean.hefty at intel.com Fri Oct 13 23:43:37 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 13 Oct 2006 23:43:37 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <452EDBC0.7020904@veritas.com> Message-ID: <000201c6ef5c$13fdfd00$dec8180a@amr.corp.intel.com> >> pri_path = alt_path = path 1 >> works >> > >no, I haven't tested that. I can try that too, if u think that can >provide useful info.. I misunderstood one of your earlier e-mails then. I threw together a test case to try this, and it worked for me. Can you see if the same works for you? If not, then my guess is that the release you're using is missing some needed patches. (You may be able to work around the issue in your code, however, so we'll see what can be done.) 
My systems only have one path between them, so until I can physically add another path, I won't be able to test the case where pri_path != alt_path. - Sean From chas at cmf.nrl.navy.mil Sat Oct 14 07:25:52 2006 From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR) Date: Sat, 14 Oct 2006 10:25:52 -0400 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: Message-ID: <200610141425.k9EEPqaW007190@cmf.nrl.navy.mil> In message ,"Roland Dreier" writes: >saying that a read of PCI MMIO space is racing with a write -- and I >would have thought that a read has to flush all posted writes. a read does flush all the posted writes but that doesn't mean that the write operation has had enough time to "complete". i had a similar problem on the altix platform with posted writes. part of the hw init was to write the reset register, wait a few ticks, and then read the register until you saw a flag clear. reading the device "too soon" failed because it was in some poor state that didn't respond properly. with posted writes, you needed to force out the writes (not using read obviously) and then wait the appropriate time. From rdreier at cisco.com Sat Oct 14 09:53:03 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 14 Oct 2006 09:53:03 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <200610141425.k9EEPqaW007190@cmf.nrl.navy.mil> (chas williams's message of "Sat, 14 Oct 2006 10:25:52 -0400") References: <200610141425.k9EEPqaW007190@cmf.nrl.navy.mil> Message-ID: chas> i had a similar problem on the altix platform with posted chas> writes. part of the hw init was to write the reset chas> register, wait a few ticks, and then read the register until chas> you saw a flag clear. reading the device "too soon" failed chas> because it was in some poor state that didn't respond chas> properly. with posted writes, you needed to force out the chas> writes (not using read obviously) and then wait the chas> appropriate time.
How do you force out writes without doing a read? I don't know of any
other way to flush writes that is guaranteed by the PCI spec.

In any case that doesn't seem to be the problem here: the read is
supposed to be done first, even in the source code.

 - R.

From rdreier at cisco.com Sat Oct 14 10:33:15 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 14 Oct 2006 10:33:15 -0700
Subject: [openib-general] Race in mthca_cmd_post()
In-Reply-To: <200610141728.k9EHS82e009018@cmf.nrl.navy.mil> (chas williams's message of "Sat, 14 Oct 2006 13:28:08 -0400")
References: <200610141728.k9EHS82e009018@cmf.nrl.navy.mil>
Message-ID:

    chas> see Documentation/io_ordering.txt.

That says you should do a read to flush writes, doesn't it?? What am
I missing?

    chas> i thought it might be because in a later message john said,

    john> completing because the DDR memory is not yet available
    john> because SYS_EN never got down to the card before the readl,
    john> or did not complete before readl.

The read that is failing is not going to DDR memory -- it is going to a
"safe" register.

 - R.

From chas at cmf.nrl.navy.mil Sat Oct 14 10:28:08 2006
From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR)
Date: Sat, 14 Oct 2006 13:28:08 -0400
Subject: [openib-general] Race in mthca_cmd_post()
In-Reply-To:
Message-ID: <200610141728.k9EHS82e009018@cmf.nrl.navy.mil>

In message ,Roland Dreier writes:
>How do you force out writes without doing a read? I don't know of any
>other way to flush writes that is guaranteed by the PCI spec.

see Documentation/io_ordering.txt.

>In any case that doesn't seem to be the problem here: the read is
>supposed to be done first, even in the source code.

i thought it might be because in a later message john said,

>completing because the DDR memory is not yet available because SYS_EN never
>got down to the card before the readl, or did not complete before readl.
i would like to think that the altix isnt reordering read and writes
and that perhaps there needs to be a short delay between certain writes.

From chas at cmf.nrl.navy.mil Sat Oct 14 10:47:09 2006
From: chas at cmf.nrl.navy.mil (chas williams - CONTRACTOR)
Date: Sat, 14 Oct 2006 13:47:09 -0400
Subject: [openib-general] Race in mthca_cmd_post()
In-Reply-To:
Message-ID: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil>

In message ,Roland Dreier writes:
>That says you should do a read to flush writes, doesn't it?? What am
>I missing.

i guess my point is that you dont need to read from the device, you could
read from the bridge or a config register.

>The read that is failing is not going to DDR memory -- it is going to a
>"safe" register.

i believe by safe register they meant the pci config register space
and not the memory mapped registers on the card.

looking at the trace from the analyzer, there are a couple writes
to config register (config reg 1, PCI_COMMAND_IO) and then a read
from the memory mapped region. i would guess the read to the mmio
region is flushing the writes to the config register but the read happens
"too soon" after those writes. on a more mundane computer, the
write/write/read probably wouldnt be batched together.

From rdreier at cisco.com Sat Oct 14 13:24:43 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 14 Oct 2006 13:24:43 -0700
Subject: [openib-general] Race in mthca_cmd_post()
In-Reply-To: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> (chas williams's message of "Sat, 14 Oct 2006 13:47:09 -0400")
References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil>
Message-ID:

    chas> i would guess the read to the mmio region is flushing the
    chas> writes to the config register but the read happens "too
    chas> soon" after those writes. on a more mundane computer, the
    chas> write/write/read probably wouldnt be batched together.

config writes can't be posted though, so that doesn't make sense.

 - R.
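
To make the posted-write discussion in this thread concrete, here is a toy userspace model in C. It is not kernel code and not the mthca driver: `struct toy_bus`, `toy_writel()`, `toy_readl()`, `toy_flush()`, and the `TOY_*` limits are all hypothetical names invented for illustration. The sketch shows only the ordering rule the thread agrees on, namely that a posted MMIO write may sit in a bridge buffer until a later read forces it out. It does not model chas's other point, that a device may still need time to act on a write after the write arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_REGS    4   /* hypothetical device register count */
#define TOY_PENDING_MAX 8   /* hypothetical bridge buffer depth */

struct toy_write {
    size_t   reg;
    uint32_t val;
};

struct toy_bus {
    uint32_t         dev_regs[TOY_NUM_REGS];   /* what the device sees */
    struct toy_write pending[TOY_PENDING_MAX]; /* posted, not yet delivered */
    size_t           npending;
};

/* Deliver every buffered write to the device, oldest first. */
static void toy_flush(struct toy_bus *bus)
{
    for (size_t i = 0; i < bus->npending; i++)
        bus->dev_regs[bus->pending[i].reg] = bus->pending[i].val;
    bus->npending = 0;
}

/* Posted write: returns as soon as the write is queued in the bridge. */
static void toy_writel(struct toy_bus *bus, size_t reg, uint32_t val)
{
    assert(reg < TOY_NUM_REGS);
    if (bus->npending == TOY_PENDING_MAX)
        toy_flush(bus);             /* buffer full: drain before queuing */
    bus->pending[bus->npending].reg = reg;
    bus->pending[bus->npending].val = val;
    bus->npending++;
}

/* Read: may not pass earlier posted writes, so drain them first. */
static uint32_t toy_readl(struct toy_bus *bus, size_t reg)
{
    assert(reg < TOY_NUM_REGS);
    toy_flush(bus);
    return bus->dev_regs[reg];
}
```

In this model a `toy_writel()` followed by a `toy_readl()` of any register guarantees that the write has reached `dev_regs`. Whether the device has finished reacting to that write is a separate question, which is the delay chas describes.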
From mst at mellanox.co.il Sat Oct 14 13:14:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 14 Oct 2006 22:14:51 +0200
Subject: [openib-general] RHEL5 and OFED ...
In-Reply-To: <1160665813.2917.306.camel@fc6.xsintricity.com>
References: <1160449038.3430.18.camel@sarium.pathscale.com> <1160594792.2917.186.camel@fc6.xsintricity.com> <1160604536.8378.104.camel@sarium.pathscale.com> <1160665813.2917.306.camel@fc6.xsintricity.com>
Message-ID: <20061014201451.GA10453@mellanox.co.il>

Quoting r. Doug Ledford :
> Sorry. RHEL5 Beta1 has been out for a while, but OFED 1.1 still isn't
> done yet. Obviously, I wasn't able to get something in RHEL5 that
> didn't even exist prior to freeze.

Would it be possible to include patches backporting fixes in infiniband kernel
components from 2.6.18/OFED 1.1 to modules that already ship with RHEL5?

-- 
MST

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg

From sweitzen at cisco.com Sat Oct 14 14:08:15 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sat, 14 Oct 2006 14:08:15 -0700
Subject: [openib-general] [openfabrics-ewg] Cisco SQA results for OFED 1.1 rc6
Message-ID:

> > The following bugs and enhancement requests were filed.
> >
> > 247 OFED IPoIB HA not working on RHEL4 U3
> >
> We fixed it in RC7

Agreed fixed, thanks.

> > 249 OFED 1.1: Open MPI 1.1.1 won't compile with Intel C 9.[01]
> > on SLES 10
> >
> I guess this will not be fixed for OFED 1.1. Correct?

There is an update to Intel C 9.1, I'll be trying it out with rc7.

> > 258 OFED: ppc64 GNU mpif90 missing for MVAPICH
> >
> Can you send us log file?

I put the logs in bugzilla.

> > 259 problems with OFED IPoIB HA on SLES10
> >
> Fixed in RC7

Agreed fixed, thanks.

> > 269 OFED 1.1 rc6 IPoIB does not interoperate with Cisco SFS 3001
> >
> Can Cisco debug it and send patches?

We're working on it.
> > 270 tvflash does not work with HCA recovery jumper > > > Can Cisco send a fix? We're working on it. Scott From eitan at mellanox.co.il Sun Oct 15 01:28:26 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 15 Oct 2006 10:28:26 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <452E7753.80402@ichips.intel.com> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> <452D6117.6040400@ichips.intel.com> <452DE8C4.2040804@mellanox.co.il> <452E7753.80402@ichips.intel.com> Message-ID: <4531F12A.2000306@mellanox.co.il> Sean Hefty wrote: > Eitan Zahavi wrote: > >> I disagree. If you sniff at the MAD level you can simply react to the >> lower level messages. >> > > First, when designing this, I did consider using the MAD snooping ability, and > changing what could be done with snooping. However, the multicast handling is > not simply sniffing MADs going out on the wire and incrementing / decrementing > some count. It can change or prevent a MAD from being sent. This is a > fundamental change to the behavior of the ib_mad APIs. > I am sorry I was not involved in that early stage. My bad. I need to look deeper into the code. As long as a response is generated even though the MAD was not sent this is not an API change but a bug fix. In this stage it seems that only a patch would convince you otherwise. I will try working on it this week. 
What I had in mind was to provide back a MAD response in the case of delete
when the client is not the last one on the group. All other MADs go on the
wire (duplicate "join").

> MADs are sent and tracked by their respective registered ib_mad clients.

Exactly, and the agent ID is part of the MAD trans_id. So we know which agent
is sending which MAD.

> Trying
> to push this down into the MAD layer means that the send request from one client
> may now occur on some other client's registration.

Not sure I am following you here. If you refer to the race where one client
sends "join" while the other sends "leave" you should make sure:
1. Mark a client as "joined" only after receiving the SA response.
2. Consider a "leave" when the client MAD is sent out.

> If that client decides to
> unregister in the middle of their send, the operation is canceled, and now needs
> to be restarted on some other registration. And even though the operation was
> canceled, we still need to know whether it was seen by the SA. This requires
> sniffing all MADs, and quickly gets extremely complex.
>

Cancel does not really revert a post_send, does it?
So if we catch it just before it is posted we should be fine.

> In order to avoid these issues with which registered client is actually
> performing the operation, the solution is to filter multicast requests through a
> single registration.

If each client uses its own agent ID then it is available in the trans_id of
the MAD.

> The ib_mad layer is complex enough as it is. (Have you
> tried tracing a MAD through the send path?) We don't need to push even more
> functionality down into it.
>

I agree that layering on top is easier. But does it really solve the bug? I
think not. If you would REPLACE the API and not provide both options (above
and below refcount enforcement) it would make sense to me.
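
The behaviour Eitan proposes in this message can be sketched in C as a small reference-counting decision function. This is a hedged illustration, not code from ib_mad or from the ib_multicast patch: `struct toy_mcast_group`, `toy_join()`, `toy_join_confirmed()`, `toy_leave()`, and the `toy_action` values are hypothetical names, and the sketch assumes a single group on a single port with no locking and no handling of SA failures. Every join is forwarded to the SA; a leave is forwarded only when the last local member leaves, and is otherwise answered locally.

```c
#include <assert.h>

enum toy_action {
    TOY_SEND_ON_WIRE,    /* forward the MAD to the SA */
    TOY_RESPOND_LOCALLY  /* synthesize a response, send nothing */
};

struct toy_mcast_group {
    unsigned int members; /* local clients whose join the SA has confirmed */
};

/* A join request always goes on the wire (duplicates included). */
static enum toy_action toy_join(struct toy_mcast_group *grp)
{
    (void)grp;
    return TOY_SEND_ON_WIRE;
}

/* Count the client as joined only once the SA response has arrived. */
static void toy_join_confirmed(struct toy_mcast_group *grp)
{
    grp->members++;
}

/* Account for a leave at the moment the client's MAD is sent. */
static enum toy_action toy_leave(struct toy_mcast_group *grp)
{
    assert(grp->members > 0);
    grp->members--;
    return grp->members == 0 ? TOY_SEND_ON_WIRE : TOY_RESPOND_LOCALLY;
}
```

With two confirmed members, the first `toy_leave()` is answered locally and the second goes to the SA. Sean's objections (unregistration in mid-send, and which registration owns the request) are outside what this sketch covers.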
> - Sean > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From akepner at sgi.com Sun Oct 15 02:02:44 2006 From: akepner at sgi.com (akepner at sgi.com) Date: Sun, 15 Oct 2006 02:02:44 -0700 (PDT) Subject: [openib-general] [PATCH] use mmiowb after doorbell ring Message-ID: We discovered a problem when running IPoIB applications on multiple CPUs on an Altix system. Many messages such as: ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq) appear in syslog, and the driver wedges up. Apparently this is because writes to the doorbells from different CPUs are clobbering one another. The following patch adds mmiowb() calls after doorbell rings to ensure the doorbell register updates are ordered. Signed-off-by: --- diff -rpu openib-1.1.orig/drivers/infiniband/hw/mthca/mthca_cq.c openib-1.1/drivers/infiniband/hw/mthca/mthca_cq.c --- openib-1.1.orig/drivers/infiniband/hw/mthca/mthca_cq.c 2006-10-15 00:23:07.474893244 -0700 +++ openib-1.1/drivers/infiniband/hw/mthca/mthca_cq.c 2006-10-15 00:25:03.601978852 -0700 @@ -41,6 +41,8 @@ #include +#include + #include "mthca_dev.h" #include "mthca_cmd.h" #include "mthca_memfree.h" @@ -314,6 +316,9 @@ void mthca_cq_clean(struct mthca_dev *de wmb(); cq->cons_index += nfreed; update_cons_index(dev, cq, nfreed); + /* use mmiowb to ensure update is ordered properly + * prior to releasing the spinlock */ + mmiowb(); } spin_unlock_irq(&cq->lock); @@ -711,6 +716,11 @@ repoll: } } + if (freed) { + /* we rang the MTHCA_CQ_DOORBELL so use mmiowb + * to make sure it is ordered properly */ + mmiowb(); + } spin_unlock_irqrestore(&cq->lock, flags); return err == 0 || err == -EAGAIN ? 
npolled : err; diff -rpu openib-1.1.orig/drivers/infiniband/hw/mthca/mthca_qp.c openib-1.1/drivers/infiniband/hw/mthca/mthca_qp.c --- openib-1.1.orig/drivers/infiniband/hw/mthca/mthca_qp.c 2006-10-15 00:23:20.126932247 -0700 +++ openib-1.1/drivers/infiniband/hw/mthca/mthca_qp.c 2006-10-15 00:25:03.613697320 -0700 @@ -43,6 +43,8 @@ #include #include +#include + #include "mthca_dev.h" #include "mthca_cmd.h" #include "mthca_memfree.h" @@ -1730,6 +1732,9 @@ out: mthca_write64(doorbell, dev->kar + MTHCA_SEND_DOORBELL, MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); + /* use mmiowb to ensure write to doorbell is ordered + * before releasing spinlock */ + mmiowb(); } qp->sq.next_ind = ind; @@ -1849,6 +1854,9 @@ out: qp->rq.next_ind = ind; qp->rq.head += nreq; + /* use mmiowb to ensure writes to doorbell are ordered + * before releasing spinlock */ + mmiowb(); spin_unlock_irqrestore(&qp->rq.lock, flags); return err; } @@ -2110,6 +2118,9 @@ out: MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); } + /* use mmiowb to ensure writes to doorbell are ordered + * before releasing spinlock */ + mmiowb(); spin_unlock_irqrestore(&qp->sq.lock, flags); return err; } diff -rpu openib-1.1.orig/drivers/infiniband/hw/mthca/mthca_srq.c openib-1.1/drivers/infiniband/hw/mthca/mthca_srq.c --- openib-1.1.orig/drivers/infiniband/hw/mthca/mthca_srq.c 2006-10-15 00:23:25.428562360 -0700 +++ openib-1.1/drivers/infiniband/hw/mthca/mthca_srq.c 2006-10-15 00:25:03.626392326 -0700 @@ -35,6 +35,8 @@ #include #include +#include + #include "mthca_dev.h" #include "mthca_cmd.h" #include "mthca_memfree.h" @@ -593,6 +595,9 @@ int mthca_tavor_post_srq_recv(struct ib_ MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); } + /* use mmiowb to ensure writes to doorbell are ordered + * before releasing spinlock */ + mmiowb(); spin_unlock_irqrestore(&srq->lock, flags); return err; } -- Arthur From dotanb at dev.mellanox.co.il Sun Oct 15 02:43:02 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 15 Oct 2006 
11:43:02 +0200
Subject: [openib-general] what happens if one close the device in user level without releasing the resources?
In-Reply-To: <20061012181624.GC15881@mellanox.co.il>
References: <452E64E6.1010608@dev.mellanox.co.il> <20061012181624.GC15881@mellanox.co.il>
Message-ID: <453202A6.5090206@dev.mellanox.co.il>

Michael S. Tsirkin wrote:
> Quoting r. Dotan Barak :
>
>> Subject: what happens if one close the device in user level without releasing the resources?
>>
>> Hi.
>>
>> What should happen if one opens the IB device, allocates resources and
>> closes the device?
>> for example, if a user does the following operations in a loop:
>> ibv_get_device_list
>> in a loop:
>>     ibv_open_device
>>     ibv_alloc_pd
>>     ibv_create_cq
>>     ibv_close_device?
>>
>> should the ibv_close_device clean all of the allocated resources or is it
>> up to the user to take care of this?
>>
> Up to the user.
>
>
The problem is that after one has closed the HCA:
1) they don't know whether they released all of the resources
2) they can't release the resources that they still hold
(i believe that this will cause a memory corruption ..)

When one tries to deallocate a PD that is still in use, the operation fails.
can this verb (ibv_close_device) behave the same?

thanks
Dotan

From dotanb at dev.mellanox.co.il Sun Oct 15 04:54:58 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 15 Oct 2006 13:54:58 +0200
Subject: [openib-general] [libibverbs] [PATCH] Add enumeration of port capabilities
Message-ID: <1160913298.32765.3.camel@mtls05.yok.mtl.com>

Added an enumeration of port capabilities so that user-level applications can
tell which features a port supports.
Signed-off-by: Dotan Barak --- Index: last_stable/src/userspace/libibverbs/include/infiniband/verbs.h =================================================================== --- last_stable.orig/src/userspace/libibverbs/include/infiniband/verbs.h 2006-10-11 17:06:05.000000000 +0200 +++ last_stable/src/userspace/libibverbs/include/infiniband/verbs.h 2006-10-12 17:27:03.493384512 +0200 @@ -157,6 +157,31 @@ enum ibv_port_state { IBV_PORT_ACTIVE_DEFER = 5 }; +enum ibv_port_cap_flags { + IBV_PORT_SM = 1 << 1, + IBV_PORT_NOTICE_SUP = 1 << 2, + IBV_PORT_TRAP_SUP = 1 << 3, + IBV_PORT_OPT_IPD_SUP = 1 << 4, + IBV_PORT_AUTO_MIGR_SUP = 1 << 5, + IBV_PORT_SL_MAP_SUP = 1 << 6, + IBV_PORT_MKEY_NVRAM = 1 << 7, + IBV_PORT_PKEY_NVRAM = 1 << 8, + IBV_PORT_LED_INFO_SUP = 1 << 9, + IBV_PORT_SM_DISABLED = 1 << 10, + IBV_PORT_SYS_IMAGE_GUID_SUP = 1 << 11, + IBV_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = 1 << 12, + IBV_PORT_CM_SUP = 1 << 16, + IBV_PORT_SNMP_TUNNEL_SUP = 1 << 17, + IBV_PORT_REINIT_SUP = 1 << 18, + IBV_PORT_DEVICE_MGMT_SUP = 1 << 19, + IBV_PORT_VENDOR_CLASS_SUP = 1 << 20, + IBV_PORT_DR_NOTICE_SUP = 1 << 21, + IBV_PORT_CAP_MASK_NOTICE_SUP = 1 << 22, + IBV_PORT_BOOT_MGMT_SUP = 1 << 23, + IBV_PORT_LINK_LATENCY_SUP = 1 << 24, + IBV_PORT_CLIENT_REG_SUP = 1 << 25 +}; + struct ibv_port_attr { enum ibv_port_state state; enum ibv_mtu max_mtu; From kliteyn at dev.mellanox.co.il Sun Oct 15 07:16:18 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 15 Oct 2006 16:16:18 +0200 Subject: [openib-general] [PATCHv2 7/13] osm: port to WinIB stack : opensm/osm_ucast_file.c In-Reply-To: <1160477239.4524.39030.camel@hal.voltaire.com> References: <1160425828.4524.1657.camel@hal.voltaire.com> <452B3AD0.9070400@dev.mellanox.co.il> <1160477239.4524.39030.camel@hal.voltaire.com> Message-ID: <453242B2.8080201@dev.mellanox.co.il> Hi Hal. Hal Rosenstock wrote: > On Tue, 2006-10-10 at 02:16, Yevgeny Kliteynik wrote: >> Great, thanks. 
>>
>> Now only one patch left
>
> I think there was one more in addition that was agreed to:
> Add Windows defines to config.h and remove from various opensm files.
> Will you be doing this once the other one is accepted ?

Yes, I will.

-- Yevgeny

>> - it also deals with this varargs
>> macro issue, so it was applied only partially.
>> I'll resubmit the patch shortly to save you the boredom
>> of looking for it and extracting the changes that weren't
>> applied.
>
> Thanks.
>
> -- Hal
>
>> --
>> Yevgeny
>>
>> Hal Rosenstock wrote:
>>> On Sun, 2006-10-08 at 11:42, Yevgeny Kliteynik wrote:
>>>> Hi Hal
>>>>
>>>> This is the re-submission of the patch that was
>>>> originally submitted by Eitan - just removing some
>>>> cosmetic changes from the patch and re-diffing it
>>>> with the trunk:
>>>>
>>>> 1. Avoid varargs macros not supported by Windows
>>>> 2. Included additional header for PRIx64 macro
>>>>
>>>> Yevgeny
>>>>
>>>> Signed-off-by: Yevgeny Kliteynik
>>> Thanks. Applied.
>>>
>>> -- Hal
>>>
>
>

From kliteyn at dev.mellanox.co.il Sun Oct 15 07:31:43 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 15 Oct 2006 16:31:43 +0200
Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits
Message-ID:

Hi Hal

This patch fixes a few data type problems with OSM on
64-bit Windows machines.
The changes are done in the following files:

  opensm/osm_prtn_config.c
  opensm/osm_pkey.c
  opensm/osm_qos.c

Note that the casting is done on the calculation result,
which is a string length, an index in a table, or an index in a string,
so it's ok to 'downcast' the value.

The patch is for trunk only.
Yevgeny Signed-off-by: Yevgeny Kliteynik Index: opensm/osm_prtn_config.c =================================================================== --- opensm/osm_prtn_config.c (revision 9820) +++ opensm/osm_prtn_config.c (working copy) @@ -254,7 +254,7 @@ static int parse_name_token(char *str, c p++; q = p + strlen(p); - len += q - str + 1; + len += (int)(q - str) + 1; while ( q != p && ( *q == '\0' || *q == ' ' || *q == '\t' || *q == '\n')) *q-- = '\0'; @@ -293,7 +293,7 @@ static int parse_part_conf(struct part_c if (*p == '\t' || *p == '\0' || *p == '\n') p++; - len += p - str; + len += (int)(p - str); str = p; if (conf->p_prtn) Index: opensm/osm_pkey.c =================================================================== --- opensm/osm_pkey.c (revision 9820) +++ opensm/osm_pkey.c (working copy) @@ -297,7 +297,7 @@ osm_pkey_tbl_get_block_and_idx( (p_pkey < block->pkey_entry + IB_NUM_PKEY_ELEMENTS_IN_BLOCK)) { *p_block_idx = block_index; - *p_pkey_idx = p_pkey - block->pkey_entry; + *p_pkey_idx = (uint8_t)(p_pkey - block->pkey_entry); return(IB_SUCCESS); } } Index: opensm/osm_qos.c =================================================================== --- opensm/osm_qos.c (revision 9820) +++ opensm/osm_qos.c (working copy) @@ -399,7 +399,7 @@ static int parse_one_unsigned(char *str, *val = strtoul(str, &end, 0); if (*end) end++; - return end - str; + return (int)(end - str); } static int parse_vlarb_entry(char *str, ib_vl_arb_element_t * e) @@ -410,7 +410,7 @@ static int parse_vlarb_entry(char *str, e->vl = val % 15; p += parse_one_unsigned(p, ',', &val); e->weight = (uint8_t)val; - return p - str; + return (int)(p - str); } static int parse_sl2vl_entry(char *str, uint8_t * raw) @@ -420,7 +420,7 @@ static int parse_sl2vl_entry(char *str, p += parse_one_unsigned(p, ',', &val1); p += parse_one_unsigned(p, ',', &val2); *raw = (val1 << 4) | (val2 & 0xf); - return p - str; + return (int)(p - str); } static void qos_build_config(struct qos_config *cfg, From kliteyn at 
dev.mellanox.co.il Sun Oct 15 07:51:55 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 15 Oct 2006 16:51:55 +0200 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c In-Reply-To: <1160577041.32093.53620.camel@hal.voltaire.com> References: <1160577041.32093.53620.camel@hal.voltaire.com> Message-ID: <45324B0B.1090004@dev.mellanox.co.il> Hal Rosenstock wrote: > On Wed, 2006-10-11 at 06:23, Yevgeny Kliteynik wrote: >> Hi Hal >> >> Fixing a few problems in the multicast test flow, >> plus some cosmetics. >> >> Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik > > Thanks. Applied. See question below... > >> Index: osmt_multicast.c >> =================================================================== >> --- osmt_multicast.c (revision 9776) >> +++ osmt_multicast.c (working copy) > > [snip...] > >> @@ -1808,14 +1865,54 @@ osmt_run_mcast_flow( IN osmtest_t * cons >> >> /* o15.0.1.6: */ >> /* - Create a new MCG with valid requested MGID. */ >> + osmt_init_mc_query_rec(p_osmt, &mc_req_rec); >> + mc_req_rec.mgid = good_mgid; >> >> osm_log( &p_osmt->log, OSM_LOG_INFO, >> "osmt_run_mcast_flow: " >> - "Checking Create given MGID=0x%016" PRIx64 " : " >> + "Checking Create given valid MGID=0x%016" PRIx64 " : " >> + "0x%016" PRIx64 " (o15.0.1.6)...\n", >> + cl_ntoh64(mc_req_rec.mgid.unicast.prefix), >> + cl_ntoh64(mc_req_rec.mgid.unicast.interface_id)); >> + >> + /* Before creation, need to check that this group doesn't exist */ >> + osm_log( &p_osmt->log, OSM_LOG_INFO, >> + "osmt_run_mcast_flow: " >> + "Verifying that MCGroup with this MGID doesn't exist by trying to Join it (o15.0.1.13)...\n"); >> + >> + ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_NON_MEMBER); >> + >> + osm_log( &p_osmt->log, OSM_LOG_ERROR, >> + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); >> + status = osmt_send_mcast_request( p_osmt, 1, /* join */ >> + &mc_req_rec, >> + comp_mask, >> + &res_sa_mad ); >> + osm_log( &p_osmt->log, OSM_LOG_ERROR, >> + 
"osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); >> + >> + if ((status != IB_REMOTE_ERROR) || >> + (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) >> + { >> + osm_log( &p_osmt->log, OSM_LOG_ERROR, >> + "osmt_run_mcast_flow: ERR 0301: " >> + "Tried joining group that shouldn't have existed - got %s/%s\n", >> + ib_get_err_str( status ), >> + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) >> + ); >> + status = IB_ERROR; >> + goto Exit; >> + } > > In the event this works, the SA is potentially left with some bad state > because of this. Should the join be removed for this case ? When osmtest completes w/o error, at the end of the test flow SM is returned to its initial state. When osmtest runs into some error, it exits, leaving SM as is. I think it's better to leave SM state as is after discovering a failure for further examination. But even if we wanted to 'clean up', it is not always possible. For instance, what happens when multicast group was created by osmtest, and the group removal fails? -- Yevgeny > > -- Hal > From sashak at voltaire.com Sun Oct 15 08:06:49 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 15 Oct 2006 17:06:49 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits In-Reply-To: References: Message-ID: <20061015150649.GA25758@sashak.voltaire.com> Hi Evgeny, On 16:31 Sun 15 Oct , Yevgeny Kliteynik wrote: > Hi Hal > > This patch fixes a few data type problems with OSM on > 64-bit Windows machines. Could you explain what those problems are? Sasha > The changes are done in the following files: > > opensm/osm_prtn_config.c > opensm/osm_pkey.c > opensm/osm_qos.c > > Note that the casting is done on the calculation result, > wich is string lenght, index in table, or index in string, > so it's ok to 'downcast' the value. > > The patch is for trunk only. 
> > Yevgeny > > Signed-off-by: Yevgeny Kliteynik > > Index: opensm/osm_prtn_config.c > =================================================================== > --- opensm/osm_prtn_config.c (revision 9820) > +++ opensm/osm_prtn_config.c (working copy) > @@ -254,7 +254,7 @@ static int parse_name_token(char *str, c > p++; > > q = p + strlen(p); > - len += q - str + 1; > + len += (int)(q - str) + 1; > while ( q != p && > ( *q == '\0' || *q == ' ' || *q == '\t' || *q == '\n')) > *q-- = '\0'; > @@ -293,7 +293,7 @@ static int parse_part_conf(struct part_c > if (*p == '\t' || *p == '\0' || *p == '\n') > p++; > > - len += p - str; > + len += (int)(p - str); > str = p; > > if (conf->p_prtn) > Index: opensm/osm_pkey.c > =================================================================== > --- opensm/osm_pkey.c (revision 9820) > +++ opensm/osm_pkey.c (working copy) > @@ -297,7 +297,7 @@ osm_pkey_tbl_get_block_and_idx( > (p_pkey < block->pkey_entry + IB_NUM_PKEY_ELEMENTS_IN_BLOCK)) > { > *p_block_idx = block_index; > - *p_pkey_idx = p_pkey - block->pkey_entry; > + *p_pkey_idx = (uint8_t)(p_pkey - block->pkey_entry); > return(IB_SUCCESS); > } > } > Index: opensm/osm_qos.c > =================================================================== > --- opensm/osm_qos.c (revision 9820) > +++ opensm/osm_qos.c (working copy) > @@ -399,7 +399,7 @@ static int parse_one_unsigned(char *str, > *val = strtoul(str, &end, 0); > if (*end) > end++; > - return end - str; > + return (int)(end - str); > } > > static int parse_vlarb_entry(char *str, ib_vl_arb_element_t * e) > @@ -410,7 +410,7 @@ static int parse_vlarb_entry(char *str, > e->vl = val % 15; > p += parse_one_unsigned(p, ',', &val); > e->weight = (uint8_t)val; > - return p - str; > + return (int)(p - str); > } > > static int parse_sl2vl_entry(char *str, uint8_t * raw) > @@ -420,7 +420,7 @@ static int parse_sl2vl_entry(char *str, > p += parse_one_unsigned(p, ',', &val1); > p += parse_one_unsigned(p, ',', &val2); > *raw = (val1 << 4) | (val2 & 
0xf); > - return p - str; > + return (int)(p - str); > } > > static void qos_build_config(struct qos_config *cfg, > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From kliteyn at dev.mellanox.co.il Sun Oct 15 08:18:16 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 15 Oct 2006 17:18:16 +0200 Subject: [openib-general] [PATCH] OpenSM/osmtest.c: Fix float calculation in osmtest_stress_large_rmpp_pr In-Reply-To: <1160657724.32093.110260.camel@hal.voltaire.com> References: <1160657724.32093.110260.camel@hal.voltaire.com> Message-ID: <45325138.3000801@dev.mellanox.co.il> Good catch, thanks. -- Yevgeny Hal Rosenstock wrote: > OpenSM/osmtest.c: Fix float calculation in osmtest_stress_large_rmpp_pr > > Signed-off-by: Hal Rosenstock > --- > Index: osmtest/osmtest.c > =================================================================== > --- osmtest/osmtest.c (revision 9795) > +++ osmtest/osmtest.c (working copy) > @@ -2868,7 +2868,7 @@ osmtest_stress_large_rmpp_pr( IN osmtest > if (num_recs == 0) > ratio = 0; > else > - ratio = (float)(num_queries / num_recs); > + ratio = ((float)num_queries / (float)num_recs); > printf( "-I- Queries to Record Ratio is %" PRIu64 " records, %" PRIu64 " queries : %.2f \n", > num_recs, num_queries, ratio); > print_freq = 0; > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Sun Oct 15 08:48:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 15 Oct 2006 08:48:21 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: ( akepner@sgi.com's message of "Sun, 15 
Oct 2006 02:02:44 -0700 (PDT)") References: Message-ID: > Apparently this is because writes to the doorbells from > different CPUs are clobbering one another. The following > patch adds mmiowb() calls after doorbell rings to ensure > the doorbell register updates are ordered. Makes sense. I was wondering if there would be any problems like this after John's message... > We discovered a problem when running IPoIB applications on > multiple CPUs on an Altix system. Many messages such as: > > ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq) > > appear in syslog, and the driver wedges up. However, this is a somewhat weird symptom, although I can imagine that out-of-order doorbells cause extra completions or something like that, which causes IPoIB to overrun the send queue. Adding the mmiowb()s definitely fixes things? > Signed-off-by: Should this be Signed-off-by: Arthur Kepner actually? (I just looked through the kernel git log to guess your name) > @@ -1730,6 +1732,9 @@ out: > mthca_write64(doorbell, > dev->kar + MTHCA_SEND_DOORBELL, > MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); > + /* use mmiowb to ensure write to doorbell is ordered > + * before releasing spinlock */ > + mmiowb(); > } > > qp->sq.next_ind = ind; Any reason why this mmiowb() is placed slightly differently from the others (which are right before the spin_unlock)? Thanks, Roland From kliteyn at dev.mellanox.co.il Sun Oct 15 08:45:29 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 15 Oct 2006 17:45:29 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits In-Reply-To: <20061015150649.GA25758@sashak.voltaire.com> References: <20061015150649.GA25758@sashak.voltaire.com> Message-ID: <45325799.5040705@dev.mellanox.co.il> Hi Sasha, Sasha Khapyorsky wrote: > Hi Evgeny, > > On 16:31 Sun 15 Oct , Yevgeny Kliteynik wrote: >> Hi Hal >> >> This patch fixes a few data type problems with OSM on >> 64-bit Windows machines. 
> > Could you explain what those problems are? Basically, in all three files the problem was assigning the result of pointer arithmetics (which is __int64) to an int/uint variable. Casting to int is ok because, as I said, this result is actually string length, index in table, or index in string, so no range check is required. -- Yevgeny > Sasha > > >> The changes are done in the following files: >> >> opensm/osm_prtn_config.c >> opensm/osm_pkey.c >> opensm/osm_qos.c >> >> Note that the casting is done on the calculation result, >> wich is string lenght, index in table, or index in string, >> so it's ok to 'downcast' the value. >> >> The patch is for trunk only. >> >> Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik >> >> Index: opensm/osm_prtn_config.c >> =================================================================== >> --- opensm/osm_prtn_config.c (revision 9820) >> +++ opensm/osm_prtn_config.c (working copy) >> @@ -254,7 +254,7 @@ static int parse_name_token(char *str, c >> p++; >> >> q = p + strlen(p); >> - len += q - str + 1; >> + len += (int)(q - str) + 1; >> while ( q != p && >> ( *q == '\0' || *q == ' ' || *q == '\t' || *q == '\n')) >> *q-- = '\0'; >> @@ -293,7 +293,7 @@ static int parse_part_conf(struct part_c >> if (*p == '\t' || *p == '\0' || *p == '\n') >> p++; >> >> - len += p - str; >> + len += (int)(p - str); >> str = p; >> >> if (conf->p_prtn) >> Index: opensm/osm_pkey.c >> =================================================================== >> --- opensm/osm_pkey.c (revision 9820) >> +++ opensm/osm_pkey.c (working copy) >> @@ -297,7 +297,7 @@ osm_pkey_tbl_get_block_and_idx( >> (p_pkey < block->pkey_entry + IB_NUM_PKEY_ELEMENTS_IN_BLOCK)) >> { >> *p_block_idx = block_index; >> - *p_pkey_idx = p_pkey - block->pkey_entry; >> + *p_pkey_idx = (uint8_t)(p_pkey - block->pkey_entry); >> return(IB_SUCCESS); >> } >> } >> Index: opensm/osm_qos.c >> =================================================================== >> --- opensm/osm_qos.c (revision 9820) >> 
+++ opensm/osm_qos.c (working copy) >> @@ -399,7 +399,7 @@ static int parse_one_unsigned(char *str, >> *val = strtoul(str, &end, 0); >> if (*end) >> end++; >> - return end - str; >> + return (int)(end - str); >> } >> >> static int parse_vlarb_entry(char *str, ib_vl_arb_element_t * e) >> @@ -410,7 +410,7 @@ static int parse_vlarb_entry(char *str, >> e->vl = val % 15; >> p += parse_one_unsigned(p, ',', &val); >> e->weight = (uint8_t)val; >> - return p - str; >> + return (int)(p - str); >> } >> >> static int parse_sl2vl_entry(char *str, uint8_t * raw) >> @@ -420,7 +420,7 @@ static int parse_sl2vl_entry(char *str, >> p += parse_one_unsigned(p, ',', &val1); >> p += parse_one_unsigned(p, ',', &val2); >> *raw = (val1 << 4) | (val2 & 0xf); >> - return p - str; >> + return (int)(p - str); >> } >> >> static void qos_build_config(struct qos_config *cfg, >> >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> > From mst at mellanox.co.il Sun Oct 15 08:59:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 15 Oct 2006 17:59:34 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1160870845.2917.334.camel@fc6.xsintricity.com> References: <1160870845.2917.334.camel@fc6.xsintricity.com> Message-ID: <20061015155934.GD15055@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: RHEL5 and OFED ... > > On Sat, 2006-10-14 at 22:14 +0200, Michael S. Tsirkin wrote: > > Quoting r. Doug Ledford : > > > Sorry. RHEL5 Beta1 has been out for a while, but OFED 1.1 still isn't > > > done yet. Obviously, I wasn't able to get something in RHEL5 that > > > didn't even exist prior to freeze. > > > > Would it be possible to include patches backporting fixes in infiniband kernel > > components from 2.6.18/OFED 1.1 to modules that already ship with RHEL5? > > Maybe. 
It would depend on the patch. Of course, keep in mind that > RHEL5 Beta1 *has* a 2.6.18 kernel. Really? That's way cool. How does one generate patches for you? Patch against vanilla 2.6.18? There are three kind of patches we might want: 1. The first thing would be to list fixes between 2.6.18 and 2.6.19-rc1 and backport these. Some of them are in OFED. 2. I think in cma module there are 2 main interfaces in OFED 1.1 missing in 2.6.18: API to set timeouts and retry count, and rdma_establish to handle passive side 3. Finally, OFED has some out of kernel modules (mostly UCMA and SDP) that it would be nice to add Where does the line pass, approximately? Backporting does take work, so I'd like to get a feeling on what's likely to get accepted before I commit to this. Now for userspace - does RHEL5 include at least libibverbs-1.0? This has been released a while back, and Roland makes regular bugfix releases. -- MST From cdmaest at sandia.gov Sun Oct 15 09:14:24 2006 From: cdmaest at sandia.gov (Maestas, Christopher Daniel) Date: Sun, 15 Oct 2006 10:14:24 -0600 Subject: [openib-general] RHEL5 and OFED ... Message-ID: <347180497203A942A6AA82C85846CBC9034F6001@ES23SNLNT.srn.sandia.gov> > Now for userspace - does RHEL5 include at least libibverbs-1.0? > This has been released a while back, and Roland makes regular bugfix releases. Here's what I see on a rhel4 u4 system: --- $ rpm -q libibverbs libibverbs-1.0.3-1 --- So I would think rhel5 would have at least that or greater. When I compiled rpms for 1.1rc7 it generated: --- # ls libibverbs-* libibverbs-1.0.4-0.x86_64.rpm libibverbs-utils-1.0.4-0.x86_64.rpm libibverbs-devel-1.0.4-0.x86_64.rpm --- From dledford at redhat.com Sun Oct 15 09:13:00 2006 From: dledford at redhat.com (Doug Ledford) Date: Sun, 15 Oct 2006 12:13:00 -0400 Subject: [openib-general] RHEL5 and OFED ... 
In-Reply-To: <20061015155934.GD15055@mellanox.co.il> References: <1160870845.2917.334.camel@fc6.xsintricity.com> <20061015155934.GD15055@mellanox.co.il> Message-ID: <1160928780.2917.383.camel@fc6.xsintricity.com> On Sun, 2006-10-15 at 17:59 +0200, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > Subject: Re: RHEL5 and OFED ... > > > > On Sat, 2006-10-14 at 22:14 +0200, Michael S. Tsirkin wrote: > > > Quoting r. Doug Ledford : > > > > Sorry. RHEL5 Beta1 has been out for a while, but OFED 1.1 still isn't > > > > done yet. Obviously, I wasn't able to get something in RHEL5 that > > > > didn't even exist prior to freeze. > > > > > > Would it be possible to include patches backporting fixes in infiniband kernel > > > components from 2.6.18/OFED 1.1 to modules that already ship with RHEL5? > > > > Maybe. It would depend on the patch. Of course, keep in mind that > > RHEL5 Beta1 *has* a 2.6.18 kernel. > > Really? That's way cool. > How does one generate patches for you? Patch against vanilla 2.6.18? The best way is to install the kernel src.rpm, then go into /usr/src/redhat/SPECS and rpmbuild --bp kernel-2.6.spec and then go into /usr/src/redhat/BUILD/kernel-2.6.18/linux-2.6.18.noarch and patch the appropriate files there. Save off originals with a unique extension, then use gendiff to create the patch. > There are three kind of patches we might want: > > 1. The first thing would be to list fixes between 2.6.18 and 2.6.19-rc1 and > backport these. Some of them are in OFED. That would be helpful. Since 2.6.19-rc looks to have integrated the iWARP merge, the fixes are no doubt mixed in with a bunch of new code, so I didn't pull anything from 2.6.19-rc since I was likely to break things. Targeted fixes that skip the iWARP changes from someone that knows them would be helpful. > 2. I think in cma module there are 2 main interfaces in OFED 1.1 missing in 2.6.18: > API to set timeouts and retry count, and rdma_establish to handle passive side OK. > 3. 
Finally, OFED has some out of kernel modules (mostly UCMA and SDP) > that it would be nice to add The UCMA module is indeed missing. The SDP module is present, but it's the older OFED 1.0 code that could stand to be updated. > Where does the line pass, approximately? Backporting does take work, so I'd > like to get a feeling on what's likely to get accepted before I commit to this. Right now I already have a bugzilla against RHEL5 beta about the missing UCMA, so that needs fixing anyway. A reproducer for the SDP problems in the OFED 1.0 stack will give me the ability to open an SDP bugzilla against RHEL5 and fix it up. > Now for userspace - does RHEL5 include at least libibverbs-1.0? > This has been released a while back, and Roland makes regular bugfix releases. It includes the OFED 1.0 libibverbs (which makes openmpi complain about lack of out of band data support, but otherwise seems to work). As a side note, it includes the OFED 1.0 libehca and ib_ehca kernel module. I didn't update the kernel code to the OFED 1.1 ehca because I didn't know if it would introduce compatibility problems with the user space library. However, it might be worthwhile to update that as well given the amount of change that has occurred and the difference in testing between the two. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From tziporet at dev.mellanox.co.il Sun Oct 15 09:19:15 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 15 Oct 2006 18:19:15 +0200 Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. 
In-Reply-To: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> References: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> Message-ID: <45325F83.4040309@dev.mellanox.co.il> Arlin Davis wrote: > Here is a patch to remove uDAPL scm provider from the build since it is no longer needed nor > supported. This provider was merely a stop gap until uCMA was pushed into kernel. > > Tziporet, can you get this change into OFED 1.1? > Does this fix a blocker bug? If not then I suggest we wait with this for 1.2 since its not a small patch. Tziporet From eli at dev.mellanox.co.il Sun Oct 15 09:27:15 2006 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 15 Oct 2006 18:27:15 +0200 Subject: [openib-general] IPOIB NAPI Message-ID: <1160929635.5389.2.camel@localhost> Hi Roland, can you tell when you are going to push your NAPI patch to ipoib? Is there anything I can do to help making this happen? From rdreier at cisco.com Sun Oct 15 09:39:49 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 15 Oct 2006 09:39:49 -0700 Subject: [openib-general] IPOIB NAPI In-Reply-To: <1160929635.5389.2.camel@localhost> (Eli Cohen's message of "Sun, 15 Oct 2006 18:27:15 +0200") References: <1160929635.5389.2.camel@localhost> Message-ID: Eli> Hi Roland, can you tell when you are going to push your NAPI Eli> patch to ipoib? Is there anything I can do to help making Eli> this happen? I've been meaning to mention this... I have a preliminary version in git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git ipoib-napi There are further changes I would like to add on top of that, but comments on the two patches there would be appreciated. And also benchmarks would be good. - R. From rdreier at cisco.com Sun Oct 15 09:54:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 15 Oct 2006 09:54:20 -0700 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061015155934.GD15055@mellanox.co.il> (Michael S. 
Tsirkin's message of "Sun, 15 Oct 2006 17:59:34 +0200") References: <1160870845.2917.334.camel@fc6.xsintricity.com> <20061015155934.GD15055@mellanox.co.il> Message-ID: Michael> 1. The first thing would be to list fixes between 2.6.18 Michael> and 2.6.19-rc1 and backport these. Some of them are in Michael> OFED. If you want to do this, I think it would be great to also submit the patches to -stable for inclusion in 2.6.18.x. - R. From sashak at voltaire.com Sun Oct 15 10:21:17 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 15 Oct 2006 19:21:17 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits In-Reply-To: <45325799.5040705@dev.mellanox.co.il> References: <20061015150649.GA25758@sashak.voltaire.com> <45325799.5040705@dev.mellanox.co.il> Message-ID: <20061015172117.GC30468@sashak.voltaire.com> On 17:45 Sun 15 Oct , Yevgeny Kliteynik wrote: > Hi Sasha, > > Sasha Khapyorsky wrote: > > Hi Evgeny, > > > > On 16:31 Sun 15 Oct , Yevgeny Kliteynik wrote: > >> Hi Hal > >> > >> This patch fixes a few data type problems with OSM on > >> 64-bit Windows machines. > > > > Could you explain what those problems are? > > Basically, in all three files the problem was assigning > the result of pointer arithmetics (which is __int64) to > an int/uint variable. > Casting to int is ok because, as I said, this result is > actually string length, index in table, or index in string, > so no range check is required. So isn't it better to shut-up compiler warnings/whatever with appropriate warning level flags instead of putting confused castings in the code? (I know there are couple of such already, but I don't think it was a good idea). 
Sasha From sweitzen at cisco.com Sun Oct 15 13:16:02 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 15 Oct 2006 13:16:02 -0700 Subject: [openib-general] We wish to do the 1.1 release next week Message-ID: Yes, bug 273 (http://openib.org/bugzilla/show_bug.cgi?id=273) is a blocking issue for Cisco. Roland sent a patch last Monday. I'm done testing the other parts of rc7, and am testing his patch later today. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren > Sent: Thursday, October 12, 2006 7:44 AM > To: openfabrics-ewg at openib.org; OPENIB > Subject: [openib-general] We wish to do the 1.1 release next week > > Hi all, > > I am back from vacation and found you waited with the release > for me :-) > > From a quick look at status mails I think we can do the official > release next week. > > Please reply if there are still any blocking issues you have. > > Also - please update all documents till end of Monday next week. 
> > Tziporet > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From kliteyn at dev.mellanox.co.il Sun Oct 15 14:22:46 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 15 Oct 2006 23:22:46 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits In-Reply-To: <20061015172117.GC30468@sashak.voltaire.com> References: <20061015150649.GA25758@sashak.voltaire.com> <45325799.5040705@dev.mellanox.co.il> <20061015172117.GC30468@sashak.voltaire.com> Message-ID: <4532A6A6.3040800@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 17:45 Sun 15 Oct , Yevgeny Kliteynik wrote: >> Hi Sasha, >> >> Sasha Khapyorsky wrote: >>> Hi Evgeny, >>> >>> On 16:31 Sun 15 Oct , Yevgeny Kliteynik wrote: >>>> Hi Hal >>>> >>>> This patch fixes a few data type problems with OSM on >>>> 64-bit Windows machines. >>> Could you explain what those problems are? >> >> Basically, in all three files the problem was assigning >> the result of pointer arithmetics (which is __int64) to >> an int/uint variable. >> Casting to int is ok because, as I said, this result is >> actually string length, index in table, or index in string, >> so no range check is required. > > So isn't it better to shut-up compiler warnings/whatever with appropriate > warning level flags instead of putting confused castings in the code? Personally, I don't like the idea of decreasing compiler's "suspiciousness" - it will result in writing less portable code. Just imagine what would it take to port OSM from Linux to Windows, if the Linux code wasn't originally compiled with a strict compiler. > (I know there are couple of such already, but I don't think it was a > good idea). IMO, small price to pay. 
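For readers following the cast-versus-warning-flags exchange above, here is a minimal sketch of a middle ground: keep the explicit narrowing, but funnel it through one helper that states the "value fits in int" assumption. Nothing here is taken from opensm; `ptrdiff_to_int` and `consumed_through_comma` are hypothetical names used only for illustration, and the `assert` is active only in builds without `NDEBUG`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Narrow a pointer difference to int. On LLP64 targets (64-bit Windows)
 * ptrdiff_t is 64 bits wide while int is 32, which is what triggers the
 * conversion warning. The assert documents, and in debug builds
 * enforces, the assumption that the value fits.
 */
static int ptrdiff_to_int(ptrdiff_t d)
{
	assert(d >= INT_MIN && d <= INT_MAX);
	return (int)d;
}

/*
 * Hypothetical parser helper (not from opensm): number of characters
 * consumed up to and including the first ',' or the whole string if
 * there is no comma.
 */
static int consumed_through_comma(const char *str)
{
	const char *end = strchr(str, ',');

	if (end != NULL)
		end++;                   /* step past the separator */
	else
		end = str + strlen(str); /* no separator: consume everything */
	return ptrdiff_to_int(end - str);
}
```

With a helper like this, call sites read `return ptrdiff_to_int(p - str);` instead of a bare `(int)` cast, so the narrowing stays visible to the compiler and to reviewers while the range assumption lives in one place.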
-- Yevgeny > Sasha > From halr at voltaire.com Sun Oct 15 14:40:38 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Oct 2006 17:40:38 -0400 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c In-Reply-To: <45324B0B.1090004@dev.mellanox.co.il> References: <1160577041.32093.53620.camel@hal.voltaire.com> <45324B0B.1090004@dev.mellanox.co.il> Message-ID: <1160948436.32093.302983.camel@hal.voltaire.com> On Sun, 2006-10-15 at 10:51, Yevgeny Kliteynik wrote: > Hal Rosenstock wrote: > > On Wed, 2006-10-11 at 06:23, Yevgeny Kliteynik wrote: > >> Hi Hal > >> > >> Fixing a few problems in the multicast test flow, > >> plus some cosmetics. > >> > >> Yevgeny > >> > >> Signed-off-by: Yevgeny Kliteynik > > > > Thanks. Applied. See question below... > > > >> Index: osmt_multicast.c > >> =================================================================== > >> --- osmt_multicast.c (revision 9776) > >> +++ osmt_multicast.c (working copy) > > > > [snip...] > > > >> @@ -1808,14 +1865,54 @@ osmt_run_mcast_flow( IN osmtest_t * cons > >> > >> /* o15.0.1.6: */ > >> /* - Create a new MCG with valid requested MGID. 
*/ > >> + osmt_init_mc_query_rec(p_osmt, &mc_req_rec); > >> + mc_req_rec.mgid = good_mgid; > >> > >> osm_log( &p_osmt->log, OSM_LOG_INFO, > >> "osmt_run_mcast_flow: " > >> - "Checking Create given MGID=0x%016" PRIx64 " : " > >> + "Checking Create given valid MGID=0x%016" PRIx64 " : " > >> + "0x%016" PRIx64 " (o15.0.1.6)...\n", > >> + cl_ntoh64(mc_req_rec.mgid.unicast.prefix), > >> + cl_ntoh64(mc_req_rec.mgid.unicast.interface_id)); > >> + > >> + /* Before creation, need to check that this group doesn't exist */ > >> + osm_log( &p_osmt->log, OSM_LOG_INFO, > >> + "osmt_run_mcast_flow: " > >> + "Verifying that MCGroup with this MGID doesn't exist by trying to Join it (o15.0.1.13)...\n"); > >> + > >> + ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_NON_MEMBER); > >> + > >> + osm_log( &p_osmt->log, OSM_LOG_ERROR, > >> + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); > >> + status = osmt_send_mcast_request( p_osmt, 1, /* join */ > >> + &mc_req_rec, > >> + comp_mask, > >> + &res_sa_mad ); > >> + osm_log( &p_osmt->log, OSM_LOG_ERROR, > >> + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); > >> + > >> + if ((status != IB_REMOTE_ERROR) || > >> + (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) > >> + { > >> + osm_log( &p_osmt->log, OSM_LOG_ERROR, > >> + "osmt_run_mcast_flow: ERR 0301: " > >> + "Tried joining group that shouldn't have existed - got %s/%s\n", > >> + ib_get_err_str( status ), > >> + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) > >> + ); > >> + status = IB_ERROR; > >> + goto Exit; > >> + } > > > > In the event this works, the SA is potentially left with some bad state > > because of this. Should the join be removed for this case ? > > When osmtest completes w/o error, at the end of the test flow > SM is returned to its initial state. Right. > When osmtest runs into some error, it exits, leaving SM as is. > I think it's better to leave SM state as is after discovering > a failure for further examination. 
I think this should depend on the failure. > But even if we wanted to 'clean up', it is not always possible. > For instance, what happens when multicast group was created by > osmtest, and the group removal fails? Right but the ones which can be cleaned up should be IMO. -- Hal > > -- > Yevgeny > > > > -- Hal > > From sashak at voltaire.com Sun Oct 15 15:44:06 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 16 Oct 2006 00:44:06 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits In-Reply-To: <4532A6A6.3040800@dev.mellanox.co.il> References: <20061015150649.GA25758@sashak.voltaire.com> <45325799.5040705@dev.mellanox.co.il> <20061015172117.GC30468@sashak.voltaire.com> <4532A6A6.3040800@dev.mellanox.co.il> Message-ID: <20061015224406.GJ30468@sashak.voltaire.com> On 23:22 Sun 15 Oct , Yevgeny Kliteynik wrote: > > > Sasha Khapyorsky wrote: > > On 17:45 Sun 15 Oct , Yevgeny Kliteynik wrote: > >> Hi Sasha, > >> > >> Sasha Khapyorsky wrote: > >>> Hi Evgeny, > >>> > >>> On 16:31 Sun 15 Oct , Yevgeny Kliteynik wrote: > >>>> Hi Hal > >>>> > >>>> This patch fixes a few data type problems with OSM on > >>>> 64-bit Windows machines. > >>> Could you explain what those problems are? > >> > >> Basically, in all three files the problem was assigning > >> the result of pointer arithmetics (which is __int64) to > >> an int/uint variable. > >> Casting to int is ok because, as I said, this result is > >> actually string length, index in table, or index in string, > >> so no range check is required. > > > > So isn't it better to shut-up compiler warnings/whatever with appropriate > > warning level flags instead of putting confused castings in the code? > > Personally, I don't like the idea of decreasing compiler's > "suspiciousness" - it will result in writing less portable > code. Explicit casting does not help in code portability, this may only hurt since hides potential problems. 
Also, in general it is a good idea to run the compiler with an increased warning level from time to time and to analyze the problems - the default flag settings do not prevent this.

> Just imagine what would it take to port OSM from Linux to
> Windows, if the Linux code wasn't originally compiled with
> a strict compiler.
>
> > (I know there are couple of such already, but I don't think it was a
> > good idea).
>
> IMO, small price to pay.

Small or not, it is still a price (and it gets multiplied many times over). Why is it necessary if there is another simple way to achieve the same?

Sasha

From bboas at systemfabricworks.com Sun Oct 15 14:02:52 2006
From: bboas at systemfabricworks.com (Bill Boas)
Date: Sun, 15 Oct 2006 14:02:52 -0700
Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17
Message-ID: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95>

To all in the OpenFabrics Community

We will be holding our first Developer Summit in the Tampa Convention Center, courtesy of SC06, starting at 1.30PM in Room 17 on Thursday November 16, 2006. On Friday November 17, we will start in Room 13 at 8.00AM and continue till 5.00PM. We have had to schedule into these time slots because no other usable space is available at any other time during the week of SC06!

OpenFabrics will cater food and beverages for the afternoon break and supper on Thursday, and for breakfast, lunch and two breaks on Friday. We will set up a registration site at Acteva to collect $$ to cover our out of pocket expenses – I’ll email out the URL for that site in the next day or two.

Please review the attached Strawman purposes, suggested attendees and agenda. Please email any changes or comments to the community so that all can comment on them.
The Summit has several dimensions and themes that run throughout our work there:

1) consistency and robustness of the Linux and Windows software stacks for Release 2.0 of OpenFabrics;
2) feature selection, development resources and timelines for Release 2.0;
3) activities, features and processes of the Enterprise Working Group on OFED 1.x until Release 2.0 is ready for hand-off to the EWG;
4) enhancing the resources of the EWG to be ready for 2.0, so that it may subsequently be distributed as OFED 2.0 and adopted by the OpenFabrics vendor and customer communities for production use.

This is far too much work for just a day and a half! PLEASE START NOW: exchange ideas for additional features, contact peer engineers from companies and customers to discuss work sizing and development resources, and identify volunteer developers for items, so that when we meet on the 16th we’re not starting from a blank sheet!

Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal Rosenstock, Tom Tucker and Bob Woodruff are leading the pre-meeting STRAWMAN collation of requirements, feature prioritization, developer assignments, sizing and processes, so that we have the list largely complete prior to the meeting and people know who has already volunteered for items from the list.

Bill Boas
VP, Business Development | System Fabric Works
bboas at systemfabricworks.com | 510-375-8840
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Tampa Convention Center Layout.pdf Type: application/pdf Size: 84407 bytes Desc: not available URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SC06_OFA_Developer_Summit_Strawman_10_15_06.ppt Type: application/vnd.ms-powerpoint Size: 163840 bytes Desc: not available URL: From krkumar2 at in.ibm.com Sun Oct 15 21:39:01 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 16 Oct 2006 10:09:01 +0530 Subject: [openib-general] [PATCH 1/2] Optimize cma_bind_loopback to check for empty list. Message-ID: <20061016043901.4944.77557.sendpatchset@localhost.localdomain> Optimize to test for an empty list. Patch made against 2.6.19-rc1 tree. Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -1480,19 +1480,18 @@ static int cma_bind_loopback(struct rdma u8 p; mutex_lock(&lock); + if (list_empty(&dev_list)) { + ret = -ENODEV; + goto out; + } list_for_each_entry(cma_dev, &dev_list, list) for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) - if (!ib_query_port (cma_dev->device, p, &port_attr) && + if (!ib_query_port(cma_dev->device, p, &port_attr) && port_attr.state == IB_PORT_ACTIVE) goto port_found; - if (!list_empty(&dev_list)) { - p = 1; - cma_dev = list_entry(dev_list.next, struct cma_device, list); - } else { - ret = -ENODEV; - goto out; - } + p = 1; + cma_dev = list_entry(dev_list.next, struct cma_device, list); port_found: ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid); From krkumar2 at in.ibm.com Sun Oct 15 21:39:08 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Mon, 16 Oct 2006 10:09:08 +0530 Subject: [openib-general] [PATCH 2/2] Remove redundant check in cma_add_one. In-Reply-To: <20061016043901.4944.77557.sendpatchset@localhost.localdomain> References: <20061016043901.4944.77557.sendpatchset@localhost.localdomain> Message-ID: <20061016043908.4944.97154.sendpatchset@localhost.localdomain> Remove redundant check in cma_add_one(). Patch made against 2.6.19-rc1 tree. 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 @@ -2114,8 +2114,6 @@ static void cma_add_one(struct ib_device cma_dev->device = device; cma_dev->node_guid = device->node_guid; - if (!cma_dev->node_guid) - goto err; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); @@ -2127,9 +2125,6 @@ static void cma_add_one(struct ib_device list_for_each_entry(id_priv, &listen_any_list, list) cma_listen_on_dev(id_priv, cma_dev); mutex_unlock(&lock); - return; -err: - kfree(cma_dev); } static int cma_remove_id_dev(struct rdma_id_private *id_priv) From kliteyn at dev.mellanox.co.il Mon Oct 16 02:06:40 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 16 Oct 2006 11:06:40 +0200 Subject: [openib-general] [PATCH] opensm: mcast tables dump improvement In-Reply-To: <20061013003517.GA20139@sashak.voltaire.com> References: <20061013003517.GA20139@sashak.voltaire.com> Message-ID: <45334BA0.3080703@dev.mellanox.co.il> Hi Sasha. Looks good. One remark though: all the static functions should have the "__osm_" prefix in their names. -- Yevgeny Sasha Khapyorsky wrote: > This improves switch's mcast tables dumping and eliminates multiple file > open/seek/close sequences. In one word - cleanup. 
> > Signed-off-by: Sasha Khapyorsky > --- > osm/opensm/osm_mcast_mgr.c | 108 +++++++++++++++++++++----------------------- > 1 files changed, 52 insertions(+), 56 deletions(-) > > diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c > index cb0ffb1..f4d6954 100644 > --- a/osm/opensm/osm_mcast_mgr.c > +++ b/osm/opensm/osm_mcast_mgr.c > @@ -53,6 +53,7 @@ #endif /* HAVE_CONFIG_H */ > #include > #include > #include > +#include > #include > #include > #include > @@ -1377,10 +1378,12 @@ osm_mcast_mgr_process_tree( > > /********************************************************************** > **********************************************************************/ > + > static void > -osm_mcast_mgr_dump_mcast_routes( > +mcast_mgr_dump_sw_routes( > IN const osm_mcast_mgr_t* const p_mgr, > - IN const osm_switch_t* const p_sw ) > + IN const osm_switch_t* const p_sw, > + IN FILE *p_mcfdbFile) > { > osm_mcast_tbl_t* p_tbl; > int16_t mlid_ho = 0; > @@ -1390,35 +1393,14 @@ osm_mcast_mgr_dump_mcast_routes( > char line[OSM_REPORT_LINE_SIZE]; > boolean_t print_lid; > const osm_node_t* p_node; > - FILE * p_mcfdbFile; > uint16_t i, j; > uint16_t mask_entry; > - char *file_name = NULL; > > - OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_dump_mcast_routes ); > + OSM_LOG_ENTER( p_mgr->p_log, mcast_mgr_dump_sw_routes ); > > if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) > goto Exit; > > - file_name = > - (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); > - > - CL_ASSERT(file_name); > - > - strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); > - strcat(file_name, "/osm.mcfdbs"); > - > - /* Open the file or error */ > - p_mcfdbFile = fopen(file_name, "a"); > - if (! 
p_mcfdbFile) > - { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "osm_mcast_mgr_dump_mcast_routes: ERR 0A23: " > - "Failed to open mcfdb file (%s)\n", > - file_name ); > - goto Exit; > - } > - > p_node = osm_switch_get_node_ptr( p_sw ); > > p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); > @@ -1459,30 +1441,56 @@ osm_mcast_mgr_dump_mcast_routes( > block_num++; > } > > - fclose(p_mcfdbFile); > - > Exit: > - if (file_name) > - free(file_name); > OSM_LOG_EXIT( p_mgr->p_log ); > } > > +/********************************************************************** > + **********************************************************************/ > + > +struct mcast_mgr_dump_context { > + osm_mcast_mgr_t *p_mgr; > + FILE *file; > +}; > + > static void > -__unlink_mcast_fdb(IN osm_mcast_mgr_t* const p_mgr) > +mcast_mgr_dump_table(cl_map_item_t *p_map_item, void *context) > { > - char *file_name = NULL; > + osm_switch_t *p_sw = (osm_switch_t *)p_map_item; > + struct mcast_mgr_dump_context *cxt = context; > > - /* remove the old fdb dump file: */ > - file_name = > - (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); > + mcast_mgr_dump_sw_routes(cxt->p_mgr, p_sw, cxt->file); > +} > > - if( file_name ) > - { > - strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); > - strcat(file_name, "/osm.mcfdbs"); > - unlink(file_name); > - free(file_name); > - } > +static void > +mcast_mgr_dump_mcast_routes(osm_mcast_mgr_t *p_mgr) > +{ > + char file_name[1024]; > + struct mcast_mgr_dump_context dump_context; > + FILE *file; > + > + if (!osm_log_is_active(p_mgr->p_log, OSM_LOG_ROUTING)) > + return; > + > + snprintf(file_name, sizeof(file_name), "%s/%s", > + p_mgr->p_subn->opt.dump_files_dir, "osm.mcfdbs"); > + > + file = fopen(file_name, "w"); > + if (!file) { > + osm_log(p_mgr->p_log, OSM_LOG_ERROR, > + "mcast_dump_mcast_routes: ERR 0A18: " > + "cannot create mcfdb file \'%s\': %s\n", > + file_name, strerror(errno)); > + return; > + } > + > + dump_context.p_mgr = p_mgr; > + 
dump_context.file = file; > + > + cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, > + mcast_mgr_dump_table, &dump_context); > + > + fclose(file); > } > > /********************************************************************** > @@ -1518,12 +1526,6 @@ osm_mcast_mgr_process_mgrp( > goto Exit; > } > > - /* initialize the mc fdb dump file: */ > - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) > - { > - __unlink_mcast_fdb( p_mgr ); > - } > - > /* > Walk the switches and download the tables for each. > */ > @@ -1534,11 +1536,11 @@ osm_mcast_mgr_process_mgrp( > if( signal == OSM_SIGNAL_DONE_PENDING ) > pending_transactions = TRUE; > > - osm_mcast_mgr_dump_mcast_routes( p_mgr, p_sw ); > - > p_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); > } > > + mcast_mgr_dump_mcast_routes( p_mgr ); > + > Exit: > OSM_LOG_EXIT( p_mgr->p_log ); > > @@ -1594,12 +1596,6 @@ osm_mcast_mgr_process( > p_mgrp = (osm_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); > } > > - /* initialize the mc fdb dump file: */ > - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) > - { > - __unlink_mcast_fdb( p_mgr ); > - } > - > /* > Walk the switches and download the tables for each. > */ > @@ -1610,11 +1606,11 @@ osm_mcast_mgr_process( > if( signal == OSM_SIGNAL_DONE_PENDING ) > pending_transactions = TRUE; > > - osm_mcast_mgr_dump_mcast_routes( p_mgr, p_sw ); > - > p_sw = (osm_switch_t*)cl_qmap_next( &p_sw->map_item ); > } > > + mcast_mgr_dump_mcast_routes( p_mgr ); > + > CL_PLOCK_RELEASE( p_mgr->p_lock ); > > OSM_LOG_EXIT( p_mgr->p_log ); From yosef.et at gmail.com Mon Oct 16 02:34:43 2006 From: yosef.et at gmail.com (yosef etigin) Date: Mon, 16 Oct 2006 11:34:43 +0200 Subject: [openib-general] building OFED package from git and svn Message-ID: Hello, I have been trying to build OFED source package (1.1 rev 9820) as was described in the HOWTO.build_ofed wiki. The package was built successfully, however i had trouble compilng it. 
The error I get is a missing library inside OFED's temporary build tree. It occurs during the compilation of DAPL:

gcc: /tmp/OFED-1.1-rev9725/SOURCES/openib-1.1/src/userspace/librdmacm/src/.libs/.libs/librdmacm.so: No such file or directory

A log of the make process is attached.

When I changed line 306 of the top-level Makefile of the open-ib sources from:
> AM_LDFLAGS="-L../libibverbs/src -libverbs -L../librdmacm/src/.libs
> -lrdmacm -lsysfs"
to:
> AM_LDFLAGS="-L../libibverbs/src -libverbs -L../librdmacm/src/ -lrdmacm
> -lsysfs"
it compiled OK. Is this really a problem in this Makefile, or does this fix cover up something deeper?

Yossi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log Type: application/octet-stream Size: 15667 bytes Desc: not available URL:

From kliteyn at dev.mellanox.co.il Mon Oct 16 02:47:50 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 16 Oct 2006 11:47:50 +0200
Subject: [openib-general] [PATCH] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment
Message-ID: <45335546.5090909@dev.mellanox.co.il>

Hi Hal

Leaving the OSM_LOG_DIR or OSM_CACHE_DIR environment variables empty will cause OSM to write log or cache files to /, since OSM runs as a root process. Although one might say that this is just a matter of point of view, I really think that, to prevent trashing the root directory (as I did by mistake on my machine), an empty variable should be treated as if it were not set. If a user wants the SM to write something to /, he should specify this explicitly by setting "/" as the env. variable value.
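The rule described above (an empty variable counts as unset, while an explicit "/" is honoured) can be sketched in isolation as follows. This is an illustrative sketch only, not the opensm code: `dir_or_default`, `dir_from_env` and `EXAMPLE_DEFAULT_DIR` are hypothetical names, with the macro standing in for defaults such as OSM_DEFAULT_CACHE_DIR.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical default; stands in for something like OSM_DEFAULT_CACHE_DIR. */
#define EXAMPLE_DEFAULT_DIR "/var/cache/example"

/*
 * Choose a directory from an environment value, treating both an unset
 * variable (NULL) and an empty string as "not set". Any non-empty
 * value, including "/", is returned unchanged.
 */
static const char *dir_or_default(const char *value, const char *fallback)
{
	assert(fallback != NULL);
	if (value == NULL || strlen(value) == 0)
		return fallback;
	return value;
}

/* Convenience wrapper that reads the variable itself. */
static const char *dir_from_env(const char *name, const char *fallback)
{
	return dir_or_default(getenv(name), fallback);
}
```

Keeping the check in one helper would avoid repeating the same two-line test at every call site that reads a directory from the environment; whether that fits the opensm code layout is a separate question.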
-- Yevgeny Signed-off-by: Yevgeny Kliteynik Index: opensm/osm_db_files.c =================================================================== --- opensm/osm_db_files.c (revision 9820) +++ opensm/osm_db_files.c (working copy) @@ -182,6 +182,8 @@ osm_db_init( CL_ASSERT( p_db_imp != NULL); p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR"); + if (p_db_imp->db_dir_name && (strlen(p_db_imp->db_dir_name) == 0)) + p_db_imp->db_dir_name = NULL; if ( p_db_imp->db_dir_name == NULL ) p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 9820) +++ opensm/osm_subnet.c (working copy) @@ -472,6 +472,8 @@ osm_subn_set_default_opt( p_opt->honor_guid2lid_file = FALSE; p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); + if (p_opt->dump_files_dir && (strlen(p_opt->dump_files_dir) == 0)) + p_opt->dump_files_dir = NULL; if (!p_opt->dump_files_dir) p_opt->dump_files_dir = OSM_DEFAULT_TMP_DIR; @@ -719,6 +721,8 @@ osm_subn_rescan_conf_file( char *p_key, *p_val ,*p_last; /* try to open the options file from the cache dir */ + if (p_cache_dir && (strlen(p_cache_dir) == 0)) + p_cache_dir = NULL; if (! p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; @@ -770,6 +774,8 @@ osm_subn_parse_conf_file( char *p_key, *p_val ,*p_last; /* try to open the options file from the cache dir */ + if (p_cache_dir && (strlen(p_cache_dir) == 0)) + p_cache_dir = NULL; if (! p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; @@ -1002,6 +1008,8 @@ osm_subn_write_conf_file( FILE *opts_file; /* try to open the options file from the cache dir */ + if (p_cache_dir && (strlen(p_cache_dir) == 0)) + p_cache_dir = NULL; if (! 
p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; From halr at voltaire.com Mon Oct 16 03:09:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 06:09:07 -0400 Subject: [openib-general] [PATCH] opensm: mcast tables dump improvement In-Reply-To: <20061013003517.GA20139@sashak.voltaire.com> References: <20061013003517.GA20139@sashak.voltaire.com> Message-ID: <1160993336.32093.332416.camel@hal.voltaire.com> On Thu, 2006-10-12 at 20:35, Sasha Khapyorsky wrote: > This improves switch's mcast tables dumping and eliminates multiple file > open/seek/close sequences. In one word - cleanup. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 16 03:22:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 06:22:05 -0400 Subject: [openib-general] [PATCH] osm: port to WinIB stack - 64 bits In-Reply-To: References: Message-ID: <1160994115.32093.332892.camel@hal.voltaire.com> On Sun, 2006-10-15 at 10:31, Yevgeny Kliteynik wrote: > Hi Hal > > This patch fixes a few data type problems with OSM on > 64-bit Windows machines. > The changes are done in the following files: > > opensm/osm_prtn_config.c > opensm/osm_pkey.c > opensm/osm_qos.c > > Note that the casting is done on the calculation result, > wich is string lenght, index in table, or index in string, > so it's ok to 'downcast' the value. > > The patch is for trunk only. > > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From ogerlitz at voltaire.com Mon Oct 16 04:28:35 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 16 Oct 2006 13:28:35 +0200 Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. In-Reply-To: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> References: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> Message-ID: <45336CE3.1030609@voltaire.com> Arlin Davis wrote: > Here is a patch to remove uDAPL scm provider from the build since it is > no longer needed nor > supported. 
This provider was merely a stop gap until uCMA was pushed > into kernel. > > Tziporet, can you get this change into OFED 1.1? > > Signed-off by: Arlin Davis ardavis at ichips.intel.com Can you also apply the patch and remove the dapl/openib_scm directory from the openib SVN? Or. From ogerlitz at voltaire.com Mon Oct 16 04:32:30 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 16 Oct 2006 13:32:30 +0200 Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. In-Reply-To: <45325F83.4040309@dev.mellanox.co.il> References: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> <45325F83.4040309@dev.mellanox.co.il> Message-ID: <45336DCE.90104@voltaire.com> Tziporet Koren wrote: > Arlin Davis wrote: >> Here is a patch to remove uDAPL scm provider from the build since it >> is no longer needed nor >> supported. This provider was merely a stop gap until uCMA was pushed >> into kernel. >> >> Tziporet, can you get this change into OFED 1.1? >> > Does this fix a blocker bug? > If not then I suggest we wait with this for 1.2 since it's not a small > patch. To clarify, without this patch, OFED provides an untested && unsupported udapl provider library (libdaplscm.so). I realized this was happening when I got bug reports and questions from a customer who attempted to use it... So it should be removed. Or.
From halr at voltaire.com Mon Oct 16 05:50:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 08:50:25 -0400 Subject: [openib-general] [PATCH 1/2] libibmad: Add support for DrSLID Message-ID: <1161003023.32093.338978.camel@hal.voltaire.com> libibmad: Add support for DrSLID Signed-off-by: Hal Rosenstock --- Index: ../libibmad/include/infiniband/mad.h =================================================================== --- ../libibmad/include/infiniband/mad.h (revision 9746) +++ ../libibmad/include/infiniband/mad.h (working copy) @@ -577,6 +577,7 @@ enum { IB_DEST_LID, IB_DEST_DRPATH, IB_DEST_GUID, + IB_DEST_DRSLID, }; enum { Index: ../libibmad/src/resolve.c =================================================================== --- ../libibmad/src/resolve.c (revision 9746) +++ ../libibmad/src/resolve.c (working copy) @@ -93,6 +93,9 @@ ib_resolve_portid_str(ib_portid_t *porti { uint64_t guid; int lid; + char *routepath; + ib_portid_t selfportid = {0}; + int selfport = 0; switch (dest_type) { case IB_DEST_LID: @@ -113,6 +116,20 @@ ib_resolve_portid_str(ib_portid_t *porti /* keep guid in portid? */ return ib_resolve_guid(portid, &guid, sm_id, 0); + case IB_DEST_DRSLID: + lid = strtol(addr_str, &routepath, 0); + routepath++; + if (!IB_LID_VALID(lid)) + return -1; + ib_portid_set(portid, lid, 0, 0); + + /* handle DR parsing and set DrSLID to local lid */ + if (ib_resolve_self(&selfportid, &selfport, 0) < 0) + return -1; + if (str2drpath(&portid->drpath, routepath, selfportid.lid, 0) < 0) + return -1; + return 0; + default: IBWARN("bad dest_type %d", dest_type); } Index: ../libibmad/src/smp.c =================================================================== --- ../libibmad/src/smp.c (revision 9746) +++ ../libibmad/src/smp.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. 
* * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -55,7 +55,9 @@ smp_set(void *data, ib_portid_t *portid, ib_rpc_t rpc = {0}; DEBUG("attr %d mod %d route %s", attrid, mod, portid2str(portid)); - if (portid->lid <= 0) + if ((portid->lid <= 0) || + (portid->drpath.drslid == 0xffff) || + (portid->drpath.drdlid == 0xffff)) rpc.mgtclass = IB_SMI_DIRECT_CLASS; /* direct SMI */ else rpc.mgtclass = IB_SMI_CLASS; /* Lid routed SMI */ @@ -87,7 +89,9 @@ smp_query(void *rcvbuf, ib_portid_t *por rpc.datasz = IB_SMP_DATA_SIZE; rpc.dataoffs = IB_SMP_DATA_OFFS; - if (portid->lid <= 0) + if ((portid->lid <= 0) || + (portid->drpath.drslid == 0xffff) || + (portid->drpath.drdlid == 0xffff)) rpc.mgtclass = IB_SMI_DIRECT_CLASS; /* direct SMI */ else rpc.mgtclass = IB_SMI_CLASS; /* Lid routed SMI */ Index: ../libibmad/src/mad.c =================================================================== --- ../libibmad/src/mad.c (revision 9746) +++ ../libibmad/src/mad.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -182,7 +182,10 @@ mad_build_pkt(void *umad, ib_rpc_t *rpc, else if (lid_routed) umad_set_addr(umad, dport->lid, dport->qp, 0, 0); else - umad_set_addr(umad, 0xffff, 0, 0, 0); /* direct routed smi */ + if ((dport->drpath.drslid != 0xffff) && (dport->lid > 0)) + umad_set_addr(umad, dport->lid, 0, 0, 0); + else + umad_set_addr(umad, 0xffff, 0, 0, 0); umad_set_grh(umad, (dport->grh && !is_smi) ? 0/*grh*/ : 0); /* FIXME: GRH support */ umad_set_pkey(umad, is_smi ? 
0 : dport->pkey_idx); Index: ../libibmad/src/portid.c =================================================================== --- ../libibmad/src/portid.c (revision 9746) +++ ../libibmad/src/portid.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -79,7 +79,10 @@ portid2str(ib_portid_t *portid) *(uint64_t *)portid->gid, *(uint64_t *)(portid->gid+8)); } - return buf; + if (portid->drpath.cnt) + s += sprintf(s, " "); + else + return buf; } s += sprintf(s, "DR path "); for (i = 0; i < portid->drpath.cnt+1; i++) From halr at voltaire.com Mon Oct 16 05:50:45 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 08:50:45 -0400 Subject: [openib-general] [PATCH 2/2] Diags/smpquery: Add support for DrSLID Message-ID: <1161003043.32093.338980.camel@hal.voltaire.com> Diags/smpquery: Add support for DrSLID Signed-off-by: Hal Rosenstock --- Index: src/smpquery.c =================================================================== --- src/smpquery.c (revision 9776) +++ src/smpquery.c (working copy) @@ -376,7 +376,8 @@ usage(void) fprintf(stderr, "\n\texamples:\n"); fprintf(stderr, "\t\t%s portinfo 3 1\t\t\t\t# portinfo by lid, with port modifier\n", basename); fprintf(stderr, "\t\t%s -G switchinfo 0x2C9000100D051 1\t# switchinfo by guid\n", basename); - fprintf(stderr, "\t\t%s -D nodeinfo 0\t\t\t# nodeinfo by direct route\n", basename); + fprintf(stderr, "\t\t%s -D nodeinfo 0\t\t\t\t# nodeinfo by direct route\n", basename); + fprintf(stderr, "\t\t%s -c nodeinfo 6 0,12\t\t\t# nodeinfo by combined route\n", basename); exit(-1); } @@ -393,7 +394,7 @@ main(int argc, char **argv) char *err; op_fn_t *fn; - static char const str_opts[] = "C:P:t:s:devDGVhu"; + static char const str_opts[] = "C:P:t:s:devDcGVhu"; static const struct 
option long_opts[] = { { "C", 1, 0, 'C'}, { "P", 1, 0, 'P'}, @@ -401,6 +402,7 @@ main(int argc, char **argv) { "err_show", 0, 0, 'e'}, { "verbose", 0, 0, 'v'}, { "Direct", 0, 0, 'D'}, + { "combined", 0, 0, 'c'}, { "Guid", 0, 0, 'G'}, { "smlid", 1, 0, 's'}, { "timeout", 1, 0, 't'}, @@ -429,6 +431,9 @@ main(int argc, char **argv) case 'D': dest_type = IB_DEST_DRPATH; break; + case 'c': + dest_type = IB_DEST_DRSLID; + break; case 'G': dest_type = IB_DEST_GUID; break; @@ -469,11 +474,20 @@ main(int argc, char **argv) madrpc_init(ca, ca_port, mgmt_classes, 3); - if (ib_resolve_portid_str(&portid, argv[1], dest_type, sm_id) < 0) - IBERROR("can't resolve destination port %s", argv[1]); - - if ((err = fn(&portid, argv+2, argc-2))) - IBERROR("operation %s: %s", argv[0], err); - + if (dest_type != IB_DEST_DRSLID) { + if (ib_resolve_portid_str(&portid, argv[1], dest_type, sm_id) < 0) + IBERROR("can't resolve destination port %s", argv[1]); + if ((err = fn(&portid, argv+2, argc-2))) + IBERROR("operation %s: %s", argv[0], err); + } else { + char concat[64]; + + memset(concat, 0, 64); + snprintf(concat, sizeof(concat), "%s %s", argv[1], argv[2]); + if (ib_resolve_portid_str(&portid, concat, dest_type, sm_id) < 0) + IBERROR("can't resolve destination port %s", concat); + if ((err = fn(&portid, argv+3, argc-3))) + IBERROR("operation %s: %s", argv[0], err); + } exit(0); } Index: man/smpquery.8 =================================================================== --- man/smpquery.8 (revision 9776) +++ man/smpquery.8 (working copy) @@ -1,4 +1,4 @@ -.TH SMPQUERY 8 "July 25, 2006" "OpenIB" "OpenIB Diagnostics" +.TH SMPQUERY 8 "October 16, 2006" "OpenIB" "OpenIB Diagnostics" .SH NAME smpquery \- query InfiniBand subnet management attributes @@ -54,6 +54,11 @@ using the util_name -h syntax. "0" # self port "0,1,2,1,4" # out via port 1, then 2, ... .PP +\-c use combined route address arguments. The + address is a combination of a LID and a direct route path. 
+ The LID specified is the DLID and the local LID is used + as the DrSLID. +.PP \-G use GUID address argument. In most cases, it is the Port GUID. Example: "0x08f1040023" @@ -88,6 +93,8 @@ smpquery portinfo 3 1 # portinfo smpquery -G switchinfo 0x2C9000100D051 1 # switchinfo by guid .PP smpquery -D nodeinfo 0 # nodeinfo by direct route +.PP +smpquery -c nodeinfo 6 0,12 # nodeinfo by combined route .SH SEE ALSO .BR smpdump (8) From halr at voltaire.com Mon Oct 16 05:53:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 08:53:58 -0400 Subject: [openib-general] [PATCH] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment In-Reply-To: <45335546.5090909@dev.mellanox.co.il> References: <45335546.5090909@dev.mellanox.co.il> Message-ID: <1161003220.32093.339118.camel@hal.voltaire.com> On Mon, 2006-10-16 at 05:47, Yevgeny Kliteynik wrote: > Hi Hal > > Leaving OSM_LOG_DIR or OSM_CACHE_DIR environment variables > empty will cause OSM to wright log or cache files to / > since OSM runs as root process. > > Although one might say that this is just a question of point > of view, I really think that to prevent root directory trashing > (as I did by mistake on my machine), empty variable should be > treated as if it is not set. > If a user wants the SM to write something to /, he should specify > this explicitly by setting "/" as an env. variable value. > > -- > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From tziporet at dev.mellanox.co.il Mon Oct 16 06:08:25 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 16 Oct 2006 15:08:25 +0200 Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. 
In-Reply-To: <45336DCE.90104@voltaire.com> References: <000001c6ee4e$fea9d2c0$67c8180a@amr.corp.intel.com> <45325F83.4040309@dev.mellanox.co.il> <45336DCE.90104@voltaire.com> Message-ID: <45338449.7040601@dev.mellanox.co.il> Or Gerlitz wrote: > Tziporet Koren wrote: > > To clarify, without this patch, OFED provides an untested && unsupported > udapl provider library (libdaplscm.so). The way i realized that this > happens was when I got bug reports and questions from a customer who > attempted to use it... So it should be removed. > > Or. > But we can document this in the release notes. Arlin - can you update the uDAPL release notes with this, or do you want me to do it? Thanks, Tziporet From mst at mellanox.co.il Mon Oct 16 06:25:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 16 Oct 2006 15:25:12 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <347180497203A942A6AA82C85846CBC9034F6001@ES23SNLNT.srn.sandia.gov> References: <347180497203A942A6AA82C85846CBC9034F6001@ES23SNLNT.srn.sandia.gov> Message-ID: <20061016132512.GD14878@mellanox.co.il> Quoting r. Maestas, Christopher Daniel : > Subject: Re: [openib-general] RHEL5 and OFED ... > > > Now for userspace - does RHEL5 include at least libibverbs-1.0? > > This has been released a while back, and Roland makes regular bugfix > releases. > > Here's what I see on a rhel4 u4 system: > --- > $ rpm -q libibverbs > libibverbs-1.0.3-1 > --- > > So I would think rhel5 would have at least that or greater. When I > compiled rpms for 1.1rc7 it generated: > --- > # ls libibverbs-* > libibverbs-1.0.4-0.x86_64.rpm libibverbs-utils-1.0.4-0.x86_64.rpm > libibverbs-devel-1.0.4-0.x86_64.rpm Doug, would it be possible to update this + libmthca? -- MST From mst at mellanox.co.il Mon Oct 16 06:25:55 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 16 Oct 2006 15:25:55 +0200 Subject: [openib-general] RHEL5 and OFED ...
In-Reply-To: References: <1160870845.2917.334.camel@fc6.xsintricity.com> <20061015155934.GD15055@mellanox.co.il> Message-ID: <20061016132555.GE14878@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: RHEL5 and OFED ... > > Michael> 1. The first thing would be to list fixes between 2.6.18 > Michael> and 2.6.19-rc1 and backport these. Some of them are in > Michael> OFED. > > If you want to do this, I think it would be great to also submit the > patches to -stable for inclusion in 2.6.18.x. OK. -- MST From bugzilla-daemon at openib.org Mon Oct 16 07:37:36 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 16 Oct 2006 07:37:36 -0700 (PDT) Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread on latest processors Message-ID: <20061016143736.74AB02283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=229 cdmaest at sandia.gov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |cdmaest at sandia.gov ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dotanb at dev.mellanox.co.il Mon Oct 16 07:47:21 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Mon, 16 Oct 2006 16:47:21 +0200 Subject: [openib-general] here are man pages to the several verbs Message-ID: <45339B79.8000102@dev.mellanox.co.il> Hi. I started to write man pages for the IB verbs API at the user level. These pages are based on the verbs description (in verbs.h), on the IB spec, and on my experience. Attached is a file with the man pages (binary + pod source files + Makefile) of the following verbs: ibv_get_device_list ibv_free_device_list ibv_get_device_guid ibv_get_device_name ibv_open_device ibv_close_device The pod files use the Pod::Man Perl module, which is needed in order to compile the man pages.
The man pages binaries can be added to the openib svn and be compiled only when there is a change in the pod files. I'm working on the rest of the verbs and soon (during the following weeks) i will send the rest of the files. Feedback is always welcome Dotan -------------- next part -------------- A non-text attachment was scrubbed... Name: ibverbs_man.tar.gz Type: application/x-gzip Size: 5013 bytes Desc: not available URL: From Arkady.Kanevsky at netapp.com Mon Oct 16 06:48:47 2006 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Mon, 16 Oct 2006 09:48:47 -0400 Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17 Message-ID: Bill, 2 small changes to the diagram on slide 6. SRP box should be yellow since it is IB specific. Drop the word "R-NIC" from the User APIs box. I think we can improve this diagram message. Both kernel and user API boxes for "verbs/API" should be non-colored "common". Thanks, Arkady Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16.
Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 ________________________________ From: Bill Boas [mailto:bboas at systemfabricworks.com] Sent: Sunday, October 15, 2006 5:03 PM To: openib-promoters at openib.org; Openfabrics-ewg at openib.org; openib-general at openib.org; financial at hsir.org Cc: a.federico at caspur.it ; Jacques-Charles.Lafoucriere at cea.fr ; ute.lembke at toyota-f1.com ; tsodring at simula.no ; 'Kyril Faenov'; gbrunsdo at cisco.com ; marc.mendez at bull.net ; wim.obbels at cc.kuleuven.be ; Pompey.Nagra at qlogic.com ; lthiers at datadirectnetworks.com; ftillier at silverstorm.com ; Patrice.Lucas at cea.fr ; dag.moxnes at sun.com ; Frank.Glaeser at thomson.net ; alexander.elbs at rz.uni-karlsruhe.de ; boenisch at hlrs.de; andrey.slepuhin at t-platforms.ru ; moray at quadrics.com ; bmt at zurich.ibm.com ; philippe.gregoire at cea.fr ; philippe.bernadat at hp.com ; eeb at bartonsoftware.com ; clement.t.cole at intel.com ; 'Jeffrey Scott'; Stephane.Thiell at cea.fr ; gmontry at neteffect.com ; wdey at cisco.com ; hermann.vondrateln at qlogic.com; ralf.koehler at thomson.net ; Stephane.Mathieu at cea.fr ; amir.sharif at cisco.com ; lui at zurich.ibm.com ; philippe.bruiant at falconstor.com ; lgatineau at serviware.com ; ince.muenchrath-weiss at intel.com ; bill.magro at intel.com ; psrivats at cisco.com ; slyness at silverstorm.com ; a.thomasch at de.ibm.com; tuan.phamdo at intel.com ; Kianoosh Naghshineh; svenar at simula.no ; xpillons at microsoft.com ; jean-louis.lavignon at bull.net; jpa at prism.uvsq.fr ; holger.obermaier at rz.uni-karlsruhe.de ; vzverev at genesis.spb.ru ; tskeie at simula.no ; gunther.mayer at volkswagen.de ; a.righi at cineca.it ; Harvey.Richardson at sun.com ; pchevaux at cisco.com ; nkelshik at cisco.com ; jean.Gonnord at cea.fr; sverre.jarp at cern.ch ; m.rosati at caspur.it ; thomas.jacob at intel.com ; line.holen at sun.com ; mans at arastra.com Subject: [openib-general] OpenFabrics Developer Summit at 
SC06, Tampa Nov 16 - 17 To all in the OpenFabrics Community We will be holding our first Developer Summit in the Tampa Convention Center, courtesy of SC06, starting at 1.30PM in Room 17 on Thursday November 16, 2006. On Friday November 17, we will start in Room 13 at 8.00 AM and continue till 5.00PM. We have had to schedule into these time slots because no other usable space is available at any other times during the week of SC06! OpenFabrics will cater food and beverages for the afternoon break and supper on Thursday, and for breakfast, lunch and two breaks on Friday. We will set up a registration site at Acteva to collect $$ to cover our out-of-pocket expenses - I'll email out the URL for that site in the next day or two. Please review the attached Strawman purposes, suggested attendees and agenda. If you have any changes or comments, please email them to the community for all to comment on. The Summit has several dimensions and themes throughout our work there: 1) - consistency and robustness of the Linux and Windows software stacks for Release 2.0 of OpenFabrics; 2) - feature selection, development resources and timelines for Release 2.0; 3) - activities, features and processes of the Enterprise Working Group on OFED 1.x until Release 2.0 is ready for hand-off to the EWG; 4) - enhancing the resources of the EWG to be ready for 2.0, so that it may subsequently be distributed as OFED 2.0 and adopted by the OpenFabrics vendor and customer communities for production use. This is far too much work for just a day and a half! PLEASE START NOW exchanging ideas for additional features, contacting peer engineers from companies and customers to discuss work sizing and development resources, and identifying volunteer developers for items, so that when we meet on the 16th we're not starting from a blank sheet!
Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal Rosenstock, Tom Tucker and Bob Woodruff are leading the pre-meeting, STRAWMAN collation of requirements, feature prioritization, developer assignments, sizing and processes so that we have the list largely complete prior to the meeting and people know who has already volunteered for items from the list. Bill Boas VP, Business Development | System Fabric Works bboas at systemfabricworks.com | 510-375-8840 -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Mon Oct 16 09:18:13 2006 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 16 Oct 2006 09:18:13 -0700 Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. Message-ID: >But we can document this in the release notes. >Arlin - can you update uDAPL release notes with this or do you want me >to do it. Go ahead and update the release notes. If you could also apply the small dat.conf patch that removes SCM from the configuration, it would reduce the chances of a customer picking up SCM by mistake. Most customers refer to dat.conf for supported configurations. -arlin > >Thanks, >Tziporet From tziporet at dev.mellanox.co.il Mon Oct 16 09:22:00 2006 From: tziporet at dev.mellanox.co.il (tziporet at dev.mellanox.co.il) Date: Mon, 16 Oct 2006 18:22:00 +0200 (IST) Subject: [openib-general] [PATCH] remove scm provider from uDAPL build. In-Reply-To: References: Message-ID: <41649.194.90.237.34.1161015720.squirrel@dev.mellanox.co.il> > > > Go ahead and update the release notes. If you could also apply the small > dat.conf patch that removes SCM from the configuration, it would reduce > the chances of a customer picking up SCM by mistake. Most customers refer > to dat.conf for supported configurations.
> > -arlin > OK I will Tziporet From tziporet at mellanox.co.il Mon Oct 16 09:25:33 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 16 Oct 2006 18:25:33 +0200 Subject: [openib-general] [openfabrics-ewg] We wish to do the 1.1 release next week Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACEAA@mtlexch01.mtl.com> This patch is already in. We will publish latest pre-release version tomorrow so everybody can do latest checks. Is this OK? Tziporet -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Sunday, October 15, 2006 10:16 PM To: Tziporet Koren; openfabrics-ewg at openib.org; OPENIB Subject: Re: [openfabrics-ewg] [openib-general] We wish to do the 1.1 release next week Yes, bug 273 (http://openib.org/bugzilla/show_bug.cgi?id=273) is a blocking issue for Cisco. Roland sent a patch last Monday. I'm done testing the other parts of rc7, and am testing his patch later today. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren > Sent: Thursday, October 12, 2006 7:44 AM > To: openfabrics-ewg at openib.org; OPENIB > Subject: [openib-general] We wish to do the 1.1 release next week > > Hi all, > > I am back from vacation and found you waited with the release > for me :-) > > From a quick look at status mails I think we can do the official > release next week. > > Please reply if there are still any blocking issues you have. > > Also - please update all documents till end of Monday next week. 
> > Tziporet > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From xma at us.ibm.com Mon Oct 16 09:30:47 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 16 Oct 2006 09:30:47 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: Message-ID: Hi, Roland, If we only support GSO enablement in ethtool, there is no problem. What I meant is anything related to MAC address in ethtool utility needs to be updated for IB device. Do you like the idea to add ethtool support in IPoIB? Do you want me to work on this? Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From akepner at sgi.com Mon Oct 16 09:15:30 2006 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 16 Oct 2006 09:15:30 -0700 (PDT) Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: On Sun, 15 Oct 2006, Roland Dreier wrote: > .... > However, this is a somewhat weird symptom, although I can imagine that > out-of-order doorbells cause extra completions or something like that, > which causes IPoIB to overrun the send queue. > > Adding the mmiowb()s definitely fixes things? > At least with the workload that we used to reproduce this bug, yes. (The workload was simply 2 ttcp processes, each placed on a different node of an Altix.) Without the mmiowb()s things would hang very reliably and very quickly (within a second). With the additional mmiowb() calls I never observed a problem after 10's of minutes. > > Signed-off-by: > > Should this be > > Signed-off-by: Arthur Kepner > That's correct. Thanks. > actually? 
(I just looked through the kernel git log to guess your name) > > > @@ -1730,6 +1732,9 @@ out: > > mthca_write64(doorbell, > > dev->kar + MTHCA_SEND_DOORBELL, > > MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); > > + /* use mmiowb to ensure write to doorbell is ordered > > + * before releasing spinlock */ > > + mmiowb(); > > } > > > > qp->sq.next_ind = ind; > > Any reason why this mmiowb() is placed slightly differently from the > others (which are right before the spin_unlock)? > I wanted to put it in the "if (likely(nreq))" block so that we don't do the mmiowb() unless it's really necessary. A very minor optimization (but a co-worker reports that it does produce a measurable but small performance improvement). -- Arthur From rdreier at cisco.com Mon Oct 16 09:39:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 09:39:29 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: (Shirley Ma's message of "Mon, 16 Oct 2006 09:30:47 -0700") References: Message-ID: Shirley> If we only support GSO enablement in ethtool, there is no Shirley> problem. What I meant is anything related to MAC address Shirley> in ethtool utility needs to be updated for IB device. Shirley> Do you like the idea to add ethtool support in IPoIB? Do Shirley> you want me to work on this? It seems like a good thing at first blush, although I haven't thought through all the implications. But yes, being able to dump info and set options via ethtool seems like a good idea. I assume there will also be ethtool changes to merge into the upstream ethtool sources. - R. From eli at dev.mellanox.co.il Mon Oct 16 09:41:20 2006 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Mon, 16 Oct 2006 18:41:20 +0200 Subject: [openib-general] IPOIB NAPI In-Reply-To: References: <1160929635.5389.2.camel@localhost> Message-ID: <1161016880.12344.10.camel@localhost> On Sun, 2006-10-15 at 09:39 -0700, Roland Dreier wrote: > I've been meaning to mention this...
I have a preliminary version in > > git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git ipoib-napi > > There are further changes I would like to add on top of that, but > comments on the two patches there would be appreciated. And also > benchmarks would be good. Please diff to see my comments. Generally it looks like the condition on netif_rx_reschedule() should be inverted. Also, you need to set max to some large value, since you don't know how many completions you missed and you want to make sure you get all the ones that sneaked in between the last poll and the request notify. int ipoib_poll(struct net_device *dev, int *budget) { struct ipoib_dev_priv *priv = netdev_priv(dev); int max = min(*budget, dev->quota); int done; int t; int empty; int missed_event; int n, i; repoll: done = 0; empty = 0; while (max) { t = min(IPOIB_NUM_WC, max); n = ib_poll_cq(priv->cq, t, priv->ibwc); for (i = 0; i < n; ++i) { if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) { ++done; --max; ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); } else ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); } if (n != t) { empty = 1; break; } } dev->quota -= done; *budget -= done; if (empty) { netif_rx_complete(dev); ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event); if (missed_event && !netif_rx_reschedule(dev, 0)) { max = 1000; goto repoll; } return 0; } return 1; } From sweitzen at cisco.com Mon Oct 16 09:47:40 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 16 Oct 2006 09:47:40 -0700 Subject: [openib-general] [openfabrics-ewg] We wish to do the 1.1 release next week Message-ID: Yes, that would be great.
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Tziporet Koren [mailto:tziporet at mellanox.co.il] > Sent: Monday, October 16, 2006 9:26 AM > To: Scott Weitzenkamp (sweitzen); Tziporet Koren; > openfabrics-ewg at openib.org; OPENIB > Subject: RE: [openfabrics-ewg] [openib-general] We wish to do > the 1.1 release next week > > This patch is already in. > We will publish latest pre-release version tomorrow so > everybody can do > latest checks. > > Is this OK? > Tziporet > > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org > [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Scott > Weitzenkamp (sweitzen) > Sent: Sunday, October 15, 2006 10:16 PM > To: Tziporet Koren; openfabrics-ewg at openib.org; OPENIB > Subject: Re: [openfabrics-ewg] [openib-general] We wish to do the 1.1 > release next week > > Yes, bug 273 (http://openib.org/bugzilla/show_bug.cgi?id=273) is a > blocking issue for Cisco. Roland sent a patch last Monday. I'm done > testing the other parts of rc7, and am testing his patch later today. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: openib-general-bounces at openib.org > > [mailto:openib-general-bounces at openib.org] On Behalf Of > Tziporet Koren > > Sent: Thursday, October 12, 2006 7:44 AM > > To: openfabrics-ewg at openib.org; OPENIB > > Subject: [openib-general] We wish to do the 1.1 release next week > > > > Hi all, > > > > I am back from vacation and found you waited with the release > > for me :-) > > > > From a quick look at status mails I think we can do the official > > release next week. > > > > Please reply if there are still any blocking issues you have. > > > > Also - please update all documents till end of Monday next week. 
> > > > Tziporet > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From rdreier at cisco.com Mon Oct 16 09:48:11 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 09:48:11 -0700 Subject: [openib-general] IPOIB NAPI In-Reply-To: <1161016880.12344.10.camel@localhost> (Eli Cohen's message of "Mon, 16 Oct 2006 18:41:20 +0200") References: <1160929635.5389.2.camel@localhost> <1161016880.12344.10.camel@localhost> Message-ID: Eli> Please diff to see my comments. Generaly it looks like the Eli> condition on netif_rx_reschedule() should be inverted. Why? A return value of 0 means that the reschedule failed (probably because the poll routine is already running somewhere else) and the poll routine should just return. I think the code is correct as it stands. Eli> Also ou need to set max to some large value since you don't Eli> know if how many completions you missed and you want to make Eli> sure you get all the ones the sneaked from the last poll to Eli> the request notify. Why? max is there to limit us from doing more work than the quota passed in from the networking stack. If we fail to drain the CQ because we exhaust max, then the poll routine will return 1 and will remain scheduled, so the networking stack will call the poll routine again to continue grabbing completions. - R. 
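For readers following the quota argument, here is a small userspace model of the poll loop as Roland describes it. It is only a sketch under stated assumptions: sim_cq, sim_poll_cq(), sim_req_notify() and sim_poll() are made-up names, the resched_ok flag stands in for the return value of netif_rx_reschedule(), and none of this is the actual driver code from the ipoib-napi branch.

```c
#include <assert.h>

/* Hypothetical stand-in for a completion queue; not the real IB verbs API. */
struct sim_cq {
	int pending;       /* completions ready to be polled */
	int late_arrivals; /* completions landing between the last poll and the re-arm */
	int armed;         /* nonzero while a completion notification is requested */
	int resched_ok;    /* what the simulated netif_rx_reschedule() would return */
};

/* Plays the role of ib_poll_cq(): hand back at most max completions. */
static int sim_poll_cq(struct sim_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/* Plays the role of ib_req_notify_cq() with a missed-event report:
 * re-arm the CQ and return nonzero if completions arrived meanwhile. */
static int sim_req_notify(struct sim_cq *cq)
{
	int missed = cq->late_arrivals > 0;

	cq->armed = 1;
	cq->pending += cq->late_arrivals;
	cq->late_arrivals = 0;
	return missed;
}

/* Returns 1 if the caller should keep us scheduled (budget used up),
 * 0 if polling is finished for now. */
int sim_poll(struct sim_cq *cq, int *budget, int *handled)
{
	int max = *budget;
	int done = 0;

	while (done < max) {
		done += sim_poll_cq(cq, max - done);
		if (done == max)
			break;		/* budget exhausted, CQ may not be empty */

		/* CQ ran dry below the budget: re-arm notifications. */
		if (!sim_req_notify(cq) || !cq->resched_ok) {
			*budget -= done;
			*handled += done;
			return 0;	/* drained, or someone else will poll */
		}
		cq->armed = 0;		/* rescheduled: we are polling again */
	}

	*budget -= done;
	*handled += done;
	return 1;
}
```

In this model, with 5 completions pending, 2 arriving late and a budget of 4, the first call handles 4 and returns 1 (stay scheduled). A second call with a fresh budget of 4 handles the remaining 3, leaves the CQ armed and returns 0. The budget is never exceeded, which is the point of not bumping max to a large value.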
From rdreier at cisco.com Mon Oct 16 09:55:11 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 09:55:11 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: ( akepner@sgi.com's message of "Mon, 16 Oct 2006 09:15:30 -0700 (PDT)") References: Message-ID: akepner> At least with the workload that we used to reproduce this akepner> bug, yes. (The workload was simply 2 ttcp processes, each akepner> placed on a different node of an Altix.) Without the akepner> mmiowb()s things would hang very reliably and very akepner> quickly (within a second). With the additional mmiowb() akepner> calls I never observed a problem after 10's of minutes. OK, cool. Sounds convincing to me. BTW -- are there Altix systems with PCIe? Have you tested the mthca_arbel_xxx (mem-free PCIe HCA) changes, or just the mthca_tavor_xxx (PCI-X HCA) parts? akepner> I wanted to put it in the "if (likely(nreq))" block so akepner> that we don't do the mmiowb() unless it's really akepner> necessary. A very minor optimization (but a co-worker akepner> reports that it does produce a measurable, but small akepner> performance improvement.) I see -- the other mmiowb()s are next to the spin_unlock()s elsewhere because the other routines might ring doorbells during the loop if someone passes in a ton of work requests, right? (All the mmiowb()s look necessary to me but I'm just curious about the level of testing) I'm still a little puzzled by the fact that it affects performance, because that "if (likely(nreq))" is super-super-likely: under any normal workload, I would expect it always to be true. It's really strange that there's any difference between mmiowb(); qp->sq.next_ind = ind; qp->sq.head += nreq; and qp->sq.next_ind = ind; qp->sq.head += nreq; mmiowb(); Anyway, it all looks good. I'll apply your patch and submit it to -stable for 2.6.18.x. 
Thanks, Roland From eli at dev.mellanox.co.il Mon Oct 16 09:59:23 2006 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Mon, 16 Oct 2006 18:59:23 +0200 Subject: [openib-general] IPOIB NAPI In-Reply-To: References: <1160929635.5389.2.camel@localhost> <1161016880.12344.10.camel@localhost> Message-ID: <1161017963.12344.13.camel@localhost> On Mon, 2006-10-16 at 09:48 -0700, Roland Dreier wrote: > Eli> Please diff to see my comments. Generaly it looks like the > Eli> condition on netif_rx_reschedule() should be inverted. > > Why? A return value of 0 means that the reschedule failed (probably > because the poll routine is already running somewhere else) and the > poll routine should just return. I think the code is correct as it stands. > > Eli> Also ou need to set max to some large value since you don't > Eli> know if how many completions you missed and you want to make > Eli> sure you get all the ones the sneaked from the last poll to > Eli> the request notify. > > Why? max is there to limit us from doing more work than the quota > passed in from the networking stack. If we fail to drain the CQ > because we exhaust max, then the poll routine will return 1 and will > remain scheduled, so the networking stack will call the poll routine > again to continue grabbing completions. > > - R. OK I see what you mean. So I guess it's OK then. From tziporet at mellanox.co.il Mon Oct 16 10:03:56 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 16 Oct 2006 19:03:56 +0200 Subject: [openib-general] OFED 1.1 release schedule Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> This is the plan to do the 1.1 release this week: We will publish 1.1-pre1 package tomorrow (Tue. 17-Oct) Only blocker issues from RC7 will be updated: 1. SRP fix for Cisco FC gateway 2. Small updates for the install 3. Fix in diagnet to support SM on a switch 4. Activate scaling code of ehca as default in the install 5. 
Documentation update Each company will have 3 days for latest certification process and then the release can be done on Thursday. Company owners - please approve if this is OK with you. If not please elaborate the blocking reasons. Thanks, Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Mon Oct 16 10:24:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 10:24:21 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: ( akepner@sgi.com's message of "Sun, 15 Oct 2006 02:02:44 -0700 (PDT)") References: Message-ID: Actually, one further question: > @@ -314,6 +316,9 @@ void mthca_cq_clean(struct mthca_dev *de > wmb(); > cq->cons_index += nfreed; > update_cons_index(dev, cq, nfreed); > + /* use mmiowb to ensure update is ordered properly > + * prior to releasing the spinlock */ > + mmiowb(); > } > > spin_unlock_irq(&cq->lock); > @@ -711,6 +716,11 @@ repoll: > } > } > > + if (freed) { > + /* we rang the MTHCA_CQ_DOORBELL so use mmiowb > + * to make sure it is ordered properly */ > + mmiowb(); > + } > spin_unlock_irqrestore(&cq->lock, flags); > > return err == 0 || err == -EAGAIN ? npolled : err; Would it be cleaner just to put an mmiowb() inside update_cons_index() in the case where the doorbell gets rung? That way there's no unnecessary mmiowb() in the memfree case (which doesn't use a doorbell or do any writing to PCI at all). - R. From xma at us.ibm.com Mon Oct 16 10:29:51 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 16 Oct 2006 10:29:51 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: Message-ID: Good. Then after enabling GSO, we can chain multiple packets together in IPoIB for one doorbell to send large packet. 
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Mon Oct 16 10:37:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 10:37:12 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: (Shirley Ma's message of "Mon, 16 Oct 2006 10:29:51 -0700") References: Message-ID: Shirley> Good. Then after enabling GSO, we can chain multiple Shirley> packets together in IPoIB for one doorbell to send large Shirley> packet. How does that work? GSO doesn't change the hard_start_xmit() interface, does it? - R. From mshefty at ichips.intel.com Mon Oct 16 10:36:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 10:36:15 -0700 Subject: [openib-general] [PATCH 1/2] Optimize cma_bind_loopback to check for empty list. In-Reply-To: <20061016043901.4944.77557.sendpatchset@localhost.localdomain> References: <20061016043901.4944.77557.sendpatchset@localhost.localdomain> Message-ID: <4533C30F.9060103@ichips.intel.com> Acked-by: Sean Hefty Roland, can we queue these two patches for 2.6.20? > Optimize to test for an empty list. Patch made against 2.6.19-rc1 tree. 
> > Signed-off-by: Krishna Kumar > -------- > diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c > --- org/drivers/infiniband/core/cma.c 2006-10-09 16:40:04.000000000 +0530 > +++ new/drivers/infiniband/core/cma.c 2006-10-09 16:52:03.000000000 +0530 > @@ -1480,19 +1480,18 @@ static int cma_bind_loopback(struct rdma > u8 p; > > mutex_lock(&lock); > + if (list_empty(&dev_list)) { > + ret = -ENODEV; > + goto out; > + } > list_for_each_entry(cma_dev, &dev_list, list) > for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) > - if (!ib_query_port (cma_dev->device, p, &port_attr) && > + if (!ib_query_port(cma_dev->device, p, &port_attr) && > port_attr.state == IB_PORT_ACTIVE) > goto port_found; > > - if (!list_empty(&dev_list)) { > - p = 1; > - cma_dev = list_entry(dev_list.next, struct cma_device, list); > - } else { > - ret = -ENODEV; > - goto out; > - } > + p = 1; > + cma_dev = list_entry(dev_list.next, struct cma_device, list); > > port_found: > ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid); From rdreier at cisco.com Mon Oct 16 10:41:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 10:41:28 -0700 Subject: [openib-general] [PATCH 1/2] Optimize cma_bind_loopback to check for empty list. In-Reply-To: <4533C30F.9060103@ichips.intel.com> (Sean Hefty's message of "Mon, 16 Oct 2006 10:36:15 -0700") References: <20061016043901.4944.77557.sendpatchset@localhost.localdomain> <4533C30F.9060103@ichips.intel.com> Message-ID: > Roland, can we queue these two patches for 2.6.20? Sure, no problem. - R. 
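The control flow of the reordered cma_bind_loopback() quoted above can be sketched in plain userspace C. This is an illustration only: sim_device, port_state and pick_loopback_port() are hypothetical names, an array replaces the kernel list, and -1 stands in for -ENODEV.

```c
#include <assert.h>
#include <stddef.h>

enum port_state { PORT_DOWN, PORT_ACTIVE };

/* Hypothetical device record; ports[0] describes port 1. */
struct sim_device {
	int port_cnt;
	const enum port_state *ports;
};

/* Returns 0 and fills in dev_idx/port, or -1 when there are no devices.
 * Order of checks mirrors the patch: empty list first, then the search
 * for an active port, then the fallback to port 1 of the first device. */
int pick_loopback_port(const struct sim_device *devs, size_t ndevs,
		       size_t *dev_idx, int *port)
{
	size_t d;
	int p;

	if (ndevs == 0)
		return -1;

	for (d = 0; d < ndevs; ++d)
		for (p = 1; p <= devs[d].port_cnt; ++p)
			if (devs[d].ports[p - 1] == PORT_ACTIVE) {
				*dev_idx = d;
				*port = p;
				return 0;
			}

	/* No active port anywhere: fall back to the first device, port 1. */
	*dev_idx = 0;
	*port = 1;
	return 0;
}
```

Testing for the empty case up front means the fallback after the loop no longer needs its own emptiness check, which is the simplification the patch makes.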
From halr at voltaire.com Mon Oct 16 10:44:50 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 13:44:50 -0400 Subject: [openib-general] [PATCH] Diags/ibnetdiscover: More changes for IB routers Message-ID: <1161020686.32093.350424.camel@hal.voltaire.com> Diags/ibnetdiscover: More changes for IB routers Signed-off-by: Hal Rosenstock Index: src/ibnetdiscover.c =================================================================== --- src/ibnetdiscover.c (revision 9827) +++ src/ibnetdiscover.c (working copy) @@ -461,14 +461,30 @@ node_name(Node *node) void list_node(Node *node) { + char *node_type; + + switch(node->type) { + case SWITCH_NODE: + node_type = "Switch"; + break; + case CA_NODE: + node_type = "Ca"; + break; + case ROUTER_NODE: + node_type = "Router"; + break; + default: + node_type = "???"; + break; + } #if __WORDSIZE == 64 fprintf(f, "%s\t : 0x%016lx ports %d devid 0x%x vendid 0x%x \"%s\"\n", - node->type == SWITCH_NODE ? "Switch" : "Ca", + node_type, node->nodeguid, node->numports, node->devid, node->vendid, clean_nodedesc(node->nodedesc)); #else fprintf(f, "%s\t : 0x%016Lx ports %d devid 0x%x vendid 0x%x \"%s\"\n", - node->type == SWITCH_NODE ? 
"Switch" : "Ca", + node_type, node->nodeguid, node->numports, node->devid, node->vendid, clean_nodedesc(node->nodedesc)); #endif @@ -558,14 +574,31 @@ out_switch(Node *node, int group) void out_ca(Node *node) { + char *node_type; + char *node_type2; + out_ids(node); + switch(node->type) { + case CA_NODE: + node_type = "ca"; + node_type2 = "Ca"; + break; + case ROUTER_NODE: + node_type = "router"; + node_type2 = "Router"; + break; + default: + node_type = "???"; + node_type2 = "???"; + break; + } #if __WORDSIZE == 64 - fprintf(f, "%s=0x%lx\n", "caguid", node->nodeguid); + fprintf(f, "%s%s=0x%lx\n", node_type, "guid", node->nodeguid); #else - fprintf(f, "%s=0x%Lx\n", "caguid", node->nodeguid); + fprintf(f, "%s%s=0x%Lx\n", node_type, "guid", node->nodeguid); #endif fprintf(f, "%s\t%d %s\t\t# %s\n", - "Ca", node->numports, node_name(node), + node_type2, node->numports, node_name(node), clean_nodedesc(node->nodedesc)); } Index: include/ibnetdiscover.h =================================================================== --- include/ibnetdiscover.h (revision 9827) +++ include/ibnetdiscover.h (working copy) @@ -38,6 +38,7 @@ #define MAXHOPS 63 #define CA_NODE 1 #define SWITCH_NODE 2 +#define ROUTER_NODE 3 /* Vendor IDs (for chassis based systems) */ #define VTR_VENDOR_ID 0x8f1 /* Voltaire */ From xma at us.ibm.com Mon Oct 16 10:46:41 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 16 Oct 2006 10:46:41 -0700 Subject: [openib-general] enable GSO over IPoIB In-Reply-To: Message-ID: Roland Dreier wrote on 10/16/2006 10:37:12 AM: > Shirley> Good. Then after enabling GSO, we can chain multiple > Shirley> packets together in IPoIB for one doorbell to send large > Shirley> packet. > > How does that work? GSO doesn't change the hard_start_xmit() > interface, does it? > > - R. No, it doesn't. I am thinking to add enqueue/dequeue multiple packets in qdisc. It would benifit other networking device. 
Thanks
Shirley Ma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rdreier at cisco.com Mon Oct 16 10:49:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 16 Oct 2006 10:49:32 -0700
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To: (Shirley Ma's message of "Mon, 16 Oct 2006 10:46:41 -0700")
References:
Message-ID:

 Shirley> No, it doesn't. I am thinking to add enqueue/dequeue
 Shirley> multiple packets in qdisc. It would benefit other
 Shirley> networking devices.

So am I understanding correctly -- this is other work that is
independent of GSO? Is the plan to add a new optional driver method
that extends hard_start_xmit() to accept multiple packets?

 - R.

From xma at us.ibm.com Mon Oct 16 10:53:29 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 16 Oct 2006 10:53:29 -0700
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To:
Message-ID:

Roland Dreier wrote on 10/16/2006 10:49:32 AM:

> Shirley> No, it doesn't. I am thinking to add enqueue/dequeue
> Shirley> multiple packets in qdisc. It would benefit other
> Shirley> networking devices.
>
> So am I understanding correctly -- this is other work that is
> independent of GSO? Is the plan to add a new optional driver method
> that extends hard_start_xmit() to accept multiple packets?
>
> - R.

Yes, you are right. It is new work independent of GSO. I hope I have the
BW to do all the work on time.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From HNGUYEN at de.ibm.com Mon Oct 16 11:13:58 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Mon, 16 Oct 2006 20:13:58 +0200
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release schedule
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com>
Message-ID:

Hi,
> We will publish 1.1-pre1 package tomorrow (Tue.
17-Oct) > Only blocker issues from RC7 will be updated: > 1. SRP fix for Cisco FC gateway > 2. Small updates for the install currently we're working on the one install issue as I mentioned in another thread. We found out that the 64- and 32-bit binaries were built properly, but during the packaging the 32-bit binaries were not picked, but the 64-bit ones together with 32-bit libraries. We're trying to understand how the specific files for each rpm come in. We appreciate for any hints/suggestions. Has anyone else also observed this problem? > 3. Fix in diagnet to support SM on a switch > 4. Activate scaling code of ehca as default in the install Great, thanks! > 5. Documentation update > Each company will have 3 days for latest certification process and then the release can be done on Thursday. Supposed we could solve the issue above tomorrow this should be ok for us. Regards Hoang-Nam Nguyen From ardavis at ichips.intel.com Mon Oct 16 11:15:38 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 16 Oct 2006 11:15:38 -0700 Subject: [openib-general] OFED 1.1 release schedule In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> Message-ID: <4533CC4A.2070709@ichips.intel.com> Tziporet Koren wrote: > This is the plan to do the 1.1 release this week: > > > > We will publish 1.1-pre1 package tomorrow (Tue. 17-Oct) > > Only blocker issues from RC7 will be updated: > > 1. SRP fix for Cisco FC gateway > 2. Small updates for the install > 3. Fix in diagnet to support SM on a switch > 4. Activate scaling code of ehca as default in the install > 5. Documentation update > Can someone double check the ib_cm kernel patch (sean_cm_drep_on_not_found.patch) again and verify the build process. I don't see the cm_issue_drep symbol in an RC7 build. From the build logs it appears that the patch is applied but I do not see the symbol in the installed ib_cm.ko after the build is complete. 
system with OFED RC7.. nm ib_cm.ko | grep issue 0000000000001689 t cm_issue_rej system with latest svn pull.... nm ib_cm.ko | grep issue 000029f7 t cm_issue_drep 00001486 t cm_issue_rej -arlin From mshefty at ichips.intel.com Mon Oct 16 11:17:19 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 11:17:19 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <4531F12A.2000306@mellanox.co.il> References: <000201c6ecbe$e4660370$c0d4180a@amr.corp.intel.com> <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <20061010234933.GA29632@mellanox.co.il> <452CE68D.8040709@voltaire.com> <452CE6FC.5090303@mellanox.co.il> <452CF49F.40903@voltaire.com> <452CFD0C.2050709@mellanox.co.il> <452D1456.1050009@ichips.intel.com> <452D19CF.30006@mellanox.co.il> <452D2005.60905@ichips.intel.com> <452D25F7.1010902@mellanox.co.il> <452D2E4D.8000902@ichips.intel.com> <452D5002.1060901@mellanox.co.il> <452D6117.6040400@ichips.intel.com> <452DE8C4.2040804@mellanox.co.il> <452E7753.80402@ichips.intel.com> <4531F12A.2000306@mellanox.co.il> Message-ID: <4533CCAF.1000307@ichips.intel.com> Eitan Zahavi wrote: > I agree that layering on top is easier. But does it really solve the > bug? I think not. If you would REPLACE the API and not provide both options > (above and below refcount enforcement ) it would make sense to me. We disagree on the philosophy here. I view ib_mad as multiplexing to QP0/1, dispatching responses, and implementing rmpp. I've tried to keep as much class specific information out of the mad layer as possible, and I'm resistant to changing that. We don't want the mad layer implementing a bunch of class specific data. Its not designed around providing that capability, nor do I think that it should be. 
Consider that there's nothing that prevents a user from using the ib_mad interface to send CM mads directly, or even allocating a UD QP, and sending MADs by simply posting sends. - Sean From sweitzen at cisco.com Mon Oct 16 11:18:37 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 16 Oct 2006 11:18:37 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release schedule Message-ID: This plan is OK with Cisco. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Monday, October 16, 2006 10:04 AM To: Open Fabrics Cc: openib Subject: [openfabrics-ewg] OFED 1.1 release schedule This is the plan to do the 1.1 release this week: We will publish 1.1-pre1 package tomorrow (Tue. 17-Oct) Only blocker issues from RC7 will be updated: 1. SRP fix for Cisco FC gateway 2. Small updates for the install 3. Fix in diagnet to support SM on a switch 4. Activate scaling code of ehca as default in the install 5. Documentation update Each company will have 3 days for latest certification process and then the release can be done on Thursday. Company owners - please approve if this is OK with you. If not please elaborate the blocking reasons. Thanks, Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From sashak at voltaire.com Mon Oct 16 12:04:36 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 16 Oct 2006 21:04:36 +0200
Subject: [openib-general] [PATCH] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment
In-Reply-To: <45335546.5090909@dev.mellanox.co.il>
References: <45335546.5090909@dev.mellanox.co.il>
Message-ID: <20061016190436.GA21672@sashak.voltaire.com>

On 11:47 Mon 16 Oct , Yevgeny Kliteynik wrote:
> Hi Hal
>
> Leaving OSM_LOG_DIR or OSM_CACHE_DIR environment variables
> empty will cause OSM to write log or cache files to /
> since OSM runs as a root process.
>
> Although one might say that this is just a question of point
> of view, I really think that to prevent root directory trashing
> (as I did by mistake on my machine), an empty variable should be
> treated as if it is not set.
> If a user wants the SM to write something to /, he should specify
> this explicitly by setting "/" as an env. variable value.
>
> --
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik
>
> Index: opensm/osm_db_files.c
> ===================================================================
> --- opensm/osm_db_files.c (revision 9820)
> +++ opensm/osm_db_files.c (working copy)
> @@ -182,6 +182,8 @@ osm_db_init(
>    CL_ASSERT( p_db_imp != NULL);
>
>    p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR");
> +  if (p_db_imp->db_dir_name && (strlen(p_db_imp->db_dir_name) == 0))
> +    p_db_imp->db_dir_name = NULL;
>    if ( p_db_imp->db_dir_name == NULL )
>      p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR;

Something like this would be better:

	if (p_db_imp->db_dir_name == NULL || !*p_db_imp->db_dir_name)
		p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR;

This eliminates the call to strlen(), which will return != 0 in most
cases. The same applies below.
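Since the same test recurs at every site in the patch, it could be factored into one helper. The following is a minimal sketch, not part of either posted patch: dir_or_default() is a hypothetical name and does not exist in OpenSM.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat an unset (NULL) or empty value the same way
 * and fall back to the given default; any other value, including "/",
 * is returned untouched. */
const char *dir_or_default(const char *value, const char *fallback)
{
	if (value == NULL || *value == '\0')
		return fallback;
	return value;
}
```

A call site would then read along the lines of dir_or_default(getenv("OSM_CACHE_DIR"), OSM_DEFAULT_CACHE_DIR), with no strlen() call and no intermediate assignment to NULL.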
Sasha > > Index: opensm/osm_subnet.c > =================================================================== > --- opensm/osm_subnet.c (revision 9820) > +++ opensm/osm_subnet.c (working copy) > @@ -472,6 +472,8 @@ osm_subn_set_default_opt( > p_opt->honor_guid2lid_file = FALSE; > > p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); > + if (p_opt->dump_files_dir && (strlen(p_opt->dump_files_dir) == 0)) > + p_opt->dump_files_dir = NULL; > if (!p_opt->dump_files_dir) > p_opt->dump_files_dir = OSM_DEFAULT_TMP_DIR; > > @@ -719,6 +721,8 @@ osm_subn_rescan_conf_file( > char *p_key, *p_val ,*p_last; > > /* try to open the options file from the cache dir */ > + if (p_cache_dir && (strlen(p_cache_dir) == 0)) > + p_cache_dir = NULL; > if (! p_cache_dir) > p_cache_dir = OSM_DEFAULT_CACHE_DIR; > > @@ -770,6 +774,8 @@ osm_subn_parse_conf_file( > char *p_key, *p_val ,*p_last; > > /* try to open the options file from the cache dir */ > + if (p_cache_dir && (strlen(p_cache_dir) == 0)) > + p_cache_dir = NULL; > if (! p_cache_dir) > p_cache_dir = OSM_DEFAULT_CACHE_DIR; > > @@ -1002,6 +1008,8 @@ osm_subn_write_conf_file( > FILE *opts_file; > > /* try to open the options file from the cache dir */ > + if (p_cache_dir && (strlen(p_cache_dir) == 0)) > + p_cache_dir = NULL; > if (! p_cache_dir) > p_cache_dir = OSM_DEFAULT_CACHE_DIR; > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From xma at us.ibm.com Mon Oct 16 12:03:00 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 16 Oct 2006 12:03:00 -0700 Subject: [openib-general] IPOIB NAPI In-Reply-To: <1161017963.12344.13.camel@localhost> Message-ID: Roland, Don't know why I have trouble to get this patch from your git tree. Do you mind to post this patch here so I can test the performance over ehca? 
Thanks
Shirley Ma
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From somenath at veritas.com Sun Oct 15 12:16:21 2006
From: somenath at veritas.com (somenath)
Date: Sun, 15 Oct 2006 12:16:21 -0700
Subject: [openib-general] APM support in openib stack
In-Reply-To: <000201c6ef5c$13fdfd00$dec8180a@amr.corp.intel.com>
References: <000201c6ef5c$13fdfd00$dec8180a@amr.corp.intel.com>
Message-ID: <45328905.9000904@veritas.com>

Sean Hefty wrote:

>>>pri_path = alt_path = path 1
>>>works
>>>
>>>
>>>
>>no, I haven't tested that. I can try that too, if you think that can
>>provide useful info..
>>
>>
>
>I misunderstood one of your earlier e-mails then. I threw together a test case
>to try this, and it worked for me. Can you see if the same works for you? If
>not, then my guess is that the release you're using is missing some needed
>patches. (You may be able to work around the issue in your code, however, so
>we'll see what can be done.)
>
>My systems only have one path between them, so until I can physically add
>another path, I won't be able to test the case where pri_path != alt_path.
>
>- Sean
>
>
Sean:

Now I am able to create connections with pri_path = path1, alt_path = path2
with the following change in the code.

I specify the port_num and alt_port_num before calling ib_modify_qp() to
change state to RTR (earlier I was changing this when modifying state to
IB_QPS_INIT).

(But it fails if I do this: qp_attr_mask |= IB_QP_PORT. Should it fail?)

The connection migration to alt_path etc. is not yet tested, so I am not
sure how much is working... will check what happens next.

thanks, som.
From mshefty at ichips.intel.com Mon Oct 16 12:29:05 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 12:29:05 -0700 Subject: [openib-general] [RFC] Notice/InformInfo event reporting Message-ID: <4533DD81.9030502@ichips.intel.com> I'm beginning work on adding InformInfo/Notice event reporting to the IB stack, and I'd like any input on potential implementations, as well as intended usage. Clients use InformInfo to register for events, with registration tracked on a per source QP basis. Given this, possible approaches are: 1. Clients can perform their own registration using their own QPs. If several clients wish to register for the same event, multiple QPs would be used. Additional traffic would be used when reporting events. But, event dispatching is centralized to the SA. 2. A single registration manager can perform all registrations. This would require reference counting registration requests. At a high level, the behavior is similar to what's done for multicast join/leave. This limits use to a single QP, and minimizes traffic, but duplicates event dispatching code on every node. 2a. Using option 2, a registration manager could register to receive all events, then filter based on local registration requests. This would prevent overlapping requests to the SA, but increase the number of events seen at each end node. 2b. Similar to option 2a, but clients would see all events (possibly filtered on type only), requiring that they perform additional filtering. My current thinking is to register for all events, then require that clients filter unwanted events. (Security events would be filtered from userspace clients.) - Sean From akepner at sgi.com Mon Oct 16 12:14:18 2006 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 16 Oct 2006 12:14:18 -0700 (PDT) Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: On Mon, 16 Oct 2006, Roland Dreier wrote: > OK, cool. Sounds convincing to me. 
BTW -- are there Altix systems
> with PCIe? Have you tested the mthca_arbel_xxx (mem-free PCIe HCA)
> changes, or just the mthca_tavor_xxx (PCI-X HCA) parts?

Yes, there are PCIe Altices, but I was unable to reproduce the
hang on one of them. It was easy to reproduce on a PCIX machine,
though.

>
> I see -- the other mmiowb()s are next to the spin_unlock()s elsewhere
> because the other routines might ring doorbells during the loop if
> someone passes in a ton of work requests, right?

I tried to call mmiowb() only when it was clear that a doorbell
write had occurred. In some cases there's an obvious condition
to test, but in others (e.g., mthca_tavor_post_srq_recv()),
"if (likely(nreq))" doesn't appear to be the right condition, as
nreq might be reset to 0 above with:

	if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
		nreq = 0;
		.....

(Which, I suppose, is the long way of answering your question
with "yes".)

> .... (All the mmiowb()s
> look necessary to me but I'm just curious about the level of testing)

I've "smoke tested" this by building, installing, running on
ia64 (PCIe and PCIX) and x86_64. Just some simple IPoIB apps
(no smoke was detected.) But the more testing the better, for
sure.

>
> I'm still a little puzzled by the fact that it affects performance,
> because that "if (likely(nreq))" is super-super-likely: under any
> normal workload, I would expect it always to be true. It's really
> strange that there's any difference between
>
> 	mmiowb();
> 	qp->sq.next_ind = ind;
> 	qp->sq.head += nreq;
>
> and
>
> 	qp->sq.next_ind = ind;
> 	qp->sq.head += nreq;
> 	mmiowb();
>

I'd assumed any difference was attributable to a small fraction
of times when the "likely" branch wasn't taken, but I didn't
verify this (just passing on a very qualitative take on data
acquired by a co-worker; maybe the difference was just "in the
noise" after all.) I agree that the reordering of instructions
as you show above would be very highly unlikely to have any
measurable effect.
> Anyway, it all looks good. I'll apply your patch and submit it to > -stable for 2.6.18.x. > Thanks. -- Arthur From sean.hefty at intel.com Mon Oct 16 12:33:42 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 12:33:42 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <45328905.9000904@veritas.com> Message-ID: <000101c6f159$fd5f4360$8698070a@amr.corp.intel.com> >now I am able to create connections with pri_path = path1, alt_path = path2 >with the following change in the code. > >I specify the port_num and al_port_num before calling ib_modify_qp() to >change >state to RTR (earlier I was changing this when modifying state to >IB_QPS_INIT). Doesn't ib_cm_init_qp_attr() set this for you? >( but it fails if I do this qp_attr_mask |= IB_QP_PORT, should it fail?) This sets the primary physical port, which is only valid transitioning to INIT. - Sean From akepner at sgi.com Mon Oct 16 12:17:12 2006 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 16 Oct 2006 12:17:12 -0700 (PDT) Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: On Mon, 16 Oct 2006, Roland Dreier wrote: > > Would it be cleaner just to put an mmiowb() inside update_cons_index() > in the case where the doorbell gets rung? That way there's no > unnecessary mmiowb() in the memfree case (which doesn't use a doorbell > or do any writing to PCI at all). > On second (or eighth, or whatever) look, yeah, that's better and simpler, too. I'll repost a fixed up version. 
-- Arthur From trimmer at silverstorm.com Mon Oct 16 12:50:21 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 16 Oct 2006 15:50:21 -0400 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: <4533DD81.9030502@ichips.intel.com> Message-ID: > From: Sean Hefty > Sent: Monday, October 16, 2006 3:29 PM > To: openib > Subject: [openib-general] [RFC] Notice/InformInfo event reporting > > I'm beginning work on adding InformInfo/Notice event reporting to the IB > stack, > and I'd like any input on potential implementations, as well as intended > usage. > > Clients use InformInfo to register for events, with registration tracked > on a > per source QP basis. Given this, possible approaches are: > > 1. Clients can perform their own registration using their own QPs. If > several > clients wish to register for the same event, multiple QPs would be used. > Additional traffic would be used when reporting events. But, event > dispatching > is centralized to the SA. > > 2. A single registration manager can perform all registrations. This > would > require reference counting registration requests. At a high level, the > behavior > is similar to what's done for multicast join/leave. This limits use to a > single > QP, and minimizes traffic, but duplicates event dispatching code on every > node. > > 2a. Using option 2, a registration manager could register to receive all > events, then filter based on local registration requests. This would > prevent > overlapping requests to the SA, but increase the number of events seen at > each > end node. > > 2b. Similar to option 2a, but clients would see all events (possibly > filtered > on type only), requiring that they perform additional filtering. > > My current thinking is to register for all events, then require that > clients > filter unwanted events. (Security events would be filtered from userspace > clients.) My recommendation is option 2. In large fabrics the SA can be a bottleneck. 
It is best for an end node to register with the SA only for the events which are of actual interest to the end node. With regards to "duplicating dispatching code on every node", rather than duplication, think of this as "distributing event dispatching code among the interested nodes". Thinking of it in these terms makes option 2 stand out as more scalable. Todd Rimmer From mst at mellanox.co.il Mon Oct 16 12:50:55 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 16 Oct 2006 21:50:55 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1160928780.2917.383.camel@fc6.xsintricity.com> References: <1160928780.2917.383.camel@fc6.xsintricity.com> Message-ID: <20061016195055.GC20009@mellanox.co.il> Quoting r. Doug Ledford : > > 1. The first thing would be to list fixes between 2.6.18 and 2.6.19-rc1 and > > backport these. Some of them are in OFED. > > That would be helpful. Since 2.6.19-rc looks to have integrated the > iWARP merge, the fixes are no doubt mixed in with a bunch of new code, > so I didn't pull anything from 2.6.19-rc since I was likely to break > things. Targeted fixes that skip the iWARP changes from someone that > knows them would be helpful. OK. BTW, -stable is a good place to take things from. Already, you might want to take d083f6d9648646833aed11ce81009128baf897f9 from 2.6.18.1: "IB/mthca: Fix lid used for sending traps" http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.18.1 This fixes a regression from 2.6.17. -- MST From mst at mellanox.co.il Mon Oct 16 13:05:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 16 Oct 2006 22:05:33 +0200 Subject: [openib-general] IPOIB NAPI In-Reply-To: References: <1160929635.5389.2.camel@localhost> Message-ID: <20061016200533.GD20009@mellanox.co.il> Quoting r. Roland Dreier : > There are further changes I would like to add on top of that, but > comments on the two patches there would be appreciated.
A small optimization: if (missed_event && netif_rx_reschedule(dev, 0)) should be, I think if (unlikely(missed_event) && netif_rx_reschedule(dev, 0)) since we are talking about an unlikely race where CQ became non-empty just as we were calling req_notify_cq. An API idea: how about, instead of testing missed_events, we add a flag: IB_CQ_TEST (or a longer name IB_CQ_REPORT_MISSED_EVENTS?) and change ib_req_notify_cq to return int which will keep the missed_events value, only if this flag is set? This has 2 advantages - Less churn updating all users to new API - they just ignore return value - and still almost no overhead for them as they don't set IB_CQ_TEST - For all users we have to push less values on stack - note compiler can't get rid of them as we are calling function through a pointer - For users that do missed_events = ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_TEST) we get the result in register. I agree it's a minor optimization, but I think quite a similar change went in in the linux irq code - waste not, want not. Want to see how a patch like this will look? -- MST From rdreier at cisco.com Mon Oct 16 13:05:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 13:05:38 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: ( akepner@sgi.com's message of "Mon, 16 Oct 2006 12:17:12 -0700 (PDT)") References: Message-ID: akepner> On second (or eighth, or whatever) look, yeah, that's akepner> better and simpler, too. I'll repost a fixed up version. No need -- I'll just revise it in my tree (since I already grabbed your patch). Thanks, Roland From troy at scl.ameslab.gov Mon Oct 16 13:22:35 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Mon, 16 Oct 2006 15:22:35 -0500 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? Message-ID: I am running PVFS2 on OpenIB, with IBM's ehca.
When we start writing/reading large files, either with the NetPIPE PVFS module we have or a modified GAMESS executable that uses libpvfs2 directly, the 'ibv_reg_mr' function fails, and we get an error. This is also correlated with kernel log messages like this: Oct 16 11:14:45 p5l8 kernel: PU0003 000e0091:ehca_hcall_7arg_7ret HCAD_ERROR opco de=160 ret=fffffffffffffff7 arg1=1000000003000004 arg2=5 arg3=14f0ebc8 arg4=10000 arg5=e0000000000000 arg6=e3e9f200 arg7=0 out1=0 out2=0 out3=0 out4=0 out5=0 out6=0 out7=0 Oct 16 11:14:45 p5l8 kernel: PU0003 00090454:ehca_reg_mr HCAD_ERROR hipz_alloc_mr failed, h_ret=fffffffffffffff7 hca_hndl=1000000003000004 Oct 16 11:14:45 p5l8 kernel: PU0003 00090478:ehca_reg_mr <<< ret=ffffffea shca=c00 00000e796b000 e_mr=c0000000d22c7d80 iova_start=0000000014f0ebc8 size=10000 acl=7 e _pd=c0000000e3e9f200 pginfo=c0000001ad37fa70 num_pages=11 num_4k=11 Oct 16 11:14:45 p5l8 kernel: PU0003 00090176:ehca_reg_user_mr <<< rc=fffffffffffff fea pd=c0000000e3e9f200 region=c0000000cb73a9d0 mr_access_flags=7 udata=c0000001ad 37fba0 We are able to run on a 4x PCI-X Mellanox HCA, but obviously I'd like to be using the 12x ehca. From dledford at redhat.com Mon Oct 16 13:50:49 2006 From: dledford at redhat.com (Doug Ledford) Date: Mon, 16 Oct 2006 16:50:49 -0400 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061016132512.GD14878@mellanox.co.il> References: <347180497203A942A6AA82C85846CBC9034F6001@ES23SNLNT.srn.sandia.gov> <20061016132512.GD14878@mellanox.co.il> Message-ID: <1161031849.2917.400.camel@fc6.xsintricity.com> On Mon, 2006-10-16 at 15:25 +0200, Michael S. Tsirkin wrote: > Quoting r. Maestas, Christopher Daniel : > > Subject: Re: [openib-general] RHEL5 and OFED ... > > > > > Now for userspace - does RHEL5 include at least libibverbs-1.0? > > > This has been released a while back, and Roland makes regular bugfix > > releases. 
> > > > Here's what I see on a rhel4 u4 system: > > --- > > $ rpm -q libibverbs > > libibverbs-1.0.3-1 > > --- > > > > So I would think rhel5 would have at least that or greater. When I > > compiled rpms for 1.1rc7 it generated: > > --- > > # ls libibverbs-* > > libibverbs-1.0.4-0.x86_64.rpm libibverbs-utils-1.0.4-0.x86_64.rpm > > libibverbs-devel-1.0.4-0.x86_64.rpm > > Doug, would it be possible to update this + libmthca? Possibly. What's the justification? What's in 1.0.4 that is the primary reason for wanting to update from 1.0.3? -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From somenath at veritas.com Sun Oct 15 14:01:18 2006 From: somenath at veritas.com (somenath) Date: Sun, 15 Oct 2006 14:01:18 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <000101c6f159$fd5f4360$8698070a@amr.corp.intel.com> References: <000101c6f159$fd5f4360$8698070a@amr.corp.intel.com> Message-ID: <4532A19E.2090102@veritas.com> Sean Hefty wrote: >>now I am able to create connections with pri_path = path1, alt_path = path2 >>with the following change in the code. >> >>I specify the port_num and al_port_num before calling ib_modify_qp() to >>change >>state to RTR (earlier I was changing this when modifying state to >>IB_QPS_INIT). >> >> > >Doesn't ib_cm_init_qp_attr() set this for you? > > No, it doesn't.
it returns me attr_mask= 0x12d181 port=0x0 alt_port=0x0 other attribs are: qp_state=0x2 cur_qp_state=0x0 path_mtu=0x4 path_mig_state=0x0 qkey=0x0 rq_psn=0x0 sq_psn=0x0 dest_qp_num=0x407 qp_access_flags=0x0 pkey_index=0x0 min_rnr_timer=0x0 port_num=0x0 timeout=0x0 retry_cnt=0x0 rnr_retry=0x0 alt_port_num=0x0 cap: max_send_wr=0x0 max_recv_wr=0x0 max_send_sge=0x0 max_recv_sge=0x0 max_inline_d ate=0x0 thanks, som., > > >>( but it fails if I do this qp_attr_mask |= IB_QP_PORT, should it fail?) >> >> > >This sets the primary physical port, which is only valid transitioning to INIT. > >- Sean > > From mshefty at ichips.intel.com Mon Oct 16 14:03:50 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 14:03:50 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <4532A19E.2090102@veritas.com> References: <000101c6f159$fd5f4360$8698070a@amr.corp.intel.com> <4532A19E.2090102@veritas.com> Message-ID: <4533F3B6.1030509@ichips.intel.com> somenath wrote: >> Doesn't ib_cm_init_qp_attr() set this for you? > > No, it doesn't. it returns me > attr_mask= 0x12d181 > port=0x0 alt_port=0x0 Okay - there was a fix to the cm.c file (svn rev 8267) that added setting the alternate port number when initializing the QP attributes. Apparently that fix did not make it into the release that you're using. - Sean From mshefty at ichips.intel.com Mon Oct 16 14:32:56 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 14:32:56 -0700 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: References: Message-ID: <4533FA88.8050105@ichips.intel.com> Rimmer, Todd wrote: > My recommendation is option 2. Thanks for the response. > In large fabrics the SA can be a bottleneck. It is best for an end node > to register with the SA only for the events which are of actual interest > to the end node. Which part of the SA is the bottleneck? 
Is it the sending of MADs, or the processing of events to determine which end nodes are interested in the event? My thinking was that if events are rare, then having the SA simply forward the events to the end nodes saves processing time on the SA. So, we can trade off SA processing by sending more MADs. I'm not sure which is worse. > With regards to "duplicating dispatching code on every node", rather > than duplication, think of this as "distributing event dispatching code > among the interested nodes". Thinking of it in these terms makes option > 2 stand out as more scalable. To provide the highest level of filtering at the SA, we need an interface based on Informinfo. Trying to reference count at that level would be difficult. (E.g. client 1 wants events for LIDs 2-25, client 2 LIDs 3-4, client 3 LIDs 2-25, client 4 LIDS 15-30, etc.) I'm not sure we need an interface this complex. It increases the processing requirements needed of the SA, and may increase the number of MADs that it needs to send to a given node. (Unless we start trying to be really clever with the registration.) I was thinking of letting clients register for a particular "class" of event, then dispatching the events among the registered clients. But I'm still uncertain about how to define event classes. Some expected usage models would be helpful. - Sean From weiny2 at llnl.gov Mon Oct 16 14:46:58 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 16 Oct 2006 14:46:58 -0700 Subject: [openib-general] OFED 1.1 RC7 In-Reply-To: <4525271E.8070000@dev.mellanox.co.il> References: <4525271E.8070000@dev.mellanox.co.il> Message-ID: <20061016144658.6a27b1f5.weiny2@llnl.gov> Sorry, I don't know much about git... I tried to "git" the module code for OFED 1.1 rc7 this using the following command. What am I doing wrong? 14:42:40 > git clone git://www.mellanox.co.il/~git/infinibandref fatal: unexpected EOF fetch-pack from 'git://www.mellanox.co.il/~git/infinibandref' failed. 
Also that is for the module code right? Thanks, Ira Weiny weiny2 at llnl.gov On Thu, 05 Oct 2006 17:39:10 +0200 "Aviram Gutman" wrote: > > Release details: > ================ > BUILD_ID: > OFED-1.1-rc7 > > openib-1.1 (REV=9725) > # User space > https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 > ref: refs/heads/ofed_1_1 > commit fde99a7a22e56d6aa90dae9db3d600755efcedb5 > From smaldone at cs.rutgers.edu Mon Oct 16 14:55:03 2006 From: smaldone at cs.rutgers.edu (Steve Smaldone) Date: Mon, 16 Oct 2006 17:55:03 -0400 Subject: [openib-general] uDAPL problem Message-ID: <4533FFB7.2020506@cs.rutgers.edu> Hi, I have been trying to run dapltest with trunk rev 9717 with linux kernel 2.6.18 and I get an error. The error and configuration is shown below. Basically, the rdma_cm device is not created under /dev/infiniband. I am wondering if this is a known problem and how to solve it. Thanks, Steve Smaldone $ ./dapltest -T S -D IB1 ... DAT Registry: dat_ia_openv (IB1,1:2,0) called DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so DAT Registry: dat_registry_add_provider (IB1,1:2,0) librdmacm: couldn't read ABI version. librdmacm: assuming: 2 libibverbs: Warning: no userspace device-specific driver found for uverbs0 driver search path: /usr/local/lib/infiniband CMA: unable to open /dev/infiniband/rdma_cm DT_cs_Server: Could not open IB1 (DAT_INTERNAL_ERROR ) DT_cs_Server (IB1): Exiting. 
DAT Registry: Stopped (dat_fini) My dat.conf: IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "hora-1-ib0 0" "" ifconfig: ib0 Link encap:UNSPEC HWaddr 00-00-00-14-FE-80-00-00-00-00-00-00-00-00-00-00 inet addr:10.2.2.135 Bcast:10.255.255.255 Mask:255.0.0.0 inet6 addr: fe80::202:c901:81e:90e1/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:1 errors:0 dropped:5 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:68 (68.0 b) lsmod: Module Size Used by ib_ucm 20612 0 ib_uverbs 40232 1 ib_ucm rdma_cm 33572 0 ib_cm 39824 2 ib_ucm,rdma_cm ib_addr 11524 1 rdma_cm ib_local_sa 15752 1 rdma_cm findex 8576 1 ib_local_sa ib_ipoib 51144 0 ib_multicast 15940 2 rdma_cm,ib_ipoib ib_sa 20004 5 rdma_cm,ib_cm,ib_local_sa,ib_ipoib,ib_multicast ib_umad 20016 0 ib_mthca 131236 0 ib_mad 42272 5 ib_cm,ib_local_sa,ib_sa,ib_umad,ib_mthca ib_core 52992 11 ib_ucm,ib_uverbs,rdma_cm,ib_cm,ib_local_sa,ib_ipoib,ib_multicast,ib_sa,ib_umad,ib_mthca,ib_mad udev rules: KERNEL="umad*", NAME="infiniband/%k" KERNEL="issm*", NAME="infiniband/%k" KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" KERNEL="ucm*", NAME="infiniband/%k", MODE="0666" KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" /dev/infiniband: issm0 issm1 ucm0 umad0 umad1 uverbs0 From trimmer at silverstorm.com Mon Oct 16 15:02:34 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Mon, 16 Oct 2006 18:02:34 -0400 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: <4533FA88.8050105@ichips.intel.com> Message-ID: > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Monday, October 16, 2006 5:33 PM > To: Rimmer, Todd; Matt Leininger > Cc: openib > Subject: Re: [openib-general] [RFC] Notice/InformInfo event reporting > > Rimmer, Todd wrote: > > My recommendation is option 2. > > Thanks for the response. > > > In large fabrics the SA can be a bottleneck. 
It is best for an end node > > to register with the SA only for the events which are of actual interest > > to the end node. > > Which part of the SA is the bottleneck? Is it the sending of MADs, or the > processing of events to determine which end nodes are interested in the > event? Both can be a bottleneck in a big fabric. Since the SA needs to always determine which end nodes are registered for a given event, the fewer are registered the better. Even if all the hosts are registered, there will be other nodes (switches, TCAs, etc) which are not registered, so the SA will need to always check its list of who to send to. Since the notice is not a broadcast, it will need to send a separate packet to each end node. Each notice will then get a response from each end node which will need to be correlated to the outstanding notices so the SA can determine which notices need to be resent vs those which where acknowledged. If you consider a large fabric (say 2000+ nodes) and all the events which the SA can generate (at least 4: Gid in/out multicast in/out of service) that can be a big bursty load on the SA. Factor in the nodes responding to those requests (for example GID in service may trigger path record queries), and even more work occurs on the SA. Most HCAs don't optimize the GSI datapath, so data packet rates for SA packets is less than might be observed on UD or RC QPs. > > My thinking was that if events are rare, then having the SA simply forward > the > events to the end nodes saves processing time on the SA. So, we can trade > off > SA processing by sending more MADs. I'm not sure which is worse. In a functioning fabric, events will be rare. However its when you first boot the fabric, reboot the SM or other similar "start up" actions that things get real busy. > > > With regards to "duplicating dispatching code on every node", rather > > than duplication, think of this as "distributing event dispatching code > > among the interested nodes". 
Thinking of it in these terms makes option > > 2 stand out as more scalable. > > To provide the highest level of filtering at the SA, we need an interface > based > on Informinfo. Trying to reference count at that level would be > difficult. > (E.g. client 1 wants events for LIDs 2-25, client 2 LIDs 3-4, client 3 > LIDs > 2-25, client 4 LIDS 15-30, etc.) I'm not sure we need an interface this > complex. It increases the processing requirements needed of the SA, and > may > increase the number of MADs that it needs to send to a given node. > (Unless we > start trying to be really clever with the registration.) > > I was thinking of letting clients register for a particular "class" of > event, > then dispatching the events among the registered clients. But I'm still > uncertain about how to define event classes. > > Some expected usage models would be helpful. In my experience, few clients will filter by LID. For example a client interested in GID in service, would want to know about all LIDs. A client such as IPoIB would be interested in all multicast groups. So perhaps the registration with the SA should be for "all lids" and let the client filter by LID as needed. So my interpretation of option 2 is the end node registers once with the SA for "all lids" for the events which clients are interested in. Then the end node can filter appropriately (filtering at the client may be best). In general I have found that only a few clients will use events such as: IPoIb to manage multicast subscriptions (join as send only for new groups) and SA caches/replicas to keep their cache/replica synchronized. In the silverstorm stack we created an API for a client to subscribe to a notice. It allowed the client to specify: trap number, local HCA port subscription was applicable to (in case multi-port HCAs on different fabrics) and information for a callback to the client (client context void*, function). 
The callback provided the client context void*, the actual NOTICE from the SA and which HCA port it arrived on. The API in the stack dealt with all the issues of remaining subscribed (SA reregistraton, port disconnected/reconnected, etc) so the client merely subscribed, got notice callbacks and later unsubscribed. In this style API any LID based filtering would be done in the client itself. Todd Rimmer From rjwalsh at pathscale.com Mon Oct 16 15:03:19 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Mon, 16 Oct 2006 15:03:19 -0700 Subject: [openib-general] uDAPL problem In-Reply-To: <4533FFB7.2020506@cs.rutgers.edu> References: <4533FFB7.2020506@cs.rutgers.edu> Message-ID: <453401A7.6070906@pathscale.com> Steve Smaldone wrote: > Hi, > > I have been trying to run dapltest with trunk rev 9717 with linux kernel > 2.6.18 and I get an error. The error and configuration is shown below. > Basically, the rdma_cm device is not created under /dev/infiniband. I > am wondering if this is a known problem and how to solve it. You need to load the rdma_ucm module, too. Regards, Robert. From smaldone at cs.rutgers.edu Mon Oct 16 15:01:30 2006 From: smaldone at cs.rutgers.edu (Steve Smaldone) Date: Mon, 16 Oct 2006 18:01:30 -0400 Subject: [openib-general] uDAPL problem In-Reply-To: <4533FFB7.2020506@cs.rutgers.edu> References: <4533FFB7.2020506@cs.rutgers.edu> Message-ID: <4534013A.5000900@cs.rutgers.edu> Hi, Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm device appears. However, it now fails with the following: $ ./dapltest -T S -D IB1 ... 
DAT Registry: dat_ia_openv (IB1,1:2,0) called DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so DAT Registry: dat_registry_add_provider (IB1,1:2,0) libibverbs: Warning: no userspace device-specific driver found for uverbs0 driver search path: /usr/local/lib/infiniband libibverbs: Warning: no userspace device-specific driver found for uverbs0 driver search path: /usr/local/lib/infiniband DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS ) DT_cs_Server (IB1): Exiting. DAT Registry: Stopped (dat_fini) The configuration remains the same otherwise. Thanks, Steve Smaldone Steve Smaldone wrote: > Hi, > > I have been trying to run dapltest with trunk rev 9717 with linux > kernel 2.6.18 and I get an error. The error and configuration is > shown below. > Basically, the rdma_cm device is not created under /dev/infiniband. I > am wondering if this is a known problem and how to solve it. > > Thanks, > Steve Smaldone > > $ ./dapltest -T S -D IB1 > ... > DAT Registry: dat_ia_openv (IB1,1:2,0) called > DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so > DAT Registry: dat_registry_add_provider (IB1,1:2,0) > librdmacm: couldn't read ABI version. > librdmacm: assuming: 2 > libibverbs: Warning: no userspace device-specific driver found for > uverbs0 > driver search path: /usr/local/lib/infiniband > CMA: unable to open /dev/infiniband/rdma_cm > DT_cs_Server: Could not open IB1 (DAT_INTERNAL_ERROR ) > DT_cs_Server (IB1): Exiting. 
> DAT Registry: Stopped (dat_fini) > > My dat.conf: > IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 > "hora-1-ib0 0" "" > > ifconfig: > ib0 Link encap:UNSPEC HWaddr > 00-00-00-14-FE-80-00-00-00-00-00-00-00-00-00-00 > inet addr:10.2.2.135 Bcast:10.255.255.255 Mask:255.0.0.0 > inet6 addr: fe80::202:c901:81e:90e1/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:1 errors:0 dropped:5 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:0 (0.0 b) TX bytes:68 (68.0 b) > > lsmod: > Module Size Used by > ib_ucm 20612 0 > ib_uverbs 40232 1 ib_ucm > rdma_cm 33572 0 > ib_cm 39824 2 ib_ucm,rdma_cm > ib_addr 11524 1 rdma_cm > ib_local_sa 15752 1 rdma_cm > findex 8576 1 ib_local_sa > ib_ipoib 51144 0 > ib_multicast 15940 2 rdma_cm,ib_ipoib > ib_sa 20004 5 > rdma_cm,ib_cm,ib_local_sa,ib_ipoib,ib_multicast > ib_umad 20016 0 > ib_mthca 131236 0 > ib_mad 42272 5 ib_cm,ib_local_sa,ib_sa,ib_umad,ib_mthca > ib_core 52992 11 > ib_ucm,ib_uverbs,rdma_cm,ib_cm,ib_local_sa,ib_ipoib,ib_multicast,ib_sa,ib_umad,ib_mthca,ib_mad > > > udev rules: > KERNEL="umad*", NAME="infiniband/%k" > KERNEL="issm*", NAME="infiniband/%k" > KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" > KERNEL="ucm*", NAME="infiniband/%k", MODE="0666" > KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666" > > /dev/infiniband: > issm0 issm1 ucm0 umad0 umad1 uverbs0 > > From sashak at voltaire.com Mon Oct 16 15:21:06 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 17 Oct 2006 00:21:06 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 In-Reply-To: <20061016144658.6a27b1f5.weiny2@llnl.gov> References: <4525271E.8070000@dev.mellanox.co.il> <20061016144658.6a27b1f5.weiny2@llnl.gov> Message-ID: <20061016222106.GE21810@sashak.voltaire.com> On 14:46 Mon 16 Oct , Ira Weiny wrote: > Sorry, I don't know much about git... 
> > I tried to "git" the module code for OFED 1.1 rc7 this using the following > command. What am I doing wrong? > > 14:42:40 > git clone git://www.mellanox.co.il/~git/infinibandref > fatal: unexpected EOF > fetch-pack from 'git://www.mellanox.co.il/~git/infinibandref' failed. I guess this should be: git clone git://www.mellanox.co.il/~git/infiniband , and then after clone: cd infiniband git checkout ofed_1_1 Sasha > > Also that is for the module code right? > > Thanks, > Ira Weiny > weiny2 at llnl.gov > > > On Thu, 05 Oct 2006 17:39:10 +0200 > "Aviram Gutman" wrote: > > > > > Release details: > > ================ > > BUILD_ID: > > OFED-1.1-rc7 > > > > openib-1.1 (REV=9725) > > # User space > > https://openib.org/svn/gen2/branches/1.1/src/userspace > > Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1 > > ref: refs/heads/ofed_1_1 > > commit fde99a7a22e56d6aa90dae9db3d600755efcedb5 > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From kliteyn at dev.mellanox.co.il Mon Oct 16 15:15:06 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 17 Oct 2006 00:15:06 +0200 Subject: [openib-general] [PATCH] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment In-Reply-To: <20061016190436.GA21672@sashak.voltaire.com> References: <45335546.5090909@dev.mellanox.co.il> <20061016190436.GA21672@sashak.voltaire.com> Message-ID: <4534046A.5090205@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 11:47 Mon 16 Oct , Yevgeny Kliteynik wrote: >> Hi Hal >> >> Leaving OSM_LOG_DIR or OSM_CACHE_DIR environment variables >> empty will cause OSM to write log or cache files to / >> since OSM runs as root process.
>> >> Although one might say that this is just a question of point >> of view, I really think that to prevent root directory trashing >> (as I did by mistake on my machine), empty variable should be >> treated as if it is not set. >> If a user wants the SM to write something to /, he should specify >> this explicitly by setting "/" as an env. variable value. >> >> -- >> Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik >> >> Index: opensm/osm_db_files.c >> =================================================================== >> --- opensm/osm_db_files.c (revision 9820) >> +++ opensm/osm_db_files.c (working copy) >> @@ -182,6 +182,8 @@ osm_db_init( >> CL_ASSERT( p_db_imp != NULL); >> >> p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR"); >> + if (p_db_imp->db_dir_name && (strlen(p_db_imp->db_dir_name) == 0)) >> + p_db_imp->db_dir_name = NULL; >> if ( p_db_imp->db_dir_name == NULL ) >> p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; > > Would be better something like: > > if (p_db_imp->db_dir_name == NULL || !*p_db_imp->db_dir_name) > p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; > > This eliminates invocation of strlen() which will return !=0 in most of > cases. Sure, I thought about it right after I saw that the patch was applied :) -- Yevgeny > > The same is below.
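[Editorial sketch, not part of the original message.] The "unset or empty means use the default" check discussed above can be written once as a small helper instead of being repeated at each call site. This is a minimal sketch only: dir_from_env is a hypothetical name and is not an OpenSM function, and the fallback string is whatever default the caller passes in.

```c
#include <stdlib.h>

/* Hypothetical helper (not an OpenSM function): return the value of the
 * environment variable var_name, or fallback when the variable is either
 * unset (getenv returns NULL) or set to the empty string. */
const char *dir_from_env(const char *var_name, const char *fallback)
{
    const char *value = getenv(var_name);

    /* !*value tests the first character, so no strlen() call is needed. */
    if (value == NULL || !*value)
        return fallback;
    return value;
}
```

A caller that really wants files written to / would still get that behaviour by setting the variable to "/" explicitly, since only the empty string is mapped to the default.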
> > Sasha > >> >> Index: opensm/osm_subnet.c >> =================================================================== >> --- opensm/osm_subnet.c (revision 9820) >> +++ opensm/osm_subnet.c (working copy) >> @@ -472,6 +472,8 @@ osm_subn_set_default_opt( >> p_opt->honor_guid2lid_file = FALSE; >> >> p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); >> + if (p_opt->dump_files_dir && (strlen(p_opt->dump_files_dir) == 0)) >> + p_opt->dump_files_dir = NULL; >> if (!p_opt->dump_files_dir) >> p_opt->dump_files_dir = OSM_DEFAULT_TMP_DIR; >> >> @@ -719,6 +721,8 @@ osm_subn_rescan_conf_file( >> char *p_key, *p_val ,*p_last; >> >> /* try to open the options file from the cache dir */ >> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) >> + p_cache_dir = NULL; >> if (! p_cache_dir) >> p_cache_dir = OSM_DEFAULT_CACHE_DIR; >> >> @@ -770,6 +774,8 @@ osm_subn_parse_conf_file( >> char *p_key, *p_val ,*p_last; >> >> /* try to open the options file from the cache dir */ >> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) >> + p_cache_dir = NULL; >> if (! p_cache_dir) >> p_cache_dir = OSM_DEFAULT_CACHE_DIR; >> >> @@ -1002,6 +1008,8 @@ osm_subn_write_conf_file( >> FILE *opts_file; >> >> /* try to open the options file from the cache dir */ >> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) >> + p_cache_dir = NULL; >> if (! 
p_cache_dir) >> p_cache_dir = OSM_DEFAULT_CACHE_DIR; >> >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >> > From halr at voltaire.com Mon Oct 16 15:14:19 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Oct 2006 18:14:19 -0400 Subject: [openib-general] [PATCH] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment In-Reply-To: <4534046A.5090205@dev.mellanox.co.il> References: <45335546.5090909@dev.mellanox.co.il> <20061016190436.GA21672@sashak.voltaire.com> <4534046A.5090205@dev.mellanox.co.il> Message-ID: <1161036857.32093.361017.camel@hal.voltaire.com> On Mon, 2006-10-16 at 18:15, Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: > > On 11:47 Mon 16 Oct , Yevgeny Kliteynik wrote: > >> Hi Hal > >> > >> Leaving OSM_LOG_DIR or OSM_CACHE_DIR environment variables > >> empty will cause OSM to write log or cache files to / > >> since OSM runs as root process.
> >> > >> -- > >> Yevgeny > >> > >> Signed-off-by: Yevgeny Kliteynik > >> > >> Index: opensm/osm_db_files.c > >> =================================================================== > >> --- opensm/osm_db_files.c (revision 9820) > >> +++ opensm/osm_db_files.c (working copy) > >> @@ -182,6 +182,8 @@ osm_db_init( > >> CL_ASSERT( p_db_imp != NULL); > >> > >> p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR"); > >> + if (p_db_imp->db_dir_name && (strlen(p_db_imp->db_dir_name) == 0)) > >> + p_db_imp->db_dir_name = NULL; > >> if ( p_db_imp->db_dir_name == NULL ) > >> p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; > > > > Would be better something like: > > > > if (p_db_imp->db_dir_name == NULL || !*p_db_imp->db_dir_name) > > p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; > > > > This eliminates invocation of strlen() which will return !=0 in most of > > cases. > > Sure, I thought about it right after I saw that the patch was applied :) So will you supply another patch with this approach ? -- Hal > > -- > Yevgeny > > > > > The same is below. > > > > Sasha > > > >> > >> Index: opensm/osm_subnet.c > >> =================================================================== > >> --- opensm/osm_subnet.c (revision 9820) > >> +++ opensm/osm_subnet.c (working copy) > >> @@ -472,6 +472,8 @@ osm_subn_set_default_opt( > >> p_opt->honor_guid2lid_file = FALSE; > >> > >> p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); > >> + if (p_opt->dump_files_dir && (strlen(p_opt->dump_files_dir) == 0)) > >> + p_opt->dump_files_dir = NULL; > >> if (!p_opt->dump_files_dir) > >> p_opt->dump_files_dir = OSM_DEFAULT_TMP_DIR; > >> > >> @@ -719,6 +721,8 @@ osm_subn_rescan_conf_file( > >> char *p_key, *p_val ,*p_last; > >> > >> /* try to open the options file from the cache dir */ > >> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) > >> + p_cache_dir = NULL; > >> if (!
p_cache_dir) > >> p_cache_dir = OSM_DEFAULT_CACHE_DIR; > >> > >> @@ -770,6 +774,8 @@ osm_subn_parse_conf_file( > >> char *p_key, *p_val ,*p_last; > >> > >> /* try to open the options file from the cache dir */ > >> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) > >> + p_cache_dir = NULL; > >> if (! p_cache_dir) > >> p_cache_dir = OSM_DEFAULT_CACHE_DIR; > >> > >> @@ -1002,6 +1008,8 @@ osm_subn_write_conf_file( > >> FILE *opts_file; > >> > >> /* try to open the options file from the cache dir */ > >> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) > >> + p_cache_dir = NULL; > >> if (! p_cache_dir) > >> p_cache_dir = OSM_DEFAULT_CACHE_DIR; > >> > >> > >> _______________________________________________ > >> openib-general mailing list > >> openib-general at openib.org > >> http://openib.org/mailman/listinfo/openib-general > >> > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > >> > > From rdreier at cisco.com Mon Oct 16 15:19:53 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 15:19:53 -0700 Subject: [openib-general] IPOIB NAPI In-Reply-To: <20061016200533.GD20009@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 16 Oct 2006 22:05:33 +0200") References: <1160929635.5389.2.camel@localhost> <20061016200533.GD20009@mellanox.co.il> Message-ID: > A small optimization: > > if (missed_event && netif_rx_reschedule(dev, 0)) > > should be, I think > > if (unlikely(missed_event) && netif_rx_reschedule(dev, 0)) Yes, makes sense. I updated my ipoib-napi branch with this. > An API idea: > how about instead testing missed_events, we add a flag: > > IB_CQ_TEST (or a longer name IB_CQ_REPORT_MISSED_EVENTS?) > and change ib_req_notify_cq to return int which will keep > the missed_events value, only if this flag is set? 
> > This has 2 advantages > - Less churn updating all users to new API - they just ignore return value - > and still almost no overhead for them as they don't set IB_CQ_TEST > - For all users we have to push less values on stack - note compiler can't > get rid of them as we are calling function through a pointer > - For users that do > missed_events = ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP | IB_CQ_TEST) > we get the result in register. Yes, I like this. So ib_req_notify_cq() gets a return value that is negative if an error occurred, 0 if everything is fine, or positive if a missed event might have happened. I think I prefer the longer name IB_CQ_REPORT_MISSED_EVENTS -- at least there's a chance at guessing what it means even if you don't read the documentation. > Want to see how a patch like this will look? That would be great. Most convenient would be a patch on top of the first "missed event" patch in my ipoib-napi branch, although a replacement patch for that would be fine too. Otherwise if you're busy I'll do it myself in a few days -- I have a few other things I want to get to first. Thanks, Roland From kliteyn at dev.mellanox.co.il Mon Oct 16 15:24:21 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 17 Oct 2006 00:24:21 +0200 Subject: [openib-general] [PATCH] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment In-Reply-To: <1161036857.32093.361017.camel@hal.voltaire.com> References: <45335546.5090909@dev.mellanox.co.il> <20061016190436.GA21672@sashak.voltaire.com> <4534046A.5090205@dev.mellanox.co.il> <1161036857.32093.361017.camel@hal.voltaire.com> Message-ID: <45340695.2080202@dev.mellanox.co.il> Hal Rosenstock wrote: > On Mon, 2006-10-16 at 18:15, Yevgeny Kliteynik wrote: >> Sasha Khapyorsky wrote: >>> On 11:47 Mon 16 Oct , Yevgeny Kliteynik wrote: >>>> Hi Hal >>>> >>>> Leaving OSM_LOG_DIR or OSM_CACHE_DIR environment variables >>>> empty will cause OSM to write log or cache files to / >>>> since OSM runs as root process.
>>>> >>>> Although one might say that this is just a question of point >>>> of view, I really think that to prevent root directory trashing >>>> (as I did by mistake on my machine), empty variable should be >>>> treated as if it is not set. >>>> If a user wants the SM to write something to /, he should specify >>>> this explicitly by setting "/" as an env. variable value. >>>> >>>> -- >>>> Yevgeny >>>> >>>> Signed-off-by: Yevgeny Kliteynik >>>> >>>> Index: opensm/osm_db_files.c >>>> =================================================================== >>>> --- opensm/osm_db_files.c (revision 9820) >>>> +++ opensm/osm_db_files.c (working copy) >>>> @@ -182,6 +182,8 @@ osm_db_init( >>>> CL_ASSERT( p_db_imp != NULL); >>>> >>>> p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR"); >>>> + if (p_db_imp->db_dir_name && (strlen(p_db_imp->db_dir_name) == 0)) >>>> + p_db_imp->db_dir_name = NULL; >>>> if ( p_db_imp->db_dir_name == NULL ) >>>> p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; >>> Would be better something like: >>> >>> if (p_db_imp->db_dir_name == NULL || !*p_db_imp->db_dir_name) >>> p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; >>> >>> This eliminates invocation of strlen() which will return !=0 in most of >>> cases. >> Sure, I thought about it right after I saw that the patch was applied :) > > So will you supply another patch with this approach ? > Yes -- Yevgeny > -- Hal > >> -- >> Yevgeny >> >>> The same is below.
>>> >>> Sasha >>> >>>> >>>> Index: opensm/osm_subnet.c >>>> =================================================================== >>>> --- opensm/osm_subnet.c (revision 9820) >>>> +++ opensm/osm_subnet.c (working copy) >>>> @@ -472,6 +472,8 @@ osm_subn_set_default_opt( >>>> p_opt->honor_guid2lid_file = FALSE; >>>> >>>> p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); >>>> + if (p_opt->dump_files_dir && (strlen(p_opt->dump_files_dir) == 0)) >>>> + p_opt->dump_files_dir = NULL; >>>> if (!p_opt->dump_files_dir) >>>> p_opt->dump_files_dir = OSM_DEFAULT_TMP_DIR; >>>> >>>> @@ -719,6 +721,8 @@ osm_subn_rescan_conf_file( >>>> char *p_key, *p_val ,*p_last; >>>> >>>> /* try to open the options file from the cache dir */ >>>> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) >>>> + p_cache_dir = NULL; >>>> if (! p_cache_dir) >>>> p_cache_dir = OSM_DEFAULT_CACHE_DIR; >>>> >>>> @@ -770,6 +774,8 @@ osm_subn_parse_conf_file( >>>> char *p_key, *p_val ,*p_last; >>>> >>>> /* try to open the options file from the cache dir */ >>>> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) >>>> + p_cache_dir = NULL; >>>> if (! p_cache_dir) >>>> p_cache_dir = OSM_DEFAULT_CACHE_DIR; >>>> >>>> @@ -1002,6 +1008,8 @@ osm_subn_write_conf_file( >>>> FILE *opts_file; >>>> >>>> /* try to open the options file from the cache dir */ >>>> + if (p_cache_dir && (strlen(p_cache_dir) == 0)) >>>> + p_cache_dir = NULL; >>>> if (! 
p_cache_dir) >>>> p_cache_dir = OSM_DEFAULT_CACHE_DIR; >>>> > From rdreier at cisco.com Mon Oct 16 15:29:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 15:29:00 -0700 Subject: [openib-general] IPOIB NAPI In-Reply-To: (Shirley Ma's message of "Mon, 16 Oct 2006 12:03:00 -0700") References: Message-ID: > Don't know why I have trouble getting this patch from your git tree. Would you > mind posting this patch here so I can test the performance over ehca? No problem, I should post it anyway so people can quote from the patch more easily. I'm very interested in hearing about ehca performance with this (both with and without the "EHCA scaling" config option I guess) - R. From rdreier at cisco.com Mon Oct 16 15:32:58 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 15:32:58 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: (Roland Dreier's message of "Mon, 16 Oct 2006 15:29:00 -0700") References: Message-ID: The semantics defined by the InfiniBand specification say that completion events are only generated when a completion is added to a completion queue (CQ) after completion notification is requested. In other words, this means that the following race is possible: while (CQ is not empty) ib_poll_cq(CQ); // new completion is added after while loop is exited ib_req_notify_cq(CQ); // no event is generated for the existing completion To close this race, the IB spec recommends doing another poll of the CQ after requesting notification. However, it is not always possible to arrange code this way (for example, we have found that NAPI for IPoIB cannot poll after requesting notification).
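The race described above, and the "poll again after requesting notification" pattern, can be sketched in plain user-space C. This is only an illustration under stated assumptions: `toy_cq`, `toy_add_completion`, `toy_poll`, `toy_arm` and `drain_then_arm` are hypothetical stand-ins and are not part of the verbs API. The toy CQ raises an event only for completions that arrive after it has been armed, and `toy_arm` returns a hint in the spirit of the "maybe_missed_event" value named in the subject line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CQ (not a kernel structure). */
struct toy_cq {
	int pending;  /* completions waiting to be polled */
	bool armed;   /* notification has been requested */
	int events;   /* completion events generated so far */
};

/* A completion arrives; an event fires only if the CQ is armed. */
static void toy_add_completion(struct toy_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = false;
		cq->events++;
	}
}

/* Poll one completion; returns 1 if one was consumed, else 0. */
static int toy_poll(struct toy_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/*
 * Request notification. Returns a hint: nonzero if completions are
 * already queued, i.e. they will not generate an event by themselves.
 */
static int toy_arm(struct toy_cq *cq)
{
	cq->armed = true;
	return cq->pending != 0;
}

/*
 * Drain the CQ, arm it, and poll again whenever the hint says an event
 * may have been missed. `late` completions are injected between the
 * last poll and the arm to simulate the race window. Returns the
 * number of completions handled.
 */
static int drain_then_arm(struct toy_cq *cq, int late)
{
	int handled = 0;

	assert(cq != NULL);
	for (;;) {
		while (toy_poll(cq))
			handled++;
		while (late > 0) {          /* the race window */
			toy_add_completion(cq);
			late--;
		}
		if (!toy_arm(cq))
			break;              /* safe to wait for an event */
	}
	return handled;
}
```

In this toy model, a completion injected in the race window produces no event, so a consumer that simply armed the CQ and went to sleep would never see it; the hint is what sends the loop around for one more poll.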
Also, some hardware (e.g. Mellanox HCAs) actually will generate an event for completions added before the call to ib_req_notify_cq() -- which is allowed by the spec, since there's no way for any upper-layer consumer to know exactly when a completion was really added -- so the extra poll of the CQ is just a waste. Motivated by this, we add a new parameter "maybe_missed_event" to ib_req_notify_cq() so that it can return a hint about whether a completion may have been added before the request for notification. If the hint returned is 0, then the consumer knows that it is safe to wait for another event; otherwise, the consumer needs to poll the CQ again to make sure no completions are already there. Signed-off-by: Roland Dreier --- drivers/infiniband/core/mad.c | 4 ++-- drivers/infiniband/core/uverbs_cmd.c | 3 ++- drivers/infiniband/hw/amso1100/c2.h | 3 ++- drivers/infiniband/hw/amso1100/c2_cq.c | 10 +++++++++- drivers/infiniband/hw/ehca/ehca_reqs.c | 10 +++++++++- drivers/infiniband/hw/ehca/ipz_pt_fn.h | 8 ++++++++ drivers/infiniband/hw/ipath/ipath_cq.c | 8 +++++++- drivers/infiniband/hw/ipath/ipath_verbs.h | 3 ++- drivers/infiniband/hw/mthca/mthca_cq.c | 22 ++++++++++++++++++++-- drivers/infiniband/hw/mthca/mthca_dev.h | 6 ++++-- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 2 +- drivers/infiniband/ulp/iser/iser_verbs.c | 4 ++-- drivers/infiniband/ulp/srp/ib_srp.c | 4 ++-- include/rdma/ib_verbs.h | 16 +++++++++++++--- 15 files changed, 84 insertions(+), 21 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 493f4c6..5c4e037 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -2204,7 +2204,7 @@ static void ib_mad_completion_handler(vo struct ib_wc wc; port_priv = (struct ib_mad_port_private *)data; - ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); + ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP, NULL); while (ib_poll_cq(port_priv->cq, 1, &wc) ==
1) { if (wc.status == IB_WC_SUCCESS) { @@ -2646,7 +2646,7 @@ static int ib_mad_port_start(struct ib_m } } - ret = ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP); + ret = ib_req_notify_cq(port_priv->cq, IB_CQ_NEXT_COMP, NULL); if (ret) { printk(KERN_ERR PFX "Failed to request completion " "notification: %d\n", ret); diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index b72c7f6..8aaeb18 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -969,7 +969,8 @@ ssize_t ib_uverbs_req_notify_cq(struct i return -EINVAL; ib_req_notify_cq(cq, cmd.solicited_only ? - IB_CQ_SOLICITED : IB_CQ_NEXT_COMP); + IB_CQ_SOLICITED : IB_CQ_NEXT_COMP, + NULL); put_cq_read(cq); diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h index 1b17dcd..f37b5f4 100644 --- a/drivers/infiniband/hw/amso1100/c2.h +++ b/drivers/infiniband/hw/amso1100/c2.h @@ -519,7 +519,8 @@ extern void c2_free_cq(struct c2_dev *c2 extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index); extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index); extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); +extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + int *maybe_missed_event); /* CM */ extern int c2_llp_connect(struct iw_cm_id *cm_id, diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index 9d7bcc5..d491abf 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -217,10 +217,12 @@ int c2_poll_cq(struct ib_cq *ibcq, int n return npolled; } -int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + int *maybe_missed_event) { struct c2_mq_shared __iomem *shared; struct c2_cq *cq; + unsigned long flags; cq = to_c2cq(ibcq); 
shared = cq->mq.peer; @@ -241,6 +243,12 @@ int c2_arm_cq(struct ib_cq *ibcq, enum i */ readb(&shared->armed); + if (maybe_missed_event) { + spin_lock_irqsave(&cq->lock, flags); + *maybe_missed_event = !c2_mq_empty(&cq->mq); + spin_unlock_irqrestore(&cq->lock, flags); + } + return 0; } diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index b46bda1..acf08e6 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -634,9 +634,11 @@ poll_cq_exit0: return ret; } -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify, + int *maybe_missed_event) { struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); + unsigned long spl_flags; switch (cq_notify) { case IB_CQ_SOLICITED: @@ -649,5 +651,11 @@ int ehca_req_notify_cq(struct ib_cq *cq, return -EINVAL; } + if (maybe_missed_event) { + spin_lock_irqsave(&my_cq->spinlock, spl_flags); + *maybe_missed_event = ipz_qeit_is_valid(my_cq->ipz_queue); + spin_unlock_irqrestore(&my_cq->spinlock, spl_flags); + } + return 0; } diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h index 2f13509..0bebb34 100644 --- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h +++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h @@ -140,6 +140,14 @@ static inline void *ipz_qeit_get_inc_val return cqe; } +static inline int ipz_qeit_is_valid(struct ipz_queue *queue) +{ + struct ehca_cqe *cqe = ipz_qeit_get(queue); + u32 cqe_flags = cqe->cqe_flags; + + return cqe_flags >> 7 == queue->toggle_state & 1; +} + /* * returns and resets Queue Entry iterator * returns address (kv) of first Queue Entry diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c index 87462e0..c5d300f 100644 --- a/drivers/infiniband/hw/ipath/ipath_cq.c +++ b/drivers/infiniband/hw/ipath/ipath_cq.c @@ -307,13 +307,15 @@ int ipath_destroy_cq(struct ib_cq 
*ibcq) * ipath_req_notify_cq - change the notification type for a completion queue * @ibcq: the completion queue * @notify: the type of notification to request + * @maybe_missed_event: missed event hint * * Returns 0 for success. * * This may be called from interrupt context. Also called by * ib_req_notify_cq() in the generic verbs code. */ -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + int *maybe_missed_event) { struct ipath_cq *cq = to_icq(ibcq); unsigned long flags; @@ -325,6 +327,10 @@ int ipath_req_notify_cq(struct ib_cq *ib */ if (cq->notify != IB_CQ_NEXT_COMP) cq->notify = notify; + + if (maybe_missed_event) + *maybe_missed_event = cq->queue->head != cq->queue->tail; + spin_unlock_irqrestore(&cq->lock, flags); return 0; } diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 8039f6e..e36411d 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -716,7 +716,8 @@ struct ib_cq *ipath_create_cq(struct ib_ int ipath_destroy_cq(struct ib_cq *ibcq); -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + int *maybe_missed_event); int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata); diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index e393681..cb49aad 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -716,10 +716,19 @@ repoll: return err == 0 || err == -EAGAIN ? 
npolled : err; } -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify) +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, + int *missed_event) { __be32 doorbell[2]; + /* + * Mellanox devices trigger an event if an unpolled CQE is + * already in the CQ when the CQ is armed, so there is no + * possibility of missing an event. + */ + if (missed_event) + *missed_event = 0; + doorbell[0] = cpu_to_be32((notify == IB_CQ_SOLICITED ? MTHCA_TAVOR_CQ_DB_REQ_NOT_SOL : MTHCA_TAVOR_CQ_DB_REQ_NOT) | @@ -733,13 +742,22 @@ int mthca_tavor_arm_cq(struct ib_cq *cq, return 0; } -int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, + int *missed_event) { struct mthca_cq *cq = to_mcq(ibcq); __be32 doorbell[2]; u32 sn; __be32 ci; + /* + * Mellanox devices trigger an event if an unpolled CQE is + * already in the CQ when the CQ is armed, so there is no + * possibility of missing an event. + */ + if (missed_event) + *missed_event = 0; + sn = cq->arm_sn & 3; ci = cpu_to_be32(cq->cons_index); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index fe5cecf..0a93a3e 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -493,8 +493,10 @@ void mthca_unmap_eq_icm(struct mthca_dev int mthca_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify); -int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify); +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, + int *missed_event); +int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, + int *missed_event); int mthca_init_cq(struct mthca_dev *dev, int nent, struct mthca_ucontext *ctx, u32 pdn, struct mthca_cq *cq); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 8bf5e9e..6166cb4 
100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -300,7 +300,7 @@ void ipoib_ib_completion(struct ib_cq *c struct ipoib_dev_priv *priv = netdev_priv(dev); int n, i; - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP, NULL); do { n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 7b717c6..e65ef47 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -181,7 +181,7 @@ int ipoib_transport_dev_init(struct net_ goto out_free_pd; } - if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) + if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, NULL)) goto out_free_cq; priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 18a0000..289c3a2 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -81,7 +81,7 @@ static int iser_create_device_ib_res(str if (IS_ERR(device->cq)) goto cq_err; - if (ib_req_notify_cq(device->cq, IB_CQ_NEXT_COMP)) + if (ib_req_notify_cq(device->cq, IB_CQ_NEXT_COMP, NULL)) goto cq_arm_err; tasklet_init(&device->cq_tasklet, @@ -820,7 +820,7 @@ static void iser_cq_tasklet_fn(unsigned } /* #warning "it is assumed here that arming CQ only once its empty" * * " would not cause interrupts to be missed" */ - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP, NULL); } static void iser_cq_callback(struct ib_cq *cq, void *cq_context) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 4b09147..9aa2f66 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -205,7 +205,7 @@ static int srp_create_target_ib(struct s goto out; } - ib_req_notify_cq(target->cq, 
IB_CQ_NEXT_COMP); + ib_req_notify_cq(target->cq, IB_CQ_NEXT_COMP, NULL); init_attr->event_handler = srp_qp_event; init_attr->cap.max_send_wr = SRP_SQ_SIZE; @@ -858,7 +858,7 @@ static void srp_completion(struct ib_cq struct srp_target_port *target = target_ptr; struct ib_wc wc; - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP, NULL); while (ib_poll_cq(cq, 1, &wc) > 0) { if (wc.status) { printk(KERN_ERR PFX "failed %s status %d\n", diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 8eacc35..ed38655 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -941,7 +941,8 @@ struct ib_device { struct ib_wc *wc); int (*peek_cq)(struct ib_cq *cq, int wc_cnt); int (*req_notify_cq)(struct ib_cq *cq, - enum ib_cq_notify cq_notify); + enum ib_cq_notify cq_notify, + int *maybe_missed_event); int (*req_ncomp_notif)(struct ib_cq *cq, int wc_cnt); struct ib_mr * (*get_dma_mr)(struct ib_pd *pd, @@ -1369,11 +1370,20 @@ int ib_peek_cq(struct ib_cq *cq, int wc_ * @cq_notify: If set to %IB_CQ_SOLICITED, completion notification will * occur on the next solicited event. If set to %IB_CQ_NEXT_COMP, * notification will occur on the next completion. + * @maybe_missed_event: If non-NULL, will be used to return an integer + * hint. If this hint is 0, then it is guaranteed that any work + * completions added to the CQ since the last call to ib_poll_cq() + * will trigger a completion notification. If the returned hint is + * non-zero, then it is possible (but not guaranteed) that a work + * completion has been added to the CQ since the last call to + * ib_poll_cq(), but that no notification event will occur for this + * work completion. 
*/ static inline int ib_req_notify_cq(struct ib_cq *cq, - enum ib_cq_notify cq_notify) + enum ib_cq_notify cq_notify, + int *maybe_missed_event) { - return cq->device->req_notify_cq(cq, cq_notify); + return cq->device->req_notify_cq(cq, cq_notify, maybe_missed_event); } /** -- 1.4.1 From rdreier at cisco.com Mon Oct 16 15:36:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 15:36:38 -0700 Subject: [openib-general] [PATCH/RFC 2/2] IPoIB: Convert to NAPI In-Reply-To: (Roland Dreier's message of "Mon, 16 Oct 2006 15:32:58 -0700") References: Message-ID: Convert the IP-over-InfiniBand network device driver over to using NAPI to handle all completions (both receive and send). Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib.h | 1 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 77 +++++++++++++++++++++++------ drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 + 3 files changed, 64 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 0b8a79d..025cef2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -239,6 +239,7 @@ extern struct workqueue_struct *ipoib_wo /* functions */ +int ipoib_poll(struct net_device *dev, int *budget); void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 6166cb4..d0b5720 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -286,26 +286,57 @@ static void ipoib_ib_handle_tx_wc(struct wc->status, wr_id, wc->vendor_err); } -static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) +int ipoib_poll(struct net_device *dev, int *budget) { - if (wc->wr_id & IPOIB_OP_RECV) - ipoib_ib_handle_rx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); + struct ipoib_dev_priv *priv = 
netdev_priv(dev); + int max = min(*budget, dev->quota); + int done; + int t; + int empty; + int missed_event; + int n, i; + +repoll: + done = 0; + empty = 0; + + while (max) { + t = min(IPOIB_NUM_WC, max); + n = ib_poll_cq(priv->cq, t, priv->ibwc); + + for (i = 0; i < n; ++i) { + if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) { + ++done; + --max; + ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); + } else + ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); + } + + if (n != t) { + empty = 1; + break; + } + } + + dev->quota -= done; + *budget -= done; + + if (empty) { + netif_rx_complete(dev); + ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, &missed_event); + if (unlikely(missed_event) && netif_rx_reschedule(dev, 0)) + goto repoll; + + return 0; + } + + return 1; } void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) { - struct net_device *dev = (struct net_device *) dev_ptr; - struct ipoib_dev_priv *priv = netdev_priv(dev); - int n, i; - - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP, NULL); - do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); - for (i = 0; i < n; ++i) - ipoib_ib_handle_wc(dev, priv->ibwc + i); - } while (n == IPOIB_NUM_WC); + netif_rx_schedule(dev_ptr); } static inline int post_send(struct ipoib_dev_priv *priv, @@ -510,9 +541,10 @@ int ipoib_ib_dev_stop(struct net_device struct ib_qp_attr qp_attr; unsigned long begin; struct ipoib_tx_buf *tx_req; - int i; + int i, n; clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); + netif_poll_disable(dev); /* * Move our QP to the error state and then reinitialize in @@ -559,6 +591,16 @@ int ipoib_ib_dev_stop(struct net_device goto timeout; } + do { + n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc); + for (i = 0; i < n; ++i) { + if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) + ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); + else + ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); + } + } while (n == IPOIB_NUM_WC); + msleep(1); } @@ -587,6 +629,9 @@ timeout: msleep(1); } + netif_poll_enable(dev); + ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP, 
NULL); + return 0; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1eaf00e..d90aafb 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -896,6 +896,8 @@ static void ipoib_setup(struct net_devic dev->hard_header = ipoib_hard_header; dev->set_multicast_list = ipoib_set_mcast_list; dev->neigh_setup = ipoib_neigh_setup_dev; + dev->poll = ipoib_poll; + dev->weight = 100; dev->watchdog_timeo = HZ; -- 1.4.1 From kliteyn at dev.mellanox.co.il Mon Oct 16 15:39:55 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 17 Oct 2006 00:39:55 +0200 Subject: [openib-general] [PATCHv2] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment In-Reply-To: <1161036857.32093.361017.camel@hal.voltaire.com> References: <45335546.5090909@dev.mellanox.co.il> <20061016190436.GA21672@sashak.voltaire.com> <4534046A.5090205@dev.mellanox.co.il> <1161036857.32093.361017.camel@hal.voltaire.com> Message-ID: <45340A3B.1000709@dev.mellanox.co.il> Hi Hal. [snip] > > So will you supply another patch with this approach ? > > -- Hal Here it is. 
-- Yevgeny Signed-off-by: Yevgeny Kliteynik Index: opensm/osm_db_files.c =================================================================== --- opensm/osm_db_files.c (revision 9827) +++ opensm/osm_db_files.c (working copy) @@ -182,9 +182,7 @@ osm_db_init( CL_ASSERT( p_db_imp != NULL); p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR"); - if (p_db_imp->db_dir_name && (strlen(p_db_imp->db_dir_name) == 0)) - p_db_imp->db_dir_name = NULL; - if ( p_db_imp->db_dir_name == NULL ) + if (!p_db_imp->db_dir_name || !(*p_db_imp->db_dir_name)) p_db_imp->db_dir_name = OSM_DEFAULT_CACHE_DIR; /* Create the directory if it doesn't exist */ Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 9827) +++ opensm/osm_subnet.c (working copy) @@ -472,9 +472,7 @@ osm_subn_set_default_opt( p_opt->honor_guid2lid_file = FALSE; p_opt->dump_files_dir = getenv("OSM_TMP_DIR"); - if (p_opt->dump_files_dir && (strlen(p_opt->dump_files_dir) == 0)) - p_opt->dump_files_dir = NULL; - if (!p_opt->dump_files_dir) + if (!p_opt->dump_files_dir || !(*p_opt->dump_files_dir)) p_opt->dump_files_dir = OSM_DEFAULT_TMP_DIR; p_opt->log_file = OSM_DEFAULT_LOG_FILE; @@ -721,9 +719,7 @@ osm_subn_rescan_conf_file( char *p_key, *p_val ,*p_last; /* try to open the options file from the cache dir */ - if (p_cache_dir && (strlen(p_cache_dir) == 0)) - p_cache_dir = NULL; - if (! p_cache_dir) + if (!p_cache_dir || !(*p_cache_dir)) p_cache_dir = OSM_DEFAULT_CACHE_DIR; strcpy(file_name, p_cache_dir); @@ -774,9 +770,7 @@ osm_subn_parse_conf_file( char *p_key, *p_val ,*p_last; /* try to open the options file from the cache dir */ - if (p_cache_dir && (strlen(p_cache_dir) == 0)) - p_cache_dir = NULL; - if (! 
p_cache_dir) + if (!p_cache_dir || !(*p_cache_dir)) p_cache_dir = OSM_DEFAULT_CACHE_DIR; strcpy(file_name, p_cache_dir); @@ -1008,9 +1002,7 @@ osm_subn_write_conf_file( FILE *opts_file; /* try to open the options file from the cache dir */ - if (p_cache_dir && (strlen(p_cache_dir) == 0)) - p_cache_dir = NULL; - if (! p_cache_dir) + if (!p_cache_dir || !(*p_cache_dir)) p_cache_dir = OSM_DEFAULT_CACHE_DIR; strcpy(file_name, p_cache_dir); From mshefty at ichips.intel.com Mon Oct 16 15:56:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 16 Oct 2006 15:56:41 -0700 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: References: Message-ID: <45340E29.30906@ichips.intel.com> Rimmer, Todd wrote: > In a functioning fabric, events will be rare. However, it's when you > first boot the fabric, reboot the SM or other similar "start up" actions > that things get real busy. Hmm... I need to think more about how to handle the start up scenario. > In general I have found that only a few clients will use events such as: > IPoIB to manage multicast subscriptions (join as send only for new > groups) and SA caches/replicas to keep their cache/replica synchronized. Can you give more details about how ipoib would use the event service? SA caches seem like they would register for traps... 64 (GID in), 65 (GID out), and 128 (switch port change)? Or is it reasonable to limit it to trap 128? Is trap 128 likely to be followed by traps 64 and 65? > In the SilverStorm stack we created an API for a client to subscribe to > a notice. It allowed the client to specify: trap number, local HCA port > subscription was applicable to (in case multi-port HCAs on different > fabrics) and information for a callback to the client (client context > void*, function). The callback provided the client context void*, the > actual NOTICE from the SA and which HCA port it arrived on. This sounds like a simple enough interface.
So, you tracked references on only the trap numbers then? > The API in the stack dealt with all the issues of remaining subscribed > (SA reregistration, port disconnected/reconnected, etc) so the client > merely subscribed, got notice callbacks and later unsubscribed. In this > style API any LID-based filtering would be done in the client itself. This makes sense. - Sean From ardavis at ichips.intel.com Mon Oct 16 16:10:32 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 16 Oct 2006 16:10:32 -0700 Subject: [openib-general] uDAPL problem In-Reply-To: <4534013A.5000900@cs.rutgers.edu> References: <4533FFB7.2020506@cs.rutgers.edu> <4534013A.5000900@cs.rutgers.edu> Message-ID: <45341168.8070305@ichips.intel.com> Steve Smaldone wrote: >Hi, > >Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm >device appears. However, it now fails with the following: > >$ ./dapltest -T S -D IB1 >... >DAT Registry: dat_ia_openv (IB1,1:2,0) called >DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so >DAT Registry: dat_registry_add_provider (IB1,1:2,0) >libibverbs: Warning: no userspace device-specific driver found for uverbs0 > driver search path: /usr/local/lib/infiniband >libibverbs: Warning: no userspace device-specific driver found for uverbs0 > driver search path: /usr/local/lib/infiniband >DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS ) >DT_cs_Server (IB1): Exiting. >DAT Registry: Stopped (dat_fini) > >The configuration remains the same otherwise. > > >>My dat.conf: >>IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 >>"hora-1-ib0 0" "" >> >> Do you have an entry in your /etc/hosts for hora-1-ib0 and 10.2.2.135?
there seems to be problems resolving "hora-1-ib0" -arlin From smaldone at cs.rutgers.edu Mon Oct 16 16:39:10 2006 From: smaldone at cs.rutgers.edu (Stephen Smaldone) Date: Mon, 16 Oct 2006 19:39:10 -0400 Subject: [openib-general] uDAPL problem In-Reply-To: <45341168.8070305@ichips.intel.com> References: <4533FFB7.2020506@cs.rutgers.edu> <4534013A.5000900@cs.rutgers.edu> <45341168.8070305@ichips.intel.com> Message-ID: <4534181E.3020703@cs.rutgers.edu> Arlin Davis wrote: > Steve Smaldone wrote: > >> Hi, >> >> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm >> device appears. However, it now fails with the following: >> >> $ ./dapltest -T S -D IB1 >> ... >> DAT Registry: dat_ia_openv (IB1,1:2,0) called >> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so >> DAT Registry: dat_registry_add_provider (IB1,1:2,0) >> libibverbs: Warning: no userspace device-specific driver found for >> uverbs0 >> driver search path: /usr/local/lib/infiniband >> libibverbs: Warning: no userspace device-specific driver found for >> uverbs0 >> driver search path: /usr/local/lib/infiniband >> DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS ) >> DT_cs_Server (IB1): Exiting. >> DAT Registry: Stopped (dat_fini) >> >> The configuration remains the same otherwise. >> >> >>> My dat.conf: >>> IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 >>> "hora-1-ib0 0" "" >>> > Do you have an entry in your /etc/hosts for hora-1-ib0 and 10.2.2.135? > > there seems to be problems resolving "hora-1-ib0" > > -arlin Yes. 
There is an entry as follows: 10.2.2.135 hora-1-ib0 Thanks, Steve From weiny2 at llnl.gov Mon Oct 16 16:47:43 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 16 Oct 2006 16:47:43 -0700 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 In-Reply-To: <20061016222106.GE21810@sashak.voltaire.com> References: <4525271E.8070000@dev.mellanox.co.il> <20061016144658.6a27b1f5.weiny2@llnl.gov> <20061016222106.GE21810@sashak.voltaire.com> Message-ID: <20061016164743.78375190.weiny2@llnl.gov> Ah, yes, that works. Thanks, Ira On Tue, 17 Oct 2006 00:21:06 +0200 Sasha Khapyorsky wrote: > On 14:46 Mon 16 Oct , Ira Weiny wrote: > > Sorry, I don't know much about git... > > > > I tried to "git" the module code for OFED 1.1 rc7 this using the following > > command. What am I doing wrong? > > > > 14:42:40 > git clone git://www.mellanox.co.il/~git/infinibandref > > fatal: unexpected EOF > > fetch-pack from 'git://www.mellanox.co.il/~git/infinibandref' failed. > > I guess this should be: > > git clone git://www.mellanox.co.il/~git/infiniband > > , and then after clone: > > cd infiniband > git checkout ofed_1_1 > > From changquing.tang at hp.com Mon Oct 16 17:51:37 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Mon, 16 Oct 2006 19:51:37 -0500 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <20061016164743.78375190.weiny2@llnl.gov> Message-ID: We tested RC7, but fork() does not work: 1. system() causes IB to fail. 2. fork(), child calling exit(0) immediately also causes IB to fail. Anyone has tested fork() related issue ? --CQ Tang, HP-MPI -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Ira Weiny Sent: Monday, October 16, 2006 6:48 PM To: Sasha Khapyorsky Cc: openfabrics-ewg at openib.org; openib-general at openib.org Subject: Re: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 Ah, yes, that works. 
Thanks, Ira On Tue, 17 Oct 2006 00:21:06 +0200 Sasha Khapyorsky wrote: > On 14:46 Mon 16 Oct , Ira Weiny wrote: > > Sorry, I don't know much about git... > > > > I tried to "git" the module code for OFED 1.1 rc7 this using the > > following command. What am I doing wrong? > > > > 14:42:40 > git clone git://www.mellanox.co.il/~git/infinibandref > > fatal: unexpected EOF > > fetch-pack from 'git://www.mellanox.co.il/~git/infinibandref' failed. > > I guess this should be: > > git clone git://www.mellanox.co.il/~git/infiniband > > , and then after clone: > > cd infiniband > git checkout ofed_1_1 > > _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ksharma at silverstorm.com Mon Oct 16 20:10:42 2006 From: ksharma at silverstorm.com (Sharma, Karun) Date: Mon, 16 Oct 2006 23:10:42 -0400 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release schedule References: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> Message-ID: The plan is OK with Silverstorm. I have a question though. What are the plans to support SRP-HA feature on RHEL4 kernels ? Thanks Karun ________________________________ From: openfabrics-ewg-bounces at openib.org on behalf of Tziporet Koren Sent: Mon 10/16/2006 1:03 PM To: Open Fabrics Cc: openib Subject: [openfabrics-ewg] OFED 1.1 release schedule This is the plan to do the 1.1 release this week: We will publish 1.1-pre1 package tomorrow (Tue. 17-Oct) Only blocker issues from RC7 will be updated: 1. SRP fix for Cisco FC gateway 2. Small updates for the install 3. Fix in diagnet to support SM on a switch 4. Activate scaling code of ehca as default in the install 5. Documentation update Each company will have 3 days for latest certification process and then the release can be done on Thursday. Company owners - please approve if this is OK with you. 
If not, please elaborate on the blocking reasons. Thanks, Tziporet Koren Software Director Mellanox Technologies mailto: tziporet at mellanox.co.il Tel +972-4-9097200, ext 380 From krkumar2 at in.ibm.com Mon Oct 16 21:39:09 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:09 +0530 Subject: [openib-general] [PATCH] Use time_after_eq() instead of time_after() in queue_req() Message-ID: <20061017043909.4891.4421.sendpatchset@localhost.localdomain> In queue_req(), use time_after_eq() instead of time_after() for the following reasons: - Improves insert time if multiple entries with the same time are present. - set_timeout need not be called if an entry with the same time is added to the list (and that happens to be the entry with the smallest time), saving atomic/locking operations. - Earlier entries with the same time are deleted first (FIFO). Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/addr.c new/drivers/infiniband/core/addr.c --- org/drivers/infiniband/core/addr.c 2006-10-09 16:54:37.000000000 +0530 +++ new/drivers/infiniband/core/addr.c 2006-10-09 16:55:36.000000000 +0530 @@ -118,7 +118,7 @@ static void queue_req(struct addr_req *r mutex_lock(&lock); list_for_each_entry_reverse(temp_req, &req_list, list) { - if (time_after(req->timeout, temp_req->timeout)) + if (time_after_eq(req->timeout, temp_req->timeout)) break; } From krkumar2 at in.ibm.com Mon Oct 16 21:39:18 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:18 +0530 Subject: [openib-general] [PATCH] Fix some cancellation problems in process_req(). Message-ID: <20061017043918.4891.5249.sendpatchset@localhost.localdomain> Fixes the following problems in process_req() relating to cancellation: - Function is wrongly doing another addr_resolve_remote() when cancelled, which is not required. - Make failure reporting immediate by using time_after_eq().
- On cancellation, -ETIMEDOUT was returned to the callback routine instead of the more appropriate -ECANCELLED (users getting notified may want to print/return this status, eg ucma_event_handler). Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/addr.c new/drivers/infiniband/core/addr.c --- org/drivers/infiniband/core/addr.c 2006-10-10 15:37:20.000000000 +0530 +++ new/drivers/infiniband/core/addr.c 2006-10-11 10:26:50.000000000 +0530 @@ -204,17 +204,16 @@ static void process_req(void *data) mutex_lock(&lock); list_for_each_entry_safe(req, temp_req, &req_list, list) { - if (req->status) { + if (req->status && req->status != -ECANCELED) { src_in = (struct sockaddr_in *) &req->src_addr; dst_in = (struct sockaddr_in *) &req->dst_addr; req->status = addr_resolve_remote(src_in, dst_in, req->addr); + if (req->status && time_after_eq(jiffies, req->timeout)) + req->status = -ETIMEDOUT; + else if (req->status == -ENODATA) + continue; } - if (req->status && time_after(jiffies, req->timeout)) - req->status = -ETIMEDOUT; - else if (req->status == -ENODATA) - continue; - list_del(&req->list); list_add_tail(&req->list, &done_list); } From krkumar2 at in.ibm.com Mon Oct 16 21:39:11 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:11 +0530 Subject: [openib-general] [PATCH] Rewrite cma_req_handler() to encapsulate common code. Message-ID: <20061017043911.4891.28143.sendpatchset@localhost.localdomain> Rewrite cma_req_handler error handling case to encapsulate common code. 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 16:57:26.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 17:00:07.000000000 +0530 @@ -934,13 +934,8 @@ static int cma_req_handler(struct ib_cm_ mutex_lock(&lock); ret = cma_acquire_dev(conn_id); mutex_unlock(&lock); - if (ret) { - ret = -ENODEV; - cma_exch(conn_id, CMA_DESTROYING); - cma_release_remove(conn_id); - rdma_destroy_id(&conn_id->id); - goto out; - } + if (ret) + goto release_conn_id; conn_id->cm_id.ib = cm_id; cm_id->context = conn_id; @@ -950,13 +945,17 @@ static int cma_req_handler(struct ib_cm_ ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, ib_event->private_data + offset, IB_CM_REQ_PRIVATE_DATA_SIZE - offset); - if (ret) { - /* Destroy the CM ID by returning a non-zero value. */ - conn_id->cm_id.ib = NULL; - cma_exch(conn_id, CMA_DESTROYING); - cma_release_remove(conn_id); - rdma_destroy_id(&conn_id->id); - } + if (!ret) + goto out; + + /* Destroy the CM ID by returning a non-zero value. */ + conn_id->cm_id.ib = NULL; + +release_conn_id: + cma_exch(conn_id, CMA_DESTROYING); + cma_release_remove(conn_id); + rdma_destroy_id(&conn_id->id); + out: cma_release_remove(listen_id); return ret; From krkumar2 at in.ibm.com Mon Oct 16 21:39:14 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:14 +0530 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count Message-ID: <20061017043914.4891.55.sendpatchset@localhost.localdomain> rdma_bind_addr leaks a cma_dev reference count in failure case. 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 17:13:41.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 19:42:31.000000000 +0530 @@ -1749,6 +1749,7 @@ static int cma_get_port(struct rdma_id_p int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; + int did_acquire_dev = 0; int ret; if (addr->sa_family != AF_INET) @@ -1765,18 +1766,20 @@ int rdma_bind_addr(struct rdma_cm_id *id ret = cma_acquire_dev(id_priv); mutex_unlock(&lock); } - if (ret) - goto err; + if (!ret) + did_acquire_dev = 1; + else + goto out; } - memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); ret = cma_get_port(id_priv); - if (ret) - goto err; - return 0; -err: - cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); +out: + if (ret) { + if (did_acquire_dev) + cma_detach_from_dev(id_priv); + cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); + } return ret; } EXPORT_SYMBOL(rdma_bind_addr); From krkumar2 at in.ibm.com Mon Oct 16 21:39:23 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:23 +0530 Subject: [openib-general] [PATCH] Re-send ARP as prev ARP request could have got dropped. Message-ID: <20061017043923.4891.37021.sendpatchset@localhost.localdomain> Re-send ARP, since earlier ARP request could have got dropped/lost. This should be done in addr_resolve_remote() as doing it in rdma_resolve_ip() means sending ARP only once. 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/addr.c new/drivers/infiniband/core/addr.c --- org/drivers/infiniband/core/addr.c 2006-10-10 16:02:01.000000000 +0530 +++ new/drivers/infiniband/core/addr.c 2006-10-10 16:13:22.000000000 +0530 @@ -188,6 +188,8 @@ static int addr_resolve_remote(struct so ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); release: neigh_release(neigh); + if (ret == -ENODATA) + addr_send_arp(dst_in); put: ip_rt_put(rt); out: @@ -300,7 +302,6 @@ int rdma_resolve_ip(struct sockaddr *src case -ENODATA: req->timeout = msecs_to_jiffies(timeout_ms) + jiffies; queue_req(req); - addr_send_arp(dst_in); break; default: ret = req->status; From krkumar2 at in.ibm.com Mon Oct 16 21:39:26 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:26 +0530 Subject: [openib-general] [PATCH] [RFC] cma_new_id can kfree on error instead of destroy_id Message-ID: <20061017043926.4891.43838.sendpatchset@localhost.localdomain> cma_new_id() does not require to do destroy_id(), instead it can kfree(), since nothing is allocated on that id. Posting this as an RFC in case anyone feels that create_id should be cleaned up by destroy_id (even if redundant). 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 20:59:45.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-10 10:17:42.000000000 +0530 @@ -883,6 +883,7 @@ static struct rdma_id_private *cma_new_i if (IS_ERR(id)) goto err; + id_priv = container_of(id, struct rdma_id_private, id); cma_save_net_info(&id->route.addr, &listen_id->route.addr, ip_ver, port, src, dst); @@ -891,7 +892,7 @@ static struct rdma_id_private *cma_new_i rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); if (!rt->path_rec) - goto destroy_id; + goto free_id; rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (rt->num_paths == 2) @@ -902,12 +903,11 @@ static struct rdma_id_private *cma_new_i ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey)); rt->addr.dev_addr.dev_type = RDMA_NODE_IB_CA; - id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; return id_priv; -destroy_id: - rdma_destroy_id(id); +free_id: + kfree(id_priv); err: return NULL; } From krkumar2 at in.ibm.com Mon Oct 16 21:39:20 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Tue, 17 Oct 2006 10:09:20 +0530 Subject: [openib-general] [PATCH] If addr_handler() got error, do not set state as OK Message-ID: <20061017043920.4891.43166.sendpatchset@localhost.localdomain> If addr_handler() got invoked with an error status, do not set id_priv->state to success followed by resettting it to the old value (redundant code). Also encapsulate some common code. 
Signed-off-by: Krishna Kumar -------- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-10 15:45:27.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-10 15:59:53.000000000 +0530 @@ -1515,6 +1515,8 @@ static void addr_handler(int status, str { struct rdma_id_private *id_priv = context; enum rdma_cm_event_type event; + int did_comp_exch = 0; + int destroy = 0; atomic_inc(&id_priv->dev_remove); @@ -1523,17 +1525,20 @@ static void addr_handler(int status, str * we're trying to acquire it. */ mutex_lock(&lock); - if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) { - mutex_unlock(&lock); - goto out; + if (!status) { + if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, + CMA_ADDR_RESOLVED)) { + mutex_unlock(&lock); + goto out; + } + did_comp_exch = 1; + if (!id_priv->cma_dev) + status = cma_acquire_dev(id_priv); } - - if (!status && !id_priv->cma_dev) - status = cma_acquire_dev(id_priv); mutex_unlock(&lock); - if (status) { - if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ADDR_BOUND)) + if (did_comp_exch && !cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, + CMA_ADDR_BOUND)) goto out; event = RDMA_CM_EVENT_ADDR_ERROR; } else { @@ -1544,14 +1549,13 @@ static void addr_handler(int status, str if (cma_notify_user(id_priv, event, status, NULL, 0)) { cma_exch(id_priv, CMA_DESTROYING); - cma_release_remove(id_priv); - cma_deref_id(id_priv); - rdma_destroy_id(&id_priv->id); - return; + destroy = 1; } out: cma_release_remove(id_priv); cma_deref_id(id_priv); + if (destroy) + rdma_destroy_id(&id_priv->id); } static int cma_resolve_loopback(struct rdma_id_private *id_priv) From mst at mellanox.co.il Mon Oct 16 22:11:40 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 07:11:40 +0200 Subject: [openib-general] RHEL5 and OFED ... 
In-Reply-To: <1161031849.2917.400.camel@fc6.xsintricity.com> References: <347180497203A942A6AA82C85846CBC9034F6001@ES23SNLNT.srn.sandia.gov> <20061016132512.GD14878@mellanox.co.il> <1161031849.2917.400.camel@fc6.xsintricity.com> Message-ID: <20061017051140.GB17404@mellanox.co.il> Quoting r. Doug Ledford : > > Doug, would it be possible to update this + libmthca? > > Possibly. What's the justification? What's in 1.0.4 that is the > primary reason for wanting to update from 1.0.3? Hmm, Roland, I went to look into the ChangeLog and I note that you don't label svn versions or release versions there. So it's hard to see what was fixed in what version, or to map to svn versions. I'll dig that info up but I think we want it in ChangeLog. -- MST From mst at mellanox.co.il Mon Oct 16 22:23:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 07:23:03 +0200 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: <20061017043914.4891.55.sendpatchset@localhost.localdomain> References: <20061017043914.4891.55.sendpatchset@localhost.localdomain> Message-ID: <20061017052303.GC17404@mellanox.co.il> Quoting r. Krishna Kumar : > Subject: [PATCH] rdma_bind_addr() leaks a cma_dev reference count > > rdma_bind_addr leaks a cma_dev reference count in failure > case.
> > Signed-off-by: Krishna Kumar > -------- > diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c > --- org/drivers/infiniband/core/cma.c 2006-10-09 17:13:41.000000000 +0530 > +++ new/drivers/infiniband/core/cma.c 2006-10-09 19:42:31.000000000 +0530 > @@ -1749,6 +1749,7 @@ static int cma_get_port(struct rdma_id_p > int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > { > struct rdma_id_private *id_priv; > + int did_acquire_dev = 0; > int ret; > > if (addr->sa_family != AF_INET) > @@ -1765,18 +1766,20 @@ int rdma_bind_addr(struct rdma_cm_id *id > ret = cma_acquire_dev(id_priv); > mutex_unlock(&lock); > } > - if (ret) > - goto err; > + if (!ret) > + did_acquire_dev = 1; > + else > + goto out; > } > - > memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); > ret = cma_get_port(id_priv); > - if (ret) > - goto err; > > - return 0; > -err: > - cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > +out: > + if (ret) { > + if (did_acquire_dev) > + cma_detach_from_dev(id_priv); > + cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > + } > return ret; > } > EXPORT_SYMBOL(rdma_bind_addr); Ugh, replacing two labels with an if statement is uglifying code: look how nesting got 2-deep already. Please do the error handling the usual linux way without testing flags: if (ret) goto err .... err: cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); out: return ret; is better than if (ret) goto out; .... out: if (ret) cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); look up Documentation/CodingStyle Chapter 7: Centralized exiting of functions which says unconditional statements are easier to understand and follow, and note that with this style nesting is reduced. -- MST From rdreier at cisco.com Mon Oct 16 22:26:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 16 Oct 2006 22:26:05 -0700 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061017051140.GB17404@mellanox.co.il> (Michael S. 
Tsirkin's message of "Tue, 17 Oct 2006 07:11:40 +0200") References: <347180497203A942A6AA82C85846CBC9034F6001@ES23SNLNT.srn.sandia.gov> <20061016132512.GD14878@mellanox.co.il> <1161031849.2917.400.camel@fc6.xsintricity.com> <20061017051140.GB17404@mellanox.co.il> Message-ID: Michael> Hmm, Roland, I went to look into Changelog and I note Michael> that you don't label svn versions or release versions Michael> there. So it's hard to see what was fixed in what Michael> version, or to map to svn versions. Michael> I'll dig that info up but I thin we want it in ChangeLog. I do put the releases into the changelog. There has not been a libibverbs 1.0.4 release yet, but you can find the entry 2006-05-02 Roland Dreier * Release version 1.0.3. in the libibverbs ChangeLog. Everything after that is the changes pending for 1.0.4 (which I really want to do soon). I guess I should change svn right after the release to be 1.0.4-pre until the actual release, rather than bumping to 1.0.4 immediately, to reduce the possibility of confusion. - R. From krkumar2 at in.ibm.com Mon Oct 16 22:37:10 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Tue, 17 Oct 2006 11:07:10 +0530 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: <20061017052303.GC17404@mellanox.co.il> Message-ID: > look up Documentation/CodingStyle Chapter 7: Centralized exiting of functions > which says unconditional statements are easier to understand and follow, > and note that with this style nesting is reduced. Hmmm, OK, I will re-phrase this patch to reduce nesting. thanks, - KK "Michael S. Tsirkin" wrote on 10/17/2006 10:53:03 AM: > Quoting r. Krishna Kumar : > > Subject: [PATCH] rdma_bind_addr() leaks a cma_dev reference count > > > > rdma_bind_addr leaks a cma_dev reference count in failure > > case. 
> > > > Signed-off-by: Krishna Kumar > > -------- > > diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c > > --- org/drivers/infiniband/core/cma.c 2006-10-09 17:13:41.000000000 +0530 > > +++ new/drivers/infiniband/core/cma.c 2006-10-09 19:42:31.000000000 +0530 > > @@ -1749,6 +1749,7 @@ static int cma_get_port(struct rdma_id_p > > int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > > { > > struct rdma_id_private *id_priv; > > + int did_acquire_dev = 0; > > int ret; > > > > if (addr->sa_family != AF_INET) > > @@ -1765,18 +1766,20 @@ int rdma_bind_addr(struct rdma_cm_id *id > > ret = cma_acquire_dev(id_priv); > > mutex_unlock(&lock); > > } > > - if (ret) > > - goto err; > > + if (!ret) > > + did_acquire_dev = 1; > > + else > > + goto out; > > } > > - > > memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); > > ret = cma_get_port(id_priv); > > - if (ret) > > - goto err; > > > > - return 0; > > -err: > > - cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > > +out: > > + if (ret) { > > + if (did_acquire_dev) > > + cma_detach_from_dev(id_priv); > > + cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > > + } > > return ret; > > } > > EXPORT_SYMBOL(rdma_bind_addr); > > Ugh, replacing two labels with an if statement is uglifying code: look how > nesting got 2-deep already. Please do the error handling the usual linux way > without testing flags: > > if (ret) > goto err > > .... > err: > cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > out: > return ret; > > is better than > if (ret) > goto out; > > .... > > out: > if (ret) > cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > > look up Documentation/CodingStyle Chapter 7: Centralized exiting of functions > which says unconditional statements are easier to understand and follow, > and note that with this style nesting is reduced. > > -- > MST From mst at mellanox.co.il Mon Oct 16 22:32:04 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 17 Oct 2006 07:32:04 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: References: Message-ID: <20061017053204.GD17404@mellanox.co.il> Quoting r. Roland Dreier : > I do put the releases into the changelog. There has not been a > libibverbs 1.0.4 release yet, but you can find the entry > > 2006-05-02 Roland Dreier > > * Release version 1.0.3. > > in the libibverbs ChangeLog. Everything after that is the changes > pending for 1.0.4 (which I really want to do soon). Thanks, missed that, sorry. > I guess I should change svn right after the release to be 1.0.4-pre > until the actual release, rather than bumping to 1.0.4 immediately, to > reduce the possibility of confusion. OK, thanks. -- MST From bugzilla-daemon at openib.org Mon Oct 16 23:05:25 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 16 Oct 2006 23:05:25 -0700 (PDT) Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop Message-ID: <20061017060525.AC23B2283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=263 ------- Comment #10 from sweitzen at cisco.com 2006-10-16 23:05 ------- I'm trying debug_level=1 now, sorry for the delay, but I wanted to finish other rc7 testing. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From xma at us.ibm.com Mon Oct 16 23:03:38 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 16 Oct 2006 23:03:38 -0700 Subject: [openib-general] ethtool support for ipoib Message-ID: I am going to add the ethtool ops below to ipoib. Any comments? Once ethtool support is added, GSO can be queried and set directly through ethtool, as Michael pointed out earlier.
static struct ethtool_ops ipoib_ethtool_ops = { .get_settings = ipoib_get_settings, .set_settings = ipoib_set_settings, .get_drvinfo = ipoib_get_dvrinfo, .get_link = ethtool_op_get_link, .get_stats_count = ipoib_get_stats_count, .get_ethtool_stats = ipoib_get_ethtool_stats, /* can be added later once ipoib support sg .get_sg = ethtool_op_get_sg, .set_sg = ethtool_op_set_sg, */ }; thanks Shirley Ma From mst at mellanox.co.il Mon Oct 16 23:12:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 08:12:03 +0200 Subject: [openib-general] ethtool support for ipoib In-Reply-To: References: Message-ID: <20061017061203.GE17404@mellanox.co.il> Quoting r. Shirley Ma : > /* can be added later once ipoib support sg > .get_sg = ethtool_op_get_sg, > .set_sg = ethtool_op_set_sg, > */ The difficulty here is that sg currently requires checksum offloading in netdevice. -- MST From kimbrr at melbourne.sgi.com Tue Oct 17 00:02:08 2006 From: kimbrr at melbourne.sgi.com (Michael Newton) Date: Tue, 17 Oct 2006 17:02:08 +1000 Subject: [openib-general] sysfs exposure of port counters useless? Message-ID: On Tue May 9 05:06:13 PDT 2006, Leonid Arsh leonida at voltaire.com posted a patch under the Subject [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters in which /sys/class/infiniband/*/ports/*/counters/* were made writeable, so that they could be reset by writing zero Michael S. Tsirkin mst at mellanox.co.il Tue May 9 06:17:13 PDT 2006 replied with some constructive amendments.. However this is where the thread appears to have finished: Roland Dreier rdreier at cisco.com Wed May 10 08:55:22 PDT 2006 > Leonid> A user space application is an option too, although I > Leonid> think it's nice to have a 'built in' kernel feature. > >As Hal pointed out, there already is an app to do this. So I don't >see much need to put it into the kernel.
i.e., the ability to reset using perfquery was felt to be sufficient, and the patch wasn't adopted. I wonder though who would ever use the sysfs exposure of port counters, as they stand? These are 32-bit counters. The rcv/xmit_data counters count 32-bit blocks. Also, these counts do not wrap: they peg at all 1s. At InfiniBand speeds, these counts can peg out very quickly indeed, to the point that they can really only be of use if they can be reset each time they're read. Now if anyone who wants to use them has to go to the CLI to reset them, and there's little point in reading them without a reset, why would anyone read them via sysfs? So why have them? Dr.Michael("Kimba")Newton kimbrr at sgi.com From ogerlitz at voltaire.com Tue Oct 17 00:21:12 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 17 Oct 2006 09:21:12 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: References: Message-ID: <45348468.10807@voltaire.com> Tang, Changqing wrote: > We tested RC7, but fork() does not work: > > 1. system() causes IB to fail. > 2. fork(), child calling exit(0) immediately also causes IB to fail. > > Anyone has tested fork() related issue ? > > --CQ Tang, HP-MPI Hi CQ, The fork() support patches were incorporated into libibverbs1.1 which is not released yet. Both OFED 1.0 and 1.1 use libibverbs1.0 Or. From glebn at voltaire.com Tue Oct 17 00:32:14 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 17 Oct 2006 09:32:14 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <45348468.10807@voltaire.com> References: <45348468.10807@voltaire.com> Message-ID: <20061017073214.GM6145@minantech.com> On Tue, Oct 17, 2006 at 09:21:12AM +0200, Or Gerlitz wrote: Tang, Changqing wrote: > We tested RC7, but fork() does not work: > > 1. system() causes IB to fail. > 2. fork(), child calling exit(0) immediately also causes IB to fail. > > Anyone has tested fork() related issue ?
> What kernel are you testing? system() should work (in non threaded apps at least) with modern kernel. -- Gleb. From tziporet at dev.mellanox.co.il Tue Oct 17 00:52:44 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 17 Oct 2006 09:52:44 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <20061017073214.GM6145@minantech.com> References: <45348468.10807@voltaire.com> <20061017073214.GM6145@minantech.com> Message-ID: <45348BCC.3020303@dev.mellanox.co.il> glebn at voltaire.com wrote: > On Tue, Oct 17, 2006 at 09:21:12AM +0200, Or Gerlitz wrote: > Tang, Changqing wrote: > >> We tested RC7, but fork() does not work: >> >> 1. system() causes IB to fail. >> 2. fork(), child calling exit(0) immediately also causes IB to fail. >> >> Anyone has tested fork() related issue ? >> >> > What kernel are you testing? system() should work (in non threaded apps at > least) with modern kernel. > > -- > Gleb. > > _______________________________________________ > From the OFED release notes: 3. Fork support from kernel 2.6.12 and above is available provided that applications do not use threads. Only system() or fork() and immediate exec() are supported. Tziporet From monil at voltaire.com Tue Oct 17 01:13:17 2006 From: monil at voltaire.com (Moni Levy) Date: Tue, 17 Oct 2006 10:13:17 +0200 Subject: [openib-general] [openfabrics-ewg] We wish to do the 1.1 release next week In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACEAA@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C92ACEAA@mtlexch01.mtl.com> Message-ID: <6a122cc00610170113r1df48d88v6446f78b77c0ff32@mail.gmail.com> Sounds like a great idea. We don't have blocking issues, but would be happy to test the pre-release. Moni On 10/16/06, Tziporet Koren wrote: > This patch is already in. > We will publish latest pre-release version tomorrow so everybody can do > latest checks. > > Is this OK? 
> Tziporet > > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org > [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Scott > Weitzenkamp (sweitzen) > Sent: Sunday, October 15, 2006 10:16 PM > To: Tziporet Koren; openfabrics-ewg at openib.org; OPENIB > Subject: Re: [openfabrics-ewg] [openib-general] We wish to do the 1.1 > release next week > > Yes, bug 273 (http://openib.org/bugzilla/show_bug.cgi?id=273) is a > blocking issue for Cisco. Roland sent a patch last Monday. I'm done > testing the other parts of rc7, and am testing his patch later today. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: openib-general-bounces at openib.org > > [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren > > Sent: Thursday, October 12, 2006 7:44 AM > > To: openfabrics-ewg at openib.org; OPENIB > > Subject: [openib-general] We wish to do the 1.1 release next week > > > > Hi all, > > > > I am back from vacation and found you waited with the release > > for me :-) > > > > From a quick look at status mails I think we can do the official > > release next week. > > > > Please reply if there are still any blocking issues you have. > > > > Also - please update all documents till end of Monday next week. 
> > > > Tziporet > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > > From glebn at voltaire.com Tue Oct 17 01:19:31 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 17 Oct 2006 10:19:31 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <45348BCC.3020303@dev.mellanox.co.il> References: <45348468.10807@voltaire.com> <20061017073214.GM6145@minantech.com> <45348BCC.3020303@dev.mellanox.co.il> Message-ID: <20061017081930.GO6145@minantech.com> On Tue, Oct 17, 2006 at 09:52:44AM +0200, Tziporet Koren wrote: > glebn at voltaire.com wrote: > >On Tue, Oct 17, 2006 at 09:21:12AM +0200, Or Gerlitz wrote: > >Tang, Changqing wrote: > > > >>We tested RC7, but fork() does not work: > >> > >>1. system() causes IB to fail. > >>2. fork(), child calling exit(0) immediately also causes IB to fail. > >> > >>Anyone has tested fork() related issue ? > >> > >> > >What kernel are you testing? system() should work (in non threaded apps at > >least) with modern kernel. > > > >-- > > Gleb. > > > >_______________________________________________ > > > > > From the OFED release notes: > > 3. Fork support from kernel 2.6.12 and above is available provided that > applications do not use threads. Only system() or fork() and immediate > exec() are supported. > system() works because parent calls wait(). fork() and immediate exec() may very well fail. I propose to fix the release notes. 
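A minimal sketch of the pattern being described, in Python rather than verbs code. It assumes the usual copy-on-write explanation: the parent must not write to registered pages while a forked child still shares them, and system() is safe only because the parent sits in wait() until the child is gone. No IB resources are touched here, and run_and_wait is a hypothetical helper name, not an OFED or libibverbs call.

```python
import os
import sys


def run_and_wait(argv):
    """Fork, exec argv in the child, and block in waitpid(), like system()."""
    pid = os.fork()
    if pid == 0:
        # Child: exec immediately and touch as little memory as possible.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)
    # Parent: blocked here, so it writes to no pages until the child is gone.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

A plain fork() followed by exec() in the child differs only in that the parent keeps running instead of blocking, which is the case said above to be unsafe.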
-- Gleb. From dotanb at dev.mellanox.co.il Tue Oct 17 03:21:15 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 17 Oct 2006 12:21:15 +0200 Subject: [openib-general] [ucma] executing the ucmatose with local IPoIB IP address of port 2 fails Message-ID: <4534AE9B.4030907@dev.mellanox.co.il>

Hi. I'm using the following configuration:

*************************************************************
Host Architecture : x86_64
Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
Kernel Version : 2.6.16.21-0.8-smp
GCC Version : gcc (GCC) 4.1.0 (SUSE Linux)
Memory size : 4047700 kB
Driver Version : OFED-1.1-rc7-testbuild
HCA ID(s) : mthca0
HCA model(s) : 25208
Board(s) : MT_00A0010001
*************************************************************

I have 2 machines connected port 1 <--> port 1 and port 2 <--> port 2, and I tried the following scenarios (on only one of the machines):

scenario 1: passes
SM was executed on port 1
I executed the ucmatose server and the ucmatose client with the IPoIB IP address of port 1

scenario 2: fails
SM was executed on port 2
I executed the ucmatose server and the ucmatose client with the IPoIB IP address of port 2
Here is the output of the client:
ucmatose: starting client
ucmatose: connecting
ucmatose: event: 3, error: 0
receiving data transfers
sending replies
data transfers complete
test complete
return status 0

scenario 3: passes
SM was executed on port 1 and on port 2 (I have 2 SMs, one on each port)
(I executed the ucmatose server and the ucmatose client with the IPoIB IP address of port 1)
I executed the ucmatose server and the ucmatose client with the IPoIB IP address of port 2

It seems that when the client side uses the IPoIB IP address of port 2 and there is an SM only on port 2, the test fails, but if I add an SM on port 1 the test passes. Did you notice this behavior before?
thanks Dotan From halr at voltaire.com Tue Oct 17 03:50:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 06:50:24 -0400 Subject: [openib-general] [PATCHv2] osm: fixing OSM_LOG_DIR and OSM_CACHE_DIR treatment In-Reply-To: <45340A3B.1000709@dev.mellanox.co.il> References: <45335546.5090909@dev.mellanox.co.il> <20061016190436.GA21672@sashak.voltaire.com> <4534046A.5090205@dev.mellanox.co.il> <1161036857.32093.361017.camel@hal.voltaire.com> <45340A3B.1000709@dev.mellanox.co.il> Message-ID: <1161082150.32093.392094.camel@hal.voltaire.com> On Mon, 2006-10-16 at 18:39, Yevgeny Kliteynik wrote: > Hi Hal. > > [snip] > > > > So will you supply another patch with this approach ? > > > > -- Hal > > Here it is. > > -- > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From sashak at voltaire.com Tue Oct 17 04:42:45 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 17 Oct 2006 13:42:45 +0200 Subject: [openib-general] [PATCH] opensm/diags: fix regular expression in dump_lfts.sh Message-ID: <20061017114245.GA25715@sashak.voltaire.com> This fixes regular expression in dump_lfts.sh script, which is used for switch's LIDs extraction. 
Signed-off-by: Sasha Khapyorsky --- diags/scripts/dump_lfts.sh | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/diags/scripts/dump_lfts.sh b/diags/scripts/dump_lfts.sh index bed4778..5c62ed7 100755 --- a/diags/scripts/dump_lfts.sh +++ b/diags/scripts/dump_lfts.sh @@ -14,7 +14,7 @@ usage () dump_by_lid () { for sw_lid in `ibswitches \ - | sed -ne 's/^.* lid \([1-9a-f]*\) .*$/\1/p'` ; do + | sed -ne 's/^.* lid \([0-9a-f]*\) .*$/\1/p'` ; do ibroute $sw_lid done } -- 1.4.2.3.g128e From jsquyres at cisco.com Tue Oct 17 05:09:41 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 17 Oct 2006 08:09:41 -0400 Subject: [openib-general] [openfabrics-ewg] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17 In-Reply-To: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95> References: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95> Message-ID: <47F763FC-0F90-45C3-B4FB-477438A41B59@cisco.com> A few comments: 1. It would be good to start at 3pm or later; there's still lots of non-OFA people to see/meet on Thursday afternoon. 2. Is there going to be an OFED 1.3, or are we going straight to 2.0? 3. Regardless of what the version number is going to be, I would very much like to see a sizeable discussion about the OFED process for this version. This should include topics such as (but not be limited to): - what are the exact features that will be in this series (delineated into categories such as "blockers", "would like to have", and "if we have time", etc.) - what the testing matrix is going to be (who is testing what, exactly) - what the release criteria are (probably strongly related to the testing matrix) - what the dates are (development, testing, estimated release) - a plan for integrating with RHEL / SLES In general, I would very much like to see the community to come up with a published plan for the next version series. 
I'm not entirely sure, but this discussion could fit into the Thursday 18:00-end and Friday start-noon discussions. Is this what was intended? If so, it would be great to see the above points worked into the agenda. 4. Discussion / plan for moving to the new OFA server. On Oct 15, 2006, at 5:02 PM, Bill Boas wrote: > To all in the OpenFabrics Community > > > > We will be holding our first Developer Summit in the Tampa > Convention Center courtesy of SC06 starting at 1.30PM in Room 17 on > Thursday November 16, 2006. On Friday November 17, we will start in > Room 13 at 8.00 AM and continue till 5.00PM. We have had to > schedule into these time slots because no other usable space is > available at any other times during the week of SC06! > > > > OpenFabrics will cater food and beverages for afternoon break and > supper on Thursday, breakfast, lunch and two breaks on Friday. We > will set up a registration site at Acteva to collect $$ to cover > our out of pocket expenses – I’ll email out the URL for that site > in the next day or two. > > > > Please review attached Strawman purposes, suggested attendees and > agenda. Any changes or comments, please email them to the community > for all to comment on please. > > > > The Summit has several dimensions and themes throughout our work > there: > > 1) – consistency and robustness of the Linux and Windows software > stacks for Release 2.0 of OpenFabrics; > > 2) - feature selection, development resources and timelines for > Release 2.0; > > 3) - activities, features and processes of the Enterprise Working > Group on OFED 1.x until Release 2.0 is ready hand-off to the EWG; > > 4) – enhancing the resources of the EWG to be ready for 2.0 it so > that it may be subsequently be distributed as OFED 2.0. and adopted > by the OpenFabrics vendor and customer communities for production use. > > > > This is a far too much work for just a day and half! 
PLEASE START > NOW exchanging ideas for additional features, contact peer > engineers from companies and customers to discuss work sizing, > development resources, identify volunteer developers for items so > that when we meet on the 16th we’re not starting from a blank sheet! > > > > Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal > Rosenstock, Tom Tucker and Bob Woodruff are leading the pre- > meeting, STRAWMAN collation of requirements, feature > prioritization, developer assignments, sizing and processes so that > we have the list largely complete prior to the meeting and people > know has already volunteered for items from the list. > > > > Bill Boas > > VP, Business Development | System Fabric Works > > bboas at systemfabricworks.com | 510-375-8840 > > > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From ogerlitz at voltaire.com Tue Oct 17 04:50:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 17 Oct 2006 13:50:26 +0200 (IST) Subject: [openib-general] some OFED source/build questions Message-ID: Hi Vlad, I have few questions on some issues i have run into while working with OFED-1.1-rc7 over RH4 U3. I did some reading before, so if its RTFM, please point me to the relevant doc... 1) backports patches Looking in ./SOURCES/openib-1.1/configure there is a case for each system per kernel version. When all the patches on this directory are applied on the sources when building on this system. Looking in the SOURCES/openib-1.1/kernel_patches/backport/2.6.9_U3 there is only one patch that changes ipoib code and this patch does not touch the ipoib debugfs code or the code that calls to initialize it. So what's the trick? doing nm on the ipoib module i see there is no call to debugfs_create_dir... 
Is there any alternative to debugfs to get the ipoib path/mcast info on RH4 Ux systems? 2) OPENIB_PARAMS documentation Doing some probing, I understand that to set a build option I need to pass the corresponding --with or --without directive to the SOURCES/openib-1.1/ofed_scripts/configure script, and that this is done by setting the OPENIB_PARAMS env var while running the install.sh script. Some of these --with/--without options need to be documented somewhere but are not; for example, to set CONFIG_INFINIBAND_IPOIB_DEBUG I need to add --with-ipoib_debug-mod to the build, correct? Also, is there a way to see, after building, exactly which CONFIG_INFINIBAND_ directives OFED was built with? Or. From tziporet at dev.mellanox.co.il Tue Oct 17 05:35:42 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 17 Oct 2006 14:35:42 +0200 Subject: [openib-general] OFED 1.1 release schedule In-Reply-To: <4533CC4A.2070709@ichips.intel.com> References: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> <4533CC4A.2070709@ichips.intel.com> Message-ID: <4534CE1E.1010106@dev.mellanox.co.il> Arlin Davis wrote: > Can someone double check the ib_cm kernel patch > (sean_cm_drep_on_not_found.patch) again and verify the build process. I > don't see the cm_issue_drep symbol in an RC7 build. From the build logs > it appears that the patch is applied but I do not see the symbol in the > installed ib_cm.ko after the build is complete. > > system with OFED RC7.. > > nm ib_cm.ko | grep issue > 0000000000001689 t cm_issue_rej > > system with latest svn pull.... > > nm ib_cm.ko | grep issue > > 000029f7 t cm_issue_drep > 00001486 t cm_issue_rej > I checked it and saw that the patch is applied, but since Sean declared cm_issue_drep as static in the patch, nm does not show it. From the patch: +static int cm_issue_drep(struct cm_port *port, ....
Once I removed the static I get: nm ib_cm.ko | grep issue 00001fca T cm_issue_drep 00001605 t cm_issue_rej Still, it's not exported, since there is no EXPORT_SYMBOL call for it. Do you really need the symbol to be exported out of the ib_cm module, or is it enough this way? Please reply ASAP since we want to close the release code today. Tziporet From ishai at mellanox.co.il Tue Oct 17 05:36:00 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 17 Oct 2006 14:36:00 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 release schedule Message-ID: <6C2C79E72C305246B504CBA17B5500C92A7A0F@mtlexch01.mtl.com> Hi, Let me first explain why the current OFED release does not support SRP-HA on RHEL4. SRP-HA uses Device Mapper multipath. Multipath prerequisites include udev of a version higher than 050. RHEL4 distributions include udev 039. udev is an important part of the distribution and I do not think that users will be ready to upgrade it in order to have SRP-HA. To the best of my knowledge, the main reason that multipath needs at least udev 050 is that it uses the RUN option (this option executes its given parameter after the device exists). Multipath uses the RUN option to execute kpartx, which handles the partitions of the new device. SRP-HA also uses the RUN option to execute the multipath command. I have an idea on how to overcome this problem. I want to implement an srp-multipath-daemon. This daemon will get kpartx and multipath requests through a shared message queue. udev will use the PROGRAM option (which executes its given parameter immediately, before the device exists) to post a request to this shared message queue and return immediately. The daemon will wait for the device to be created and only then execute the commands. In any case this technique will not be a part of the coming OFED release.
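A rough sketch of the proposed flow, in Python and under stated assumptions: an in-process queue.Queue stands in for the shared message queue (the real thing would have to be an inter-process queue), and post_request and serve_one are hypothetical names, not part of any existing SRP or multipath tool. It only shows the ordering: the udev side enqueues and returns, and the daemon side waits for the device node before running the command.

```python
import os
import queue
import time


def post_request(requests, device_path, command):
    """What the udev PROGRAM helper would do: enqueue and return at once."""
    requests.put((device_path, command))


def serve_one(requests, run, timeout=5.0, poll=0.05):
    """What the daemon would do for one request: wait for the node, then run."""
    device_path, command = requests.get()
    deadline = time.monotonic() + timeout
    while not os.path.exists(device_path):
        if time.monotonic() >= deadline:
            return False  # device never appeared; the command is not run
        time.sleep(poll)
    run(command)  # e.g. hand the kpartx or multipath command to a runner
    return True
```

Polling with a timeout is just the simplest way to express "wait for the device to be created"; a real daemon might prefer to be notified rather than poll.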
Ishai -----Original Message----- From: Sharma, Karun [mailto:ksharma at silverstorm.com] Sent: Tuesday, October 17, 2006 5:11 AM To: Tziporet Koren; Open Fabrics Cc: openib Subject: RE: [openfabrics-ewg] OFED 1.1 release schedule The plan is OK with Silverstorm. I have a question though. What are the plans to support SRP-HA feature on RHEL4 kernels ? Thanks Karun -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsquyres at cisco.com Tue Oct 17 04:56:35 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 17 Oct 2006 07:56:35 -0400 Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17 In-Reply-To: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95> References: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95> Message-ID: <218A56C1-2789-4572-BA43-495749B00715@cisco.com> I have copied this information to the wiki -- please make all updates there so that there is a single reference point to find all the information about the meeting. Thanks! https://openib.org/tiki/tiki-index.php?page=Meeting+Minutes On Oct 15, 2006, at 5:02 PM, Bill Boas wrote: > To all in the OpenFabrics Community > > > > We will be holding our first Developer Summit in the Tampa > Convention Center courtesy of SC06 starting at 1.30PM in Room 17 on > Thursday November 16, 2006. On Friday November 17, we will start in > Room 13 at 8.00 AM and continue till 5.00PM. We have had to > schedule into these time slots because no other usable space is > available at any other times during the week of SC06! > > > > OpenFabrics will cater food and beverages for afternoon break and > supper on Thursday, breakfast, lunch and two breaks on Friday. We > will set up a registration site at Acteva to collect $$ to cover > our out of pocket expenses – I’ll email out the URL for that site > in the next day or two. > > > > Please review attached Strawman purposes, suggested attendees and > agenda. 
Any changes or comments, please email them to the community > for all to comment on please. > > > > The Summit has several dimensions and themes throughout our work > there: > > 1) – consistency and robustness of the Linux and Windows software > stacks for Release 2.0 of OpenFabrics; > > 2) - feature selection, development resources and timelines for > Release 2.0; > > 3) - activities, features and processes of the Enterprise Working > Group on OFED 1.x until Release 2.0 is ready hand-off to the EWG; > > 4) – enhancing the resources of the EWG to be ready for 2.0 it so > that it may be subsequently be distributed as OFED 2.0. and adopted > by the OpenFabrics vendor and customer communities for production use. > > > > This is a far too much work for just a day and half! PLEASE START > NOW exchanging ideas for additional features, contact peer > engineers from companies and customers to discuss work sizing, > development resources, identify volunteer developers for items so > that when we meet on the 16th we’re not starting from a blank sheet! > > > > Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal > Rosenstock, Tom Tucker and Bob Woodruff are leading the pre- > meeting, STRAWMAN collation of requirements, feature > prioritization, developer assignments, sizing and processes so that > we have the list largely complete prior to the meeting and people > know has already volunteered for items from the list. 
> > > > Bill Boas > > VP, Business Development | System Fabric Works > > bboas at systemfabricworks.com | 510-375-8840 > > > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From jsquyres at cisco.com Tue Oct 17 06:17:27 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 17 Oct 2006 09:17:27 -0400 Subject: [openib-general] Tools for development Message-ID: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> Per the teleconference last week, I'd like to survey the developers about the tools that should be installed on the new OFA server (is there a plan to migrate there yet?). As I understand it (please correct me if I get this wrong): - The community has decided to stay with git for kernel level development --> Was there a plan for any consolidation of the various git repositories?) - The community decided to stay with svn for user space level development - Some version of git and svn are installed on the new server, but that's about it So there still needs to be some discussion about what other tools to install on the new server. There was an aborted discussion about moving from bugzilla to trac on the ewg list. See the following (the web archives didn't thread them totally properly): http://openib.org/pipermail/openfabrics-ewg/2006-October/001732.html http://openib.org/pipermail/openfabrics-ewg/2006-October/001739.html http://openib.org/pipermail/openfabrics-ewg/2006-October/001742.html It seems like trac can integrate with both SVN and git and would also provide us with integrated wiki capabilities. I personally have no problem with bugzilla, but I can attest to the Goodness of trac because we use it extensively in OMPI. 
See my post (above) for some details, but here's a rollup of pros/cons of switching to trac on the new server: Pros: +++ Integrate SVN and git commit messages with bug tracking (although we might need separate trac instances -- one for SVN and one for git) +++ Built-in wiki support -- one syntax for commit messages, the general wiki, and tickets +++ Track milestones and bugs/tickets together (i.e., help release procedures) +++ Trivially link between SVN/git commit messages, tickets, the wiki, and syntax-colored commit diffs +++ There is a tool for migrating Bugzilla db's to trac (although I have not tried it myself): http://trac.edgewall.org/browser/trunk/ contrib/bugzilla2trac.py +++ Same username/password used for both SVN and Trac Cons: --- A change from the existing system; people will need to learn something new --- Bugzilla ain't broke; we don't necessarily need to fix it --- Will need to map between current Bugzilla fields (product, component, status, resolution, url, hardware, os, version, priority, severity, cc) and new trac fields (component, milestone, severity, priority, type, version) Neutral points: === Neither bugzilla nor trac are on the new server; we need to choose something. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From halr at voltaire.com Tue Oct 17 06:19:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 09:19:05 -0400 Subject: [openib-general] [PATCH] opensm/diags: fix regular expression in dump_lfts.sh In-Reply-To: <20061017114245.GA25715@sashak.voltaire.com> References: <20061017114245.GA25715@sashak.voltaire.com> Message-ID: <1161091140.32093.398152.camel@hal.voltaire.com> On Tue, 2006-10-17 at 07:42, Sasha Khapyorsky wrote: > This fixes regular expression in dump_lfts.sh script, which is used for > switch's LIDs extraction. > > Signed-off-by: Sasha Khapyorsky > --- > diags/scripts/dump_lfts.sh | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) Thanks. Applied to both trunk and 1.1. 
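For reference, the effect of the one-character change in dump_lfts.sh can be sketched in Python. The two patterns are the sed expressions from the patch translated to Python syntax; the sample lines are made up for illustration and are not real ibswitches output.

```python
import re

# Character class before and after the patch.
OLD = re.compile(r"^.* lid ([1-9a-f]*) .*$")
NEW = re.compile(r"^.* lid ([0-9a-f]*) .*$")


def extract_lid(pattern, line):
    """Return the captured LID string, or None when the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None


# Hypothetical lines shaped like "... lid <n> ..." (not real tool output).
SAMPLES = [
    "Switch : sw-a base port 0 lid 3 lmc 0",
    "Switch : sw-b base port 0 lid 10 lmc 0",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(extract_lid(OLD, line), extract_lid(NEW, line))
```

With the old class, any LID containing the digit 0 fails to match, so sed -n prints nothing for that switch and it is silently skipped by the loop.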
-- Hal From mst at mellanox.co.il Tue Oct 17 06:35:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 15:35:22 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1160870845.2917.334.camel@fc6.xsintricity.com> References: <1160870845.2917.334.camel@fc6.xsintricity.com> Message-ID: <20061017133522.GB20690@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: RHEL5 and OFED ... > > On Sat, 2006-10-14 at 22:14 +0200, Michael S. Tsirkin wrote: > > Quoting r. Doug Ledford : > > > Sorry. RHEL5 Beta1 has been out for a while, but OFED 1.1 still isn't > > > done yet. Obviously, I wasn't able to get something in RHEL5 that > > > didn't even exist prior to freeze. > > > > Would it be possible to include patches backporting fixes in infiniband kernel > > components from 2.6.18/OFED 1.1 to modules that already ship with RHEL5? > > Maybe. It would depend on the patch. Of course, keep in mind that > RHEL5 Beta1 *has* a 2.6.18 kernel. I looked here ftp://ftp.redhat.com/pub/redhat/linux/beta/RHEL5-Beta1/server/source/ and it seems to be 2.6.17 based. Is something wrong? -- MST From mst at mellanox.co.il Tue Oct 17 06:44:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 15:44:03 +0200 Subject: [openib-general] Tools for development In-Reply-To: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> Message-ID: <20061017134403.GD20690@mellanox.co.il> Quoting Jeff Squyres : > - Some version of git and svn are installed on the new server, but > that's about it The tool versions installed on openib are ancient. Can site admins please install latest svn and git versions from source? 
-- MST From trimmer at silverstorm.com Tue Oct 17 06:43:54 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 17 Oct 2006 09:43:54 -0400 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: <45340E29.30906@ichips.intel.com> Message-ID: > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Monday, October 16, 2006 6:57 PM > To: Rimmer, Todd > Cc: Matt Leininger; openib > Subject: Re: [openib-general] [RFC] Notice/InformInfo event reporting > > Rimmer, Todd wrote: > > In a functioning fabric, events will be rare. However its when you > > first boot the fabric, reboot the SM or other similar "start up" actions > > that things get real busy. > > Hmm... I need to think more about how to handle the start up scenario. > > > In general I have found that only a few clients will use events such as: > > IPoIb to manage multicast subscriptions (join as send only for new > > groups) and SA caches/replicas to keep their cache/replica synchronized. > > Can you give more details about how ipoib would use the event service? Technically, to meet the Link Layer semantics which TCP/IP expects, IPoIB should join as SenderOnly for every IPoIB multicast group which exists. This is because TCP/IP in Linux expects to be able to send to any multicast group without first informing the link layer. It expects to inform the link layer only when it wants to receive from a given multicast group. This is a side effect of how most Ethernet NICs work (multicast filtering is only implemented on the receive side of an Ethernet NIC) and how Ethernet LANs work (a single subnet will forward multicast sends to all nodes via the spanning tree). Hence IPoIB should subscribe for the multicast GID created notice and use it to manage its SenderOnly status. It should also register for the multicast GID deleted notice and use it to delete its SenderOnly status.
(notice that in IBTA 1.2 15.2.5.17.1 SenderOnly status does not count toward group create/delete reference counts, hence the group can be deleted while there are sender only members, hence the interest in GID out of service). > > SA caches seem like they would register for traps... 64 (GID in), 65 (GID > out), > and 128 (switch port change)? Or is it reasonable to limit it to trap > 128? Is > trap 128 likely to be followed by traps 64 and 65? [Todd Rimmer] Our SA replica only needed to use 64 and 65. We found that switch port change did not provide enough information. GID in/GID out tell you the GID which has changed. This allows the replica to begin adjusting its replica and making queries about that specific GID. > > > In the silverstorm stack we created an API for a client to subscribe to > > a notice. It allowed the client to specify: trap number, local HCA port > > subscription was applicable to (in case multi-port HCAs on different > > fabrics) and information for a callback to the client (client context > > void*, function). The callback provided the client context void*, the > > actual NOTICE from the SA and which HCA port it arrived on. > > This sounds like a simple enough interface. So, you tracked references on > only > the trap numbers then? Yes. It reference counted by trap/notice number and registered with the SA only on the transition from 0->1 reference count and deregistered with the SA on the 1->0 reference count transition. We left any LID filtering up to the client. In our uses to date the SA replica was interested in all LIDs. IPoIB filters itself based on the MC Gid such that it ignores non-IPoIB GIDs. > > > The API in the stack dealt with all the issues of remaining subscribed > > (SA reregistraton, port disconnected/reconnected, etc) so the client > > merely subscribed, got notice callbacks and later unsubscribed. In this > > style API any LID based filtering would be done in the client itself. > > This makes sense. 
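The reference counting described above can be sketched as follows. This is a simplified illustration in Python, not the SilverStorm API: the per-port argument and the client context pointer are left out, and TrapSubscriptions, sa_register and sa_deregister are hypothetical names.

```python
from collections import defaultdict


class TrapSubscriptions:
    """Register with the SA on 0 -> 1 subscribers, deregister on 1 -> 0."""

    def __init__(self, sa_register, sa_deregister):
        self._register = sa_register      # called with the trap number
        self._deregister = sa_deregister  # called with the trap number
        self._callbacks = defaultdict(list)

    def subscribe(self, trap, callback):
        if not self._callbacks[trap]:
            self._register(trap)  # first subscriber for this trap
        self._callbacks[trap].append(callback)

    def unsubscribe(self, trap, callback):
        self._callbacks[trap].remove(callback)
        if not self._callbacks[trap]:
            self._deregister(trap)  # last subscriber is gone

    def dispatch(self, trap, notice):
        # Every subscriber sees every notice; LID or GID filtering is
        # left to the client callback, as described above.
        for callback in list(self._callbacks.get(trap, ())):
            callback(notice)
```

Keeping the filtering in the callbacks means the SA sees at most one registration per trap number regardless of how many clients are interested.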
> Glad to be of assistance. Todd Rimmer From mst at mellanox.co.il Tue Oct 17 06:45:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 15:45:50 +0200 Subject: [openib-general] Tools for development In-Reply-To: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> Message-ID: <20061017134550.GE20690@mellanox.co.il> Quoting r. Jeff Squyres : > It seems like trac can integrate with both SVN and git and would also > provide us with integrated wiki capabilities. One feature that bugzilla has (and that seems to be disabled in openib bugzilla :() is mail integration, where I can Cc bugzilla and mail contents will get attached to bug report. I was hoping that new server will have this capability. Does trac have this? -- MST From mst at mellanox.co.il Tue Oct 17 06:51:36 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 15:51:36 +0200 Subject: [openib-general] sysfs exposure of port counters useless? In-Reply-To: References: Message-ID: <20061017135136.GF20690@mellanox.co.il> Quoting Michael Newton : > Also, these counts do not wrap: they peg at all 1s. > At infiniband speeds, these counts can peg out very quickly indeed, > to the point they can really only be of use if they can be reset each time > there read. I mostly use them to verify network health. On typical networks I use these don't normally overflow, so I can read them without need for extra tools and this ability is quite useful for me. Reports from real-life usage in field that state otherwise might be a good reason to re-open this discussion. 
-- MST From halr at voltaire.com Tue Oct 17 06:49:09 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 09:49:09 -0400 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: References: Message-ID: <1161092942.32093.399314.camel@hal.voltaire.com> On Tue, 2006-10-17 at 09:43, Rimmer, Todd wrote: > > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > > Sent: Monday, October 16, 2006 6:57 PM > > To: Rimmer, Todd > > Cc: Matt Leininger; openib > > Subject: Re: [openib-general] [RFC] Notice/InformInfo event reporting > > > > Rimmer, Todd wrote: > > > In a functioning fabric, events will be rare. However its when you > > > first boot the fabric, reboot the SM or other similar "start up" > actions > > > that things get real busy. > > > > Hmm... I need to think more about how to handle the start up scenario. > > > > > In general I have found that only a few clients will use events such > as: > > > IPoIb to manage multicast subscriptions (join as send only for new > > > groups) and SA caches/replicas to keep their cache/replica > synchronized. > > > > Can you give more details about how ipoib would use the event service? > Technically, to meet the Link Layer semantics which TCP/IP expects, > IPoIB should join as a senderonly for very IPoIB multicast group which > exists. > > This is because TCP/IP in Linux expects to be able to send to any > multicast group without first informing the link layer. It expects to > inform the link layer only when it wants to receive from a given > multicast group. This is a side effect of how most Ethernet NICs work > (multicast filtering is only implemented on receive side of Ethernet > NIC) and how Ethernet LANs work (a single subnet will forward multicast > sends to all nodes via the spanning tree). This is not how OpenIB IPoIB behaves now. Perhaps that can be changed going forward once the event subscription mechanism is implemented and tied into to IPoIB. 
> Hence IPoIB should subscribe for the multicast GID created notice and > use it to manage its sender only status. It should also register for > the multicast GID deleted notice and use it to delete its sender only > status. (notice that in IBTA 1.2 15.2.5.17.1 SenderOnly status does not > count toward group create/delete reference counts, Same for NonMembers too (for IPmc routers). -- Hal > hence the group can > be deleted while there are sender only members, hence the interest in > GID out of service). > > > > > SA caches seem like they would register for traps... 64 (GID in), 65 > (GID > > out), > > and 128 (switch port change)? Or is it reasonable to limit it to trap > > 128? Is > > trap 128 likely to be followed by traps 64 and 65? > [Todd Rimmer] Our SA replica only needed to use 64 and 65. We found > that switch port change did not provide enough information. GID in/GID > out tell you the GID which has changed. This allows the replica to > begin adjusting its replica and making queries about that specific GID. > > > > > > In the silverstorm stack we created an API for a client to subscribe > to > > > a notice. It allowed the client to specify: trap number, local HCA > port > > > subscription was applicable to (in case multi-port HCAs on different > > > fabrics) and information for a callback to the client (client > context > > > void*, function). The callback provided the client context void*, > the > > > actual NOTICE from the SA and which HCA port it arrived on. > > > > This sounds like a simple enough interface. So, you tracked > references on > > only > > the trap numbers then? > Yes. It reference counted by trap/notice number and registered with the > SA only on the transition from 0->1 reference count and deregistered > with the SA on the 1->0 reference count transition. We left any LID > filtering up to the client. In our uses to date the SA replica was > interested in all LIDs. 
IPoIB filters itself based on the MC GID such > that it ignores non-IPoIB GIDs. > > > > > The API in the stack dealt with all the issues of remaining > subscribed > > > (SA reregistration, port disconnected/reconnected, etc) so the client > > > merely subscribed, got notice callbacks and later unsubscribed. In > this > > > style API any LID based filtering would be done in the client > itself. > > > > This makes sense. > > > Glad to be of assistance. > > Todd Rimmer From trimmer at silverstorm.com Tue Oct 17 06:55:48 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 17 Oct 2006 09:55:48 -0400 Subject: [openib-general] sysfs exposure of port counters useless? In-Reply-To: Message-ID: > From: Michael Newton > Sent: Tuesday, October 17, 2006 3:02 AM > To: openib-general at openib.org > Subject: [openib-general] sysfs exposure of port counters useless? > > > These are 32 bit counters. The rcv/xmit_data counters count 32-bit > blocks. Also, these counts do not wrap: they peg at all 1s. > At InfiniBand speeds, these counts can peg out very quickly indeed, > to the point they can really only be of use if they can be reset each time > they're read. Now if anyone who wants to use them has to go to the CLI to reset > them, and there's little point in reading them without reset, why would > anyone read them via sysfs? so why have them? > We have found that while your comment is true for the data movement counters, the error counters should not peg quickly, hence it is valid to read them without resetting. However it is also useful to have an ability to reset them. Of course if there are other CLI commands which do this easily, the sysfs info is of less value.
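To put a rough number on "peg out very quickly" for the data counters, here is a back-of-the-envelope sketch. It takes the counter description quoted above (32-bit counter, counting 32-bit blocks, pegging at all 1s) and assumes, purely for illustration, a link moving about 1e9 payload bytes per second (roughly a 4x SDR link at full rate). seconds_to_peg() is a hypothetical helper name, not something from the driver or the diag tools.

```c
#include <stdint.h>

/*
 * Seconds until a non-wrapping 32-bit counter that counts 32-bit
 * (4-byte) blocks reaches all 1s, for a given sustained data rate.
 * bytes_per_sec is an assumed input, not a value read from hardware.
 */
double seconds_to_peg(double bytes_per_sec)
{
    const double max_blocks = (double)UINT32_MAX; /* counter pegs here */
    const double bytes_per_block = 4.0;           /* one 32-bit block  */

    return max_blocks * bytes_per_block / bytes_per_sec;
}
```

With the assumed 1e9 bytes/s, seconds_to_peg(1e9) comes out at a little over 17 seconds, which is why the data counters are of little use unless they are reset on every read, while the much slower-moving error counters remain readable as they are.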
Todd Rimmer From mst at mellanox.co.il Tue Oct 17 06:57:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 15:57:58 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <20061017081930.GO6145@minantech.com> References: <45348468.10807@voltaire.com> <20061017073214.GM6145@minantech.com> <45348BCC.3020303@dev.mellanox.co.il> <20061017081930.GO6145@minantech.com> Message-ID: <20061017135758.GG20690@mellanox.co.il> Quoting r. glebn at voltaire.com : > > From the OFED release notes: > > > > 3. Fork support from kernel 2.6.12 and above is available provided that > > applications do not use threads. Only system() or fork() and immediate > > exec() are supported. > > > system() works because parent calls wait(). fork() and immediate exec() > may very well fail. I propose to fix the release notes. Correct, makes sense. -- MST From changquing.tang at hp.com Tue Oct 17 07:02:19 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Tue, 17 Oct 2006 09:02:19 -0500 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <45348BCC.3020303@dev.mellanox.co.il> Message-ID: >>> >> What kernel are you testing? system() should work (in non threaded >> apps at >> least) with modern kernel. >> >> -- >> Gleb. >> >> _______________________________________________ >> > > > From the OFED release notes: > >3. Fork support from kernel 2.6.12 and above is available provided that > applications do not use threads. Only system() or fork() >and immediate > exec() are supported. Thanks, I still use 2.6.9-34, Or Gerlitz told me that fork() support is only in libibverbs1.1 which is not released yet. Both OFED 1.0 and 1.1 use libibverbs1.0, is it still true ? --CQ > >Tziporet > From halr at voltaire.com Tue Oct 17 06:59:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 09:59:49 -0400 Subject: [openib-general] sysfs exposure of port counters useless? 
In-Reply-To: References: Message-ID: <1161093586.32093.399737.camel@hal.voltaire.com> On Tue, 2006-10-17 at 09:55, Rimmer, Todd wrote: > > From: Michael Newton > > Sent: Tuesday, October 17, 2006 3:02 AM > > To: openib-general at openib.org > > Subject: [openib-general] sysfs exposure of port counters useless? > > > > > > These are 32 bit counters. The rcv/xmit_data counters count 32-bit > > blocks. Also, these counts do not wrap: they peg at all 1s. > > At infiniband speeds, these counts can peg out very quickly indeed, > > to the point they can really only be of use if they can be reset each > time > > there read. Now if anyone who wants to use them has to go the CLI to > reset > > them, and theres little point in reading them without reset, why would > > anyone read them via sysfs? so why have them? > > > > We have found that while your comment is true for the data movement > counters, the error counters should not peg quickly, hence it is valid > to read them without resetting. However it is also useful to have an > ability to reset them. Of course if there are other CLI commands which > do this easily, the sysfs info is of less value. There are diag tools for this. -- Hal > > Todd Rimmer > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From tziporet at mellanox.co.il Tue Oct 17 07:03:27 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 17 Oct 2006 16:03:27 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACEC7@mtlexch01.mtl.com> > system() works because parent calls wait(). fork() and immediate exec() > may very well fail. I propose to fix the release notes. Hi Gleb, Can you send me the correct description for the RN. 
Thanks, Tziporet From glebn at voltaire.com Tue Oct 17 07:06:20 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 17 Oct 2006 16:06:20 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: References: <45348BCC.3020303@dev.mellanox.co.il> Message-ID: <20061017140620.GS6145@minantech.com> On Tue, Oct 17, 2006 at 09:02:19AM -0500, Tang, Changqing wrote: > >>> > >> What kernel are you testing? system() should work (in non threaded > >> apps at > >> least) with modern kernel. > >> > >> -- > >> Gleb. > >> > >> _______________________________________________ > >> > > > > > > From the OFED release notes: > > > >3. Fork support from kernel 2.6.12 and above is available provided that > > applications do not use threads. Only system() or fork() > >and immediate > > exec() are supported. > > Thanks, I still use 2.6.9-34, Or Gerlitz told me that fork() support is > only in libibverbs1.1 which is not released yet. Both OFED 1.0 and 1.1 > use libibverbs1.0, is it still true ? > There will be no fork support for kernel 2.6.9 in libibverbs1.1. 2.6.9 lacks interface libibverbs needs. The interface was included in kernel 2.6.16 or .17 -- Gleb. From swise at opengridcomputing.com Tue Oct 17 07:08:37 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 17 Oct 2006 09:08:37 -0500 Subject: [openib-general] uDAPL problem In-Reply-To: <4534013A.5000900@cs.rutgers.edu> References: <4533FFB7.2020506@cs.rutgers.edu> <4534013A.5000900@cs.rutgers.edu> Message-ID: <1161094117.30946.0.camel@stevo-desktop> On Mon, 2006-10-16 at 18:01 -0400, Steve Smaldone wrote: > Hi, > > Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm > device appears. However, it now fails with the following: > > $ ./dapltest -T S -D IB1 > ... 
> DAT Registry: dat_ia_openv (IB1,1:2,0) called > DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so > DAT Registry: dat_registry_add_provider (IB1,1:2,0) > libibverbs: Warning: no userspace device-specific driver found for uverbs0 > driver search path: /usr/local/lib/infiniband > libibverbs: Warning: no userspace device-specific driver found for uverbs0 > driver search path: /usr/local/lib/infiniband Seems like it cannot find the provider library. Is there a libmthca.* in /usr/local/lib/infiniband? From glebn at voltaire.com Tue Oct 17 07:24:38 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 17 Oct 2006 16:24:38 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACEC7@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C92ACEC7@mtlexch01.mtl.com> Message-ID: <20061017142438.GT6145@minantech.com> On Tue, Oct 17, 2006 at 04:03:27PM +0200, Tziporet Koren wrote: > > system() works because parent calls wait(). fork() and immediate > exec() > > may very well fail. I propose to fix the release notes. > > Hi Gleb, > Can you send me the correct description for the RN. > 3. Fork support from kernel 2.6.12 and above is available provided that applications do not use threads. fork() is supported as long as the parent process does not run before the child exits or calls exec(). The former can be achieved by calling wait(childpid); the latter can be achieved by application-specific means. The POSIX system() call is supported. Something along those lines. -- Gleb. From liakhovitch at mail.ru Tue Oct 17 07:47:26 2006 From: liakhovitch at mail.ru (Mirochnick Natalia) Date: Tue, 17 Oct 2006 18:47:26 +0400 Subject: [openib-general] srp trouble on RHEL4 U4 Message-ID: <015e01c6f1fb$2a3fafe0$05cab4d5@ld.yandex.ru> Hello, I'm trying to set up an SRP connection (SRP in OFED 1.0). The IB card is a Silverstorm 7000.
The ib_srp module is loaded, but after an attempt to create an SRP device (as described in srp_release_notes.txt) the following error appears in /var/log/messages: kernel: REJ reason 0x0 What's wrong? -- Thanks in advance, Mirochnick Natalia From rdreier at cisco.com Tue Oct 17 07:49:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 07:49:39 -0700 Subject: [openib-general] Tools for development In-Reply-To: <20061017134403.GD20690@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 17 Oct 2006 15:44:03 +0200") References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <20061017134403.GD20690@mellanox.co.il> Message-ID: Michael> The tool versions installed on openib are ancient. Can Michael> site admins please install latest svn and git versions Michael> from source? What distro is on the new openfabrics.org server? If it's something like Fedora or Ubuntu, then it would probably be better to install the distro's versions of svn and git, so that keeping up with security updates is easier. - R. From swise at opengridcomputing.com Tue Oct 17 07:54:43 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 17 Oct 2006 09:54:43 -0500 Subject: [openib-general] Tools for development In-Reply-To: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> Message-ID: <1161096883.30946.2.camel@stevo-desktop> On Tue, 2006-10-17 at 09:17 -0400, Jeff Squyres wrote: > Per the teleconference last week, I'd like to survey the developers > about the tools that should be installed on the new OFA server (is > there a plan to migrate there yet?). > > As I understand it (please correct me if I get this wrong): > > - The community has decided to stay with git for kernel level > development > --> Was there a plan for any consolidation of the various git > repositories?)
> - The community decided to stay with svn for user space level > development At the risk of opening a can of worms, is there any reason we don't move the user stuff into its own git tree? This would get rid of svn altogether... From mst at mellanox.co.il Tue Oct 17 07:56:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 16:56:04 +0200 Subject: [openib-general] Tools for development In-Reply-To: References: Message-ID: <20061017145604.GK20690@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] Tools for development > > Michael> The tool versions installed on openib are ancient. Can > Michael> site admins please install latest svn and git versions > Michael> from source? > > What distro is on the new openfabrics.org server? If it's something > like Fedora or Ubuntu, then it would probably be better to install the > distro's versions of svn and git, so that keeping up with security > updates is easier. I think it's Ubuntu, and I think that's what was done for now. But I think while generally using distro-supplied packages is the thing to do, for svn/git it makes sense to get the latest and greatest and do the updates manually since they are the main services we get from openib.org - so getting more features/speed from there really helps a lot. No? -- MST From rdreier at cisco.com Tue Oct 17 07:55:22 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 07:55:22 -0700 Subject: [openib-general] Tools for development In-Reply-To: <20061017145604.GK20690@mellanox.co.il> (Michael S.
Tsirkin's message of "Tue, 17 Oct 2006 16:56:04 +0200") References: <20061017145604.GK20690@mellanox.co.il> Message-ID: Michael> But I think while generally using distro-supplied Michael> packages is the thing to do, for svn/git it makes sense Michael> to get the latest and greatest and do the updates manually Michael> since they are the main services we get from openib.org - Michael> so getting more features/speed from there really helps Michael> a lot. Based on past experience I would expect that whatever we install now will not be updated for quite a while, unless it's done automatically with something like "aptitude dist-upgrade". - R. From rdreier at cisco.com Tue Oct 17 08:01:58 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 08:01:58 -0700 Subject: [openib-general] Tools for development In-Reply-To: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> (Jeff Squyres's message of "Tue, 17 Oct 2006 09:17:27 -0400") References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> Message-ID: > --> Was there a plan for any consolidation of the various git > repositories?) I don't see any reason to consolidate -- the whole point of git is that it makes distributed development easier. Being able to have a private tree that I can screw up and rebuild whenever I need to is kind of the point. - R. From mst at mellanox.co.il Tue Oct 17 08:04:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 17:04:43 +0200 Subject: [openib-general] Tools for development In-Reply-To: <1161096883.30946.2.camel@stevo-desktop> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <1161096883.30946.2.camel@stevo-desktop> Message-ID: <20061017150442.GA22531@mellanox.co.il> Quoting r. Steve Wise : > At the risk of opening a can of worms, is there any reason we don't move > the user stuff into its own git tree? This would get rid of svn > altogether...
If we do, that should probably be multiple git trees - verbs, management, tests are all more or less independent and developed mostly by different people. -- MST From mst at mellanox.co.il Tue Oct 17 08:09:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 17:09:58 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1161097275.2917.407.camel@fc6.xsintricity.com> References: <1161097275.2917.407.camel@fc6.xsintricity.com> Message-ID: <20061017150957.GB22531@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: RHEL5 and OFED ... > > On Tue, 2006-10-17 at 15:35 +0200, Michael S. Tsirkin wrote: > > Quoting r. Doug Ledford : > > > Subject: Re: RHEL5 and OFED ... > > > > > > On Sat, 2006-10-14 at 22:14 +0200, Michael S. Tsirkin wrote: > > > > Quoting r. Doug Ledford : > > > > > Sorry. RHEL5 Beta1 has been out for a while, but OFED 1.1 still isn't > > > > > done yet. Obviously, I wasn't able to get something in RHEL5 that > > > > > didn't even exist prior to freeze. > > > > > > > > Would it be possible to include patches backporting fixes in infiniband kernel > > > > components from 2.6.18/OFED 1.1 to modules that already ship with RHEL5? > > > > > > Maybe. It would depend on the patch. Of course, keep in mind that > > > RHEL5 Beta1 *has* a 2.6.18 kernel. > > > > I looked here > > ftp://ftp.redhat.com/pub/redhat/linux/beta/RHEL5-Beta1/server/source/ > > and it seems to be 2.6.17 based. > > Is something wrong? > > Yeah, this is the rolling updates thing I was telling you about. The > Beta1 kernel was 2.6.17+several git repos and patches. We've since > updated to 2.6.18, but that was released as an update to the Beta1 isos > and trees via RHN. So, I don't think you'll see the kernel unless you > either 1) use up2date to refresh the beta system Will that get me the sources too? > or 2) download later > iso images and look at the kernel present. The current kernel version > is 2.6.18-1.2717.el5. 
So, I'd like to help, but how can one get the updated kernel source? Are the iso's with updated sources available somewhere? -- MST From smaldone at cs.rutgers.edu Tue Oct 17 08:06:06 2006 From: smaldone at cs.rutgers.edu (Stephen Smaldone) Date: Tue, 17 Oct 2006 11:06:06 -0400 Subject: [openib-general] uDAPL problem In-Reply-To: <1161094117.30946.0.camel@stevo-desktop> References: <4533FFB7.2020506@cs.rutgers.edu> <4534013A.5000900@cs.rutgers.edu> <1161094117.30946.0.camel@stevo-desktop> Message-ID: <4534F15E.3000809@cs.rutgers.edu> Steve Wise wrote: > On Mon, 2006-10-16 at 18:01 -0400, Steve Smaldone wrote: > >> Hi, >> >> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm >> device appears. However, it now fails with the following: >> >> $ ./dapltest -T S -D IB1 >> ... >> DAT Registry: dat_ia_openv (IB1,1:2,0) called >> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so >> DAT Registry: dat_registry_add_provider (IB1,1:2,0) >> libibverbs: Warning: no userspace device-specific driver found for uverbs0 >> driver search path: /usr/local/lib/infiniband >> libibverbs: Warning: no userspace device-specific driver found for uverbs0 >> driver search path: /usr/local/lib/infiniband >> > > Seems like it cannot find the provider library. > > Is there a libmthca.* in /usr/local/lib/infiniband? > > > > That was the problem. There was no mthca.so. Now it works. Thanks for the help! Steve Smaldone From rdreier at cisco.com Tue Oct 17 08:17:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 08:17:24 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: ( akepner@sgi.com's message of "Sun, 15 Oct 2006 02:02:44 -0700 (PDT)") References: Message-ID: OK, here's what I actually put in my tree. Can you eyeball this and maybe give it a quick test? If it looks good to you, I'll send it on to the stable team for 2.6.18.x. 
- R commit 1f5c23e2c10d642a23aa3ebb449670a5184b6aab Author: Arthur Kepner Date: Mon Oct 16 20:22:35 2006 -0700 IB/mthca: Use mmiowb after doorbell ring We discovered a problem when running IPoIB applications on multiple CPUs on an Altix system. Many messages such as: ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq) appear in syslog, and the driver wedges up. Apparently this is because writes to the doorbells from different CPUs reach the device out of order. The following patch adds mmiowb() calls after doorbell rings to ensure the doorbell writes are ordered. Signed-off-by: Arthur Kepner Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index e393681..149b369 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -39,6 +39,8 @@ #include #include +#include + #include #include "mthca_dev.h" @@ -210,6 +212,11 @@ static inline void update_cons_index(str mthca_write64(doorbell, dev->kar + MTHCA_CQ_DOORBELL, MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); + /* + * Make sure doorbells don't leak out of CQ spinlock + * and reach the HCA out of order: + */ + mmiowb(); } } diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 5e5c58b..6a7822e 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -39,6 +39,8 @@ #include #include #include +#include + #include #include #include @@ -1732,6 +1734,11 @@ out: mthca_write64(doorbell, dev->kar + MTHCA_SEND_DOORBELL, MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); + /* + * Make sure doorbells don't leak out of SQ spinlock + * and reach the HCA out of order: + */ + mmiowb(); } qp->sq.next_ind = ind; @@ -1851,6 +1858,12 @@ out: qp->rq.next_ind = ind; qp->rq.head += nreq; + /* + * Make sure doorbells don't leak out of RQ spinlock and reach + * the HCA out of order: + */ + mmiowb(); + 
spin_unlock_irqrestore(&qp->rq.lock, flags); return err; } @@ -2112,6 +2125,12 @@ out: MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); } + /* + * Make sure doorbells don't leak out of SQ spinlock and reach + * the HCA out of order: + */ + mmiowb(); + spin_unlock_irqrestore(&qp->sq.lock, flags); return err; } diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index 92a72f5..f5d7677 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -35,6 +35,8 @@ #include #include +#include + #include "mthca_dev.h" #include "mthca_cmd.h" #include "mthca_memfree.h" @@ -595,6 +597,12 @@ int mthca_tavor_post_srq_recv(struct ib_ MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); } + /* + * Make sure doorbells don't leak out of SRQ spinlock and + * reach the HCA out of order: + */ + mmiowb(); + spin_unlock_irqrestore(&srq->lock, flags); return err; } From mst at mellanox.co.il Tue Oct 17 08:26:36 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 17:26:36 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: <20061017152636.GA23169@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] use mmiowb after doorbell ring > > OK, here's what I actually put in my tree. Can you eyeball this and > maybe give it a quick test? If it looks good to you, I'll send it on > to the stable team for 2.6.18.x. BTW, something like this will be needed for userspace too? -- MST From xma at us.ibm.com Tue Oct 17 08:30:04 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 17 Oct 2006 08:30:04 -0700 Subject: [openib-general] ethtool support for ipoib In-Reply-To: <20061017061203.GE17404@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 10/16/2006 11:12:03 PM: > Quoting r. 
Shirley Ma : > > /* can be added later once ipoib support sg > > .get_sg = ethtool_op_get_sg, > > .set_sg = ethtool_op_set_sg, > > */ > > The difficulty here is that sg currently requires checksum offloading in > netdevice. > > -- > MST I read the discussion on netdev. Since an IB packet has its own CRCs (ICRC, VCRC), is it a good idea to enable "checksum unnecessary" in a pure IB fabric for a large 64K MTU? It requires some negotiation. Does your prototype implementation for large MTU require agreement from both ends? Practically it can be implemented, but I don't know what the RFCs define. From changquing.tang at hp.com Tue Oct 17 08:34:23 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Tue, 17 Oct 2006 10:34:23 -0500 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <20061017142438.GT6145@minantech.com> Message-ID: >3. Fork support from kernel 2.6.12 and above is available >provided that applications do not use threads. The fork() is >supported as long as parent process does not run before child >exits or calls exec(). After fork(), in child, before exec(), can we call printf(), putenv(), or even re-direct stdout/stderr ? --CQ >The former can be achieved by calling wait(childpid) the latter >can be achieved by application specific means. POSIX system() >call is supported. > >Something along those lines. > >-- > Gleb.
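For what it's worth, the wait() variant from the release-note wording quoted above could look roughly like the sketch below, assuming a plain POSIX environment. run_and_wait() is a hypothetical helper and is not part of libibverbs or OFED. Reading "parent does not run" as "parent makes no verbs calls and does not touch registered memory until waitpid() returns" is an assumption of this sketch.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `prog` (looked up in PATH, no arguments)
 * and block until it has exited. Returns the child's exit status,
 * or -1 on error or abnormal termination.
 */
int run_and_wait(const char *prog)
{
    char *const argv[] = { (char *)prog, NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        /* Child: no verbs calls here, just exec right away. */
        execvp(prog, argv);
        _exit(127);             /* only reached if exec failed */
    }

    /*
     * Parent: wait for the child before doing anything else.
     * The caller should make no verbs calls until this returns.
     */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This is essentially what system() does internally, which fits the observation earlier in the thread that system() works because the parent calls wait(). The other case (parent resumes once the child has called exec()) needs some application-specific synchronization and is not covered here.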
From mlleinin at hpcn.ca.sandia.gov Tue Oct 17 08:37:05 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Tue, 17 Oct 2006 08:37:05 -0700 Subject: [openib-general] Tools for development In-Reply-To: References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <20061017134403.GD20690@mellanox.co.il> Message-ID: <1161099425.24554.48.camel@localhost> On Tue, 2006-10-17 at 07:49 -0700, Roland Dreier wrote: > Michael> The tool versions installed on openib are ancient. Can > Michael> site admins please install latest svn and git versions > Michael> from source? > > What distro is on the new openfabrics.org server? Ubuntu. > If it's something > like Fedora or Ubuntu, then it would probably be better to install the > distro's versions of svn and git, so that keeping up with security > updates is easier. Developers had requested git 1.4, but Ubuntu had an older version. We went ahead and installed git from source. I'd prefer to stick to Ubuntu packages if possible. - Matt From glebn at voltaire.com Tue Oct 17 09:07:29 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 17 Oct 2006 18:07:29 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: References: <20061017142438.GT6145@minantech.com> Message-ID: <20061017160729.GX6145@minantech.com> On Tue, Oct 17, 2006 at 10:34:23AM -0500, Tang, Changqing wrote: > > >3. Fork support from kernel 2.6.12 and above is available > >provided that applications do not use threads. The fork() is > >supported as long as parent process does not run before child > >exits or calls exec(). > > After fork(), in child, before exec(), can we call printf(), putenv(), > or even re-direct stdout/stderr ?
> Child can do whatever he wants (except using verbs), but parent can't use verbs until child exits() or execs(). -- Gleb. From kliteyn at dev.mellanox.co.il Tue Oct 17 09:07:03 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 17 Oct 2006 18:07:03 +0200 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c Message-ID: <4534FFA7.3000402@dev.mellanox.co.il> Hi Hal Fixing more things in the multicast test flow. Still have things to do in case when multicast group removal fails, and have to add some cleanup (as we've discussed previously). -- Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmtest/osmt_multicast.c =================================================================== --- osmtest/osmt_multicast.c (revision 9856) +++ osmtest/osmt_multicast.c (working copy) @@ -418,6 +418,12 @@ osmt_init_mc_query_rec(IN osmtest_t * c * o15.0.1.16: * - Try GetTable with PortGUID wildcarded and get back some groups. ***********************************************************************/ + +/* The following macro can be used only within the osmt_run_mcast_flow() function */ +#define IS_IPOIB_MGID(p_mgid) \ + ( !memcmp(&osm_ipoib_good_mgid , (p_mgid) , sizeof(osm_ipoib_good_mgid)) || \ + !memcmp(&osm_ts_ipoib_good_mgid ,(p_mgid) , sizeof(osm_ts_ipoib_good_mgid)) ) + ib_api_status_t osmt_run_mcast_flow( IN osmtest_t * const p_osmt ) { ib_api_status_t status; @@ -433,13 +439,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons ib_net16_t max_mlid = cl_hton16(0xFFFE),tmp_mlid; boolean_t ReachedMlidLimit = FALSE; int start_cnt = 0, cnt, middle_cnt = 0, end_cnt = 0; - int IPoIBIsFound = 0, mcg_outside_test_cnt = 0, fail_to_delete_mcg = 0; + int start_ipoib_cnt = 0, end_ipoib_cnt = 0; + int mcg_outside_test_cnt = 0, fail_to_delete_mcg = 0; osmtest_req_context_t context; ib_node_record_t *p_rec; uint32_t num_recs = 0, i; uint8_t mtu_phys = 0, rate_phys = 0; cl_map_t test_created_mlids; /* List of all mlids created in this test */ ib_member_rec_t* 
p_recvd_rec; + boolean_t got_error = FALSE; static ib_gid_t good_mgid = { { @@ -538,13 +546,19 @@ osmt_run_mcast_flow( IN osmtest_t * cons while( p_mgrp != (osmtest_mgrp_t*)cl_qmap_end( p_mgrp_mlid_tbl ) ) { /* search for ipoib mgid */ - if (!memcmp(&osm_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ipoib_good_mgid)) || - !memcmp(&osm_ts_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ts_ipoib_good_mgid))) + if (IS_IPOIB_MGID(&p_mgrp->mcmember_rec.mgid)) { - IPoIBIsFound = 1; + start_ipoib_cnt++; } else + { + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Non-IPoIB MC Groups exist: mgid=0x%016" PRIx64 ":0x%016" PRIx64 "\n", + cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.prefix), + cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id) ); mcg_outside_test_cnt++; + } p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); } @@ -553,7 +567,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "osmt_run_mcast_flow: " "Found %d non-IPoIB MC Groups\n", mcg_outside_test_cnt); - if (IPoIBIsFound) + if (start_ipoib_cnt) { /* o15-0.2.4 - Check a join request to already created MCG */ osm_log( &p_osmt->log, OSM_LOG_INFO, @@ -576,6 +590,9 @@ osmt_run_mcast_flow( IN osmtest_t * cons osm_log(&p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " + "Joining to an existing IPoIB multicast group\n"); + osm_log(&p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " "Sent Join request with :\n\t\tport_gid=0x%016"PRIx64 ":0x%016" PRIx64 ", mgid=0x%016" PRIx64 ":0x%016" PRIx64 "\n\t\tjoin state= 0x%x, response is : %s\n", @@ -585,6 +602,14 @@ osmt_run_mcast_flow( IN osmtest_t * cons cl_ntoh64(mc_req_rec.mgid.unicast.interface_id), (mc_req_rec.scope_state & 0x0F), ib_get_err_str(status)); + if (status != IB_SUCCESS) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow : ERR 02B3: " + "Failed joining existing IPoIB MCGroup - got %s\n", + ib_get_err_str(status)); + goto Exit; + } /* Check MTU & Rate Value and resend with SA suggested values */ 
p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); @@ -1473,7 +1498,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 02A5: " - "Failed to create MCG for MGID=0 with higher than minimum RATE\n", + "Failed to create MCG for MGID=0 with higher than minimum RATE - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1518,7 +1543,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0211: " - "Failed to create MCG for MGID=0 with less than highest RATE\n", + "Failed to create MCG for MGID=0 with less than highest RATE - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1562,7 +1587,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0238: " - "Failed to create MCG for MGID=0 with less than highest MTU\n", + "Failed to create MCG for MGID=0 with less than highest MTU - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1605,7 +1630,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0239: " - "Failed to create MCG for MGID=0 with higher than lowest MTU\n", + "Failed to create MCG for MGID=0 with higher than lowest MTU - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1659,7 +1684,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0240: " - "Failed to create MCG for MGID=0 with exact MTU & RATE\n", + "Failed to create MCG for MGID=0 with exact MTU & RATE - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1708,7 +1733,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0241: " - "Failed 
to create MCG for MGID=0 with exact RATE\n", + "Failed to create MCG for MGID=0 with exact RATE - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1757,7 +1782,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0242: " - "Failed to create MCG for MGID=0 with exact MTU\n", + "Failed to create MCG for MGID=0 with exact MTU - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1840,7 +1865,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0210: " - "Failed to create MCG for MGID=0\n", + "Failed to create MCG for MGID=0 - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1919,11 +1944,11 @@ osmt_run_mcast_flow( IN osmtest_t * cons osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0211: " "Failed to create MCG for MGID=0x%016" PRIx64 " : " - "0x%016" PRIx64 " (o15.0.1.6)...\n", - ib_get_err_str( status ), - ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad)), + "0x%016" PRIx64 " (o15.0.1.6) - got %s/%s\n", cl_ntoh64(good_mgid.unicast.prefix), - cl_ntoh64(good_mgid.unicast.interface_id)); + cl_ntoh64(good_mgid.unicast.interface_id), + ib_get_err_str( status ), + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad)) ); goto Exit; } @@ -1979,7 +2004,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0213: " - "Failed to recognize MGID error for MGID=0xFA......\n", + "Failed to recognize MGID error for MGID=0xFA - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -1993,13 +2018,12 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking BAD MGID=0xFF12A01B..... 
with link-local scope (o15.0.1.6)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid.raw[0] = 0xFF; mc_req_rec.mgid.raw[3] = 0x1B; comp_mask = comp_mask | IB_MCR_COMPMASK_SCOPE; mc_req_rec.scope_state = mc_req_rec.scope_state & 0x2F; /* local scope */ - + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -2026,14 +2050,12 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking BAD MGID PREFIX=0xEF... (o15.0.1.6)...\n" ); - osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); - - mc_req_rec.mgid = good_mgid; mc_req_rec.mgid.raw[0] = 0xEF; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, comp_mask, @@ -2045,7 +2067,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0215: " - "Failed to recognize MGID PREFIX error for MGID=0xEF....\n", + "Failed to recognize MGID PREFIX error for MGID=0xEF - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2075,11 +2097,11 @@ osmt_run_mcast_flow( IN osmtest_t * cons osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0216: " "Failed to create MCG for MGID=0x%016" PRIx64 " : " - "0x%016" PRIx64 "\n", - ib_get_err_str( status ), - ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ), + "0x%016" PRIx64 " - got %s/%s\n", cl_ntoh64(good_mgid.unicast.prefix), - cl_ntoh64(good_mgid.unicast.interface_id)); + cl_ntoh64(good_mgid.unicast.interface_id), + ib_get_err_str( status ), + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) )); goto Exit; } @@ -2121,7 +2143,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons (res_sa_mad.status != 
IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0217: " - "Failed to recognize create with invalid flags value 0x2\n", + "Failed to recognize create with invalid flags value 0x2 - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2146,7 +2168,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0218: " - "Failed to create MCG for MGID=0xFF02:0:0:0:0:0:0:1\n", + "Failed to create MCG for MGID=0xFF02:0:0:0:0:0:0:1 - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2180,7 +2202,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); mc_req_rec.mgid = good_mgid; mc_req_rec.mgid.raw[12] = 0xFF; - mc_req_rec.scope_state = 0x22; /* link-local scope */ + mc_req_rec.scope_state = 0x22; /* link-local scope, non-member state */ status = osmt_send_mcast_request( p_osmt, 1, &mc_req_rec, @@ -2193,7 +2215,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0219: " - "Failed to recognize create with JoinState != FullMember\n", + "Failed to recognize create with JoinState != FullMember - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2219,7 +2241,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0220: " - "Failed to create MCG with valid join state 0x3\n", + "Failed to create MCG with valid join state 0x3 - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2263,7 +2285,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0221: " - "Failed to recognize create 
with JoinState != FullMember\n", + "Failed to recognize create with JoinState != FullMember - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2274,7 +2296,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* Lets try another valid join scope state */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " - "Checking new MGID with valid join state (o15.0.1.9)...\n" + "Checking new MGID creation with valid join state (o15.0.1.9)...\n" ); mc_req_rec.mgid = good_mgid; @@ -2291,7 +2313,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 0222: " - "Failed to create MCG with valid join state 0xF\n", + "Failed to create MCG with valid join state 0xF - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2358,7 +2380,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 02CD: " - "Failed to update existing MGID\n", + "Failed to update existing MGID - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2426,7 +2448,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 02C0: " - "Failed to update existing MCG\n", + "Failed to update existing MCG - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2459,7 +2481,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 02C2: " - "Failed to update existing MGID\n", + "Failed to update existing MGID - got %s/%s\n", ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2493,7 +2515,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 02C4: " - "Failed to update existing MGID\n", + "Failed to update existing MGID - got %s/%s\n", 
ib_get_err_str( status ), ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ) ); @@ -2848,11 +2870,11 @@ osmt_run_mcast_flow( IN osmtest_t * cons osm_log( &p_osmt->log, OSM_LOG_ERROR, "osmt_run_mcast_flow: ERR 02BE: " "Failed to create MCG for 0x%016" PRIx64 " : " - "0x%016" PRIx64 ".\n", - ib_get_err_str( status ), - ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) ), + "0x%016" PRIx64 " - got %s/%s\n", cl_ntoh64(good_mgid.unicast.prefix), - cl_ntoh64(good_mgid.unicast.interface_id)); + cl_ntoh64(good_mgid.unicast.interface_id), + ib_get_err_str( status ), + ib_get_mad_status_str( (ib_mad_t*)(&res_sa_mad) )); goto Exit; } @@ -3028,11 +3050,12 @@ osmt_run_mcast_flow( IN osmtest_t * cons { osm_log( &p_osmt->log, OSM_LOG_VERBOSE, "osmt_run_mcast_flow: " - "current port_guid = 0x%" PRIx64 "\n", + "remote port_guid = 0x%" PRIx64 "\n", cl_ntoh64(p_rec->node_info.port_guid)); remote_port_guid = p_rec->node_info.port_guid; i = num_recs; + break; } } @@ -3073,16 +3096,25 @@ osmt_run_mcast_flow( IN osmtest_t * cons memcpy(&proxy_mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); /* First try a bad deletion then good one */ + + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Trying deletion of remote port with local port guid\n"); + osmt_init_mc_query_rec(p_osmt, &mc_req_rec); ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); comp_mask = IB_MCR_COMPMASK_MGID | IB_MCR_COMPMASK_PORT_GID | IB_MCR_COMPMASK_JOIN_STATE; + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" ); status = osmt_send_mcast_request( p_osmt, 0, /* delete flag */ &mc_req_rec, comp_mask, &res_sa_mad ); + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" ); if (status == IB_SUCCESS) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -3096,6 +3128,11 @@ osmt_run_mcast_flow( IN osmtest_t * cons status = IB_ERROR; goto Exit; } + + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Trying deletion of 
remote port with the right port guid\n"); + osmt_init_mc_query_rec(p_osmt, &mc_req_rec); ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); mc_req_rec.mgid = proxy_mgid; @@ -3214,6 +3251,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* - Try GetTable with PortGUID wildcarded and get back some groups. */ status = osmt_query_mcast( p_osmt); + if (status != IB_SUCCESS) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmt_run_mcast_flow: ERR 02B1: " + "Failed to query multicast groups: %s\n", + ib_get_err_str(status)); + goto Exit; + } + cnt = cl_qmap_count(&p_osmt->exp_subn.mgrp_mlid_tbl); osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow (Before Deletion of all MCG): " @@ -3230,8 +3276,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons while( p_mgrp != (osmtest_mgrp_t*)cl_qmap_end( p_mgrp_mlid_tbl ) ) { /* Only if different from IPoIB Mgid try to delete */ - if (memcmp(&osm_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ipoib_good_mgid)) && - memcmp(&osm_ts_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ts_ipoib_good_mgid))) + if (!IS_IPOIB_MGID(&p_mgrp->mcmember_rec.mgid)) { osmt_init_mc_query_rec(p_osmt, &mc_req_rec); mc_req_rec.mgid = p_mgrp->mcmember_rec.mgid; @@ -3261,6 +3306,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons &mc_req_rec, comp_mask, &res_sa_mad ); + status = IB_SUCCESS; if (status != IB_SUCCESS) { osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -3274,6 +3320,10 @@ osmt_run_mcast_flow( IN osmtest_t * cons fail_to_delete_mcg++; } } + else + { + end_ipoib_cnt++; + } p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); } @@ -3282,8 +3332,9 @@ osmt_run_mcast_flow( IN osmtest_t * cons if (status != IB_SUCCESS) { osm_log( &p_osmt->log, OSM_LOG_ERROR, - "osmt_run_mcast_flow: ERR 2FF " - "GetTable of all records has failed!\n"); + "osmt_run_mcast_flow: ERR 02B2 " + "GetTable of all records has failed - got %s\n", + ib_get_err_str( status )); goto Exit; } @@ -3293,17 +3344,34 @@ osmt_run_mcast_flow( IN osmtest_t * cons if 
(p_osmt->opt.mmode == 1 || p_osmt->opt.mmode == 3) { end_cnt = cl_qmap_count(&p_osmt->exp_subn.mgrp_mlid_tbl); + osm_log( &p_osmt->log, OSM_LOG_INFO, - "osmt_run_mcast_flow (After Deletion of all MCG): " - "Number of MC Records found in SA DB is %d\n",end_cnt); - /* when we comapre num of MCG we should consider an outside source which create other MCGs */ - if ((end_cnt-fail_to_delete_mcg) != (start_cnt - mcg_outside_test_cnt)) + "osmt_run_mcast_flow: " + "Status of MC Records in SA DB during the test flow:\n" + " Beginning of test\n" + " Unrelated to the test: %d\n" + " IPoIB MC Records : %d\n" + " Total : %d\n" + " End of test\n" + " Failed to delete : %d\n" + " IPoIB MC Records : %d\n" + " Total : %d\n", + mcg_outside_test_cnt, /* Non-IPoIB that existed at the beginning */ + start_ipoib_cnt, /* IPoIB records */ + start_cnt, /* Total: IPoIB and MC Records unrelated to the test */ + fail_to_delete_mcg, /* Failed to delete at the end */ + end_ipoib_cnt, /* IPoIB records */ + end_cnt); /* Total MC Records at the end */ + + /* when we compare num of MCG we should consider an outside source which create other MCGs */ + if ((end_cnt-fail_to_delete_mcg-end_ipoib_cnt) != (start_cnt-mcg_outside_test_cnt-start_ipoib_cnt)) { osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " - "Got different number of records stored in SA DB\n\t\t" + "Got different number of non-IPoIB records stored in SA DB\n\t\t" "at Start got %d, at End got %d (IPoIB groups only)\n", - (start_cnt-mcg_outside_test_cnt),(end_cnt-fail_to_delete_mcg)); + (start_cnt-mcg_outside_test_cnt-start_ipoib_cnt), + (end_cnt-fail_to_delete_mcg-end_ipoib_cnt)); } p_mgrp_mlid_tbl = &p_osmt->exp_subn.mgrp_mlid_tbl; @@ -3327,20 +3395,27 @@ osmt_run_mcast_flow( IN osmtest_t * cons cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.prefix), cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id), mlid ); - status=IB_ERROR; - goto Exit; - } + got_error = TRUE; + + /** + * ToDo: Query all the group members of this MC 
Group + **/ + + } else { - osm_log( &p_osmt->log, OSM_LOG_INFO, - "osmt_run_mcast_flow: " - "Still exists MGID:0x%016" PRIx64 " : 0x%016" - PRIx64 "\n", - cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.prefix), - cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id) ); + osm_log( &p_osmt->log, OSM_LOG_INFO, + "osmt_run_mcast_flow: " + "Still exists %s MGID:0x%016" PRIx64 " : 0x%016" PRIx64 "\n", + (IS_IPOIB_MGID(&p_mgrp->mcmember_rec.mgid)) ? "IPoIB" : "non-IPoIB", + cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.prefix), + cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id) ); } p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); } + + if (got_error) + status = IB_ERROR; } Exit: OSM_LOG_EXIT( &p_osmt->log ); From sashak at voltaire.com Tue Oct 17 09:16:16 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 17 Oct 2006 18:16:16 +0200 Subject: [openib-general] Tools for development In-Reply-To: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> Message-ID: <20061017161616.GA26226@sashak.voltaire.com> On 09:17 Tue 17 Oct , Jeff Squyres wrote: > Per the teleconference last week, I'd like to survey the developers > about the tools that should be installed on the new OFA server (is > there a plan to migrate there yet?). > > As I understand it (please correct me if I get this wrong): > > - The community has decided to stay with git for kernel level > development > --> Was there a plan for any consolidation of the various git > repositories?) > - The community decided to stay with svn for user space level > development This would be nice to have automatic svn -> git mirroring for user space projects too (at least for those projects where developers will like it). Then developers will be able to choose between svn and git. Currently I use such svn -> git mirroring privately for src/userspace/management, have some scripts and of course may help with setup. 
Sasha From sashak at voltaire.com Tue Oct 17 09:21:46 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 17 Oct 2006 18:21:46 +0200 Subject: [openib-general] Tools for development In-Reply-To: <20061017150442.GA22531@mellanox.co.il> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <1161096883.30946.2.camel@stevo-desktop> <20061017150442.GA22531@mellanox.co.il> Message-ID: <20061017162146.GB26226@sashak.voltaire.com> On 17:04 Tue 17 Oct , Michael S. Tsirkin wrote: > Quoting r. Steve Wise : > > At the risk of opening a can of worms, is there any reason we don't move > > the user stuff into its own git tree? This would get rid of svn > > altogether... > > If we do, that should probably be multiple git trees - verbs, management, > tests are all more or less independent and developed mostly by different people. Reasonable. And generally this should not be too bad. Sasha From tziporet at dev.mellanox.co.il Tue Oct 17 09:26:05 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 17 Oct 2006 18:26:05 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: References: Message-ID: <4535041D.9060404@dev.mellanox.co.il> Tang, Changqing wrote: > Thanks, I still use 2.6.9-34, Or Gerlitz told me that fork() support is > only in libibverbs1.1 which is not released yet. Both OFED 1.0 and 1.1 > use libibverbs1.0, is it still true ? > > --CQ > > > You need to make a difference between full fork support that will be available only in libibverbs1.1 and the system /fork & exec fork support that is depend on the kernel only and available from kernel 2.6.12. See also the explanation from Gleb on this Tziporet From changquing.tang at hp.com Tue Oct 17 09:30:28 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Tue, 17 Oct 2006 11:30:28 -0500 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <4535041D.9060404@dev.mellanox.co.il> Message-ID: Thanks for the clarification. 
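[Editor's sketch, not part of the thread] The "fork & exec" case described above can be pictured with plain POSIX calls. This is only a minimal illustration under the restrictions stated elsewhere in this thread (the child does not use verbs, the parent does nothing until the child is gone); the helper name run_and_wait is hypothetical and no verbs code is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a shell command in a child process and
 * return its exit status, or -1 on failure. The parent does nothing
 * but wait, so it cannot touch verbs or registered memory while the
 * child still exists.
 */
int run_and_wait(const char *cmd)
{
    int status;
    pid_t pid;

    assert(cmd != NULL);

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        /* Child: no verbs calls here, just replace the process image. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: block until the child has exited before doing anything else. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Waiting for the child to exit is the conservative choice; the thread only requires that the parent stay idle until the child exits or calls exec().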
--CQ >> >You need to make a difference between full fork support that >will be available only in libibverbs1.1 and the system /fork & >exec fork support that is depend on the kernel only and >available from kernel 2.6.12. > >See also the explanation from Gleb on this > >Tziporet > From mst at mellanox.co.il Tue Oct 17 09:36:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 18:36:50 +0200 Subject: [openib-general] ethtool support for ipoib In-Reply-To: References: Message-ID: <20061017163650.GA23922@mellanox.co.il> Quoting r. Shirley Ma : > > > /* can be added later once ipoib support sg > > > .get_sg = ethtool_op_get_sg, > > > .set_sg = ethtool_op_set_sg, > > > */ > > > > The difficulty here is that sg currently requires checksum offloading in > > netdevice. > > I read the discussion in net-dev. Hmm, any suggestions? > Since IB packet has its own CRC (ICRC, VCRC). Is it a good idea to enable > checksum unnecessary in a pure IB Fabrics for large MTU 64K. It requires some > negotiation. Not sure what you are saying here. > Does your prototype implementation for large MTU requires both ends agreement? > Practically it can be implemented, but I don't know what RFCs have defined. Look up IPoIB connected mode. -- MST From mst at mellanox.co.il Tue Oct 17 09:45:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 18:45:44 +0200 Subject: [openib-general] Tools for development In-Reply-To: <1161099425.24554.48.camel@localhost> References: <1161099425.24554.48.camel@localhost> Message-ID: <20061017164544.GB23922@mellanox.co.il> Quoting r. Matt Leininger : > Developers had requested git 1.4, but Ubuntu had an older version. We > went ahead and installed git from source. I'd prefer to stick to Ubuntu > packages if possible. We have much to gain from newer versions - just look at gitweb change log. But my assumption here was that someone will keep the built from source tools updated. 
I don't have a problem alerting the list when new versions come out. If, as Roland suggested, we'll be stuck at this version, it's better to stick with distro-supplied ones, assuming that *that* is updated in a timely fashion. So, I guess the question is how is the system supported/updated? -- MST From mst at mellanox.co.il Tue Oct 17 09:48:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 18:48:22 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <20061017160729.GX6145@minantech.com> References: <20061017142438.GT6145@minantech.com> <20061017160729.GX6145@minantech.com> Message-ID: <20061017164822.GD23922@mellanox.co.il> Quoting r. glebn at voltaire.com : > Subject: Re: [openfabrics-ewg] OFED 1.1 RC7 fork() issue. > > On Tue, Oct 17, 2006 at 10:34:23AM -0500, Tang, Changqing wrote: > > > > >3. Fork support from kernel 2.6.12 and above is available > > >provided that applications do not use threads. The fork() is > > >supported as long as parent process does not run before child > > >exits or calls exec(). > > > > After fork(), in child, before exec(), can we call printf(), putenv(), > > or even re-direct stdout/stderr ? > > > Child can do whatever he wants (except using verbs), but parent can't use > verbs until child exits() or execs(). Or even write to registered pages at all. -- MST From glebn at voltaire.com Tue Oct 17 09:50:00 2006 From: glebn at voltaire.com (glebn at voltaire.com) Date: Tue, 17 Oct 2006 18:50:00 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 RC7 fork() issue. In-Reply-To: <20061017164822.GD23922@mellanox.co.il> References: <20061017142438.GT6145@minantech.com> <20061017160729.GX6145@minantech.com> <20061017164822.GD23922@mellanox.co.il> Message-ID: <20061017165000.GY6145@minantech.com> On Tue, Oct 17, 2006 at 06:48:22PM +0200, Michael S. Tsirkin wrote: > Quoting r. glebn at voltaire.com : > > Subject: Re: [openfabrics-ewg] OFED 1.1 RC7 fork() issue. 
> > > > On Tue, Oct 17, 2006 at 10:34:23AM -0500, Tang, Changqing wrote: > > > > > > >3. Fork support from kernel 2.6.12 and above is available > > > >provided that applications do not use threads. The fork() is > > > >supported as long as parent process does not run before child > > > >exits or calls exec(). > > > > > > After fork(), in child, before exec(), can we call printf(), putenv(), > > > or even re-direct stdout/stderr ? > > > > > Child can do whatever he wants (except using verbs), but parent can't use > > verbs until child exits() or execs(). > > Or even write to registered pages at all. > Right. Forgot that. So parent better be doing nothing, but waiting. -- Gleb. From mshefty at ichips.intel.com Tue Oct 17 09:41:53 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 09:41:53 -0700 Subject: [openib-general] OFED 1.1 release schedule In-Reply-To: <4534CE1E.1010106@dev.mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> <4533CC4A.2070709@ichips.intel.com> <4534CE1E.1010106@dev.mellanox.co.il> Message-ID: <453507D1.6060304@ichips.intel.com> Tziporet Koren wrote: > I checked it and saw that the patch is applied, but since in the patch > Sean put the cm_issue_drep as a static, thus nm does not show it. > from the patch: +static int cm_issue_drep(struct cm_port *port, cm_issue_rej is also static, but shows up. > Do you really need the symbol to be exported out of the ib_cm module, or > is it enough this way. The symbol does not need to be exported. - Sean From mshefty at ichips.intel.com Tue Oct 17 09:51:43 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 09:51:43 -0700 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: References: Message-ID: <45350A1F.3030602@ichips.intel.com> Krishna Kumar2 wrote: > Hmmm, OK, I will re-phrase this patch to reduce nesting. Something similar to: if (cma_any_addr...) 
{ ret = rdma_translate_ip(..); if (ret) goto err1; mutex_lock ret = cma_acquire_dev mutex_unlock if (ret) goto err2; } should work fine. - Sean From mst at mellanox.co.il Tue Oct 17 09:59:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 18:59:46 +0200 Subject: [openib-general] OFED 1.1 release schedule In-Reply-To: <453507D1.6060304@ichips.intel.com> References: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> <4533CC4A.2070709@ichips.intel.com> <4534CE1E.1010106@dev.mellanox.co.il> <453507D1.6060304@ichips.intel.com> Message-ID: <20061017165946.GE23922@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: OFED 1.1 release schedule > > Tziporet Koren wrote: > > I checked it and saw that the patch is applied, but since in the patch > > Sean put the cm_issue_drep as a static, thus nm does not show it. > > from the patch: +static int cm_issue_drep(struct cm_port *port, > > cm_issue_rej is also static, but shows up. Could be a compiler thing: maybe cm_issue_rej is used in more than one place? To make sure, you can try removing the static keyword and see if this appears. 
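[Editor's sketch, not part of the thread] The hypothesis above is that a static function with a single call site can be inlined by the compiler, leaving no standalone copy for nm to list. The fragment below only illustrates that idea; all names are hypothetical, and with bodies this small an optimizing compiler may well inline every one of them except the function marked noinline (a GCC/Clang extension).

```c
/* Called from exactly one place: an optimizing compiler is free to
 * inline it and drop the standalone copy, so no symbol is emitted. */
static int issue_once(int x)
{
    return x + 1;
}

/* Called from two places: for larger bodies the compiler is more
 * likely to keep an out-of-line copy, which nm then lists. */
static int issue_twice(int x)
{
    return x * 2;
}

/* GCC/Clang extension: forbid inlining so the function stays out of line. */
__attribute__((noinline)) static int issue_pinned(int x)
{
    return x - 1;
}

int caller_a(int x)
{
    return issue_once(x) + issue_twice(x) + issue_pinned(x);
}

int caller_b(int x)
{
    return issue_twice(x);
}
```

One way to check is to compile with optimization (for example gcc -O2 -c) and run nm on the object file; which static functions survive depends on the compiler version and flags, so treat any result as specific to that build.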
-- MST From mshefty at ichips.intel.com Tue Oct 17 10:03:41 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 10:03:41 -0700 Subject: [openib-general] [PATCH] If addr_handler() got error, do not set state as OK In-Reply-To: <20061017043920.4891.43166.sendpatchset@localhost.localdomain> References: <20061017043920.4891.43166.sendpatchset@localhost.localdomain> Message-ID: <45350CED.8020705@ichips.intel.com> Krishna Kumar wrote: > diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c > --- org/drivers/infiniband/core/cma.c 2006-10-10 15:45:27.000000000 +0530 > +++ new/drivers/infiniband/core/cma.c 2006-10-10 15:59:53.000000000 +0530 > @@ -1515,6 +1515,8 @@ static void addr_handler(int status, str > { > struct rdma_id_private *id_priv = context; > enum rdma_cm_event_type event; > + int did_comp_exch = 0; > + int destroy = 0; As a general comment, I really don't think that we need to be overly concerned about optimizing error handling at the expense of code readability. Can you rework this patch without adding in extra flags to indicate what has or has not been executed? Thanks, - Sean From mshefty at ichips.intel.com Tue Oct 17 10:06:26 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 10:06:26 -0700 Subject: [openib-general] OFED 1.1 release schedule In-Reply-To: <20061017165946.GE23922@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C92ACEAF@mtlexch01.mtl.com> <4533CC4A.2070709@ichips.intel.com> <4534CE1E.1010106@dev.mellanox.co.il> <453507D1.6060304@ichips.intel.com> <20061017165946.GE23922@mellanox.co.il> Message-ID: <45350D92.6080905@ichips.intel.com> Michael S. Tsirkin wrote: > Could be a compiler thing: maybe cm_issue_rej is used in more than > one place? To make sure, you can try removing the static > keyword and see if this appears. That could be. cm_issue_rej is called from multiple locations, whereas cm_issue_drep is not. 
- Sean From ardavis at ichips.intel.com Tue Oct 17 10:13:24 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 17 Oct 2006 10:13:24 -0700 Subject: [openib-general] uDAPL problem In-Reply-To: <4534181E.3020703@cs.rutgers.edu> References: <4533FFB7.2020506@cs.rutgers.edu> <4534013A.5000900@cs.rutgers.edu> <45341168.8070305@ichips.intel.com> <4534181E.3020703@cs.rutgers.edu> Message-ID: <45350F34.2030408@ichips.intel.com> Stephen Smaldone wrote: > > > Arlin Davis wrote: > >> Steve Smaldone wrote: >> >>> Hi, >>> >>> Sorry for replying to myself, but I loaded rdma_ucm and the rdma_cm >>> device appears. However, it now fails with the following: >>> >>> $ ./dapltest -T S -D IB1 >>> ... >>> DAT Registry: dat_ia_openv (IB1,1:2,0) called >>> DAT Registry: IA IB1, trying to load library /usr/local/lib/libdapl.so >>> DAT Registry: dat_registry_add_provider (IB1,1:2,0) >>> libibverbs: Warning: no userspace device-specific driver found for >>> uverbs0 >>> driver search path: /usr/local/lib/infiniband >>> libibverbs: Warning: no userspace device-specific driver found for >>> uverbs0 >>> driver search path: /usr/local/lib/infiniband >>> DT_cs_Server: Could not open IB1 (DAT_INVALID_ADDRESS ) >>> DT_cs_Server (IB1): Exiting. >>> DAT Registry: Stopped (dat_fini) >>> >>> The configuration remains the same otherwise. >>> >>> >>>> My dat.conf: >>>> IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so >>>> mv_dapl.1.2 "hora-1-ib0 0" "" >>>> >>> >> Do you have an entry in your /etc/hosts for hora-1-ib0 and 10.2.2.135? >> >> there seems to be problems resolving "hora-1-ib0" >> >> -arlin > > Yes. There is an entry as follows: > 10.2.2.135 hora-1-ib0 Could you change the "hora-1-ib0 0" to just "ib0 0" in your dat.conf and retry? There may be an issue parsing a hostname instead of a netdev name. 
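[Editor's sketch, not part of the thread] Applied to the dat.conf entry quoted above, the suggested change would look roughly like the line below. This assumes the entry is a single line in the file (it is wrapped in the quoted mail) and that ib0 is the name of the IPoIB interface on that host; every other field is copied unchanged from the quoted configuration.

```
IB1 u1.2 nonthreadsafe default /usr/local/lib/libdapl.so mv_dapl.1.2 "ib0 0" ""
```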
> > Thanks, > > Steve > From rdreier at cisco.com Tue Oct 17 10:16:43 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 10:16:43 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061017152636.GA23169@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 17 Oct 2006 17:26:36 +0200") References: <20061017152636.GA23169@mellanox.co.il> Message-ID: Michael> BTW, something like this will be needed for userspace too? Ugh, I forgot about that. I don't think an mmiowb() equivalent is available from userspace. However, the problem only arises if userspace uses the same QP/CQ/SRQ from multiple nodes at the same time -- so maybe we can live with this. - R. From rdreier at cisco.com Tue Oct 17 10:17:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 10:17:40 -0700 Subject: [openib-general] ethtool support for ipoib In-Reply-To: (Shirley Ma's message of "Tue, 17 Oct 2006 08:30:04 -0700") References: Message-ID: Shirley> I read the discussion in net-dev. Since IB packet has its Shirley> own CRC (ICRC, VCRC). Is it a good idea to enable Shirley> checksum unnecessary in a pure IB Fabrics for large MTU Shirley> 64K. It requires some negotiation. Does your prototype Shirley> implementation for large MTU requires both ends Shirley> agreement? No, it's never a good idea to turn off TCP or IP checksums. That leads to possibilities of silent data corruption too easily. 
From mshefty at ichips.intel.com Tue Oct 17 10:20:17 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 10:20:17 -0700 Subject: [openib-general] [PATCH] [RFC] cma_new_id can kfree on error instead of destroy_id In-Reply-To: <20061017043926.4891.43838.sendpatchset@localhost.localdomain> References: <20061017043926.4891.43838.sendpatchset@localhost.localdomain> Message-ID: <453510D1.5040409@ichips.intel.com> Krishna Kumar wrote: > cma_new_id() does not require to do destroy_id(), instead > it can kfree(), since nothing is allocated on that id. > Posting this as an RFC in case anyone feels that create_id > should be cleaned up by destroy_id (even if redundant). I can go either way on this. It's a little cleaner to match rdma_create_id() with rdma_destroy_id(), rather than matching it with kfree(). It makes maintenance easier. - Sean From mst at mellanox.co.il Tue Oct 17 10:23:37 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 19:23:37 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: <20061017172337.GH23922@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] use mmiowb after doorbell ring > > Michael> BTW, something like this will be needed for userspace too? > > Ugh, I forgot about that. > > I don't think an mmiowb() equivalent is available from userspace. Isn't this just an asm() command? > However, the problem only arises if userspace uses the same QP/CQ/SRQ > from multiple nodes at the same time -- so maybe we can live with this. BTW, I think we really should implement proper rmb/wmb in arch.h. Last time I looked we only had compiler barriers here, and this means, I think, that a read from e.g. CQE contents could bypass the read of the CQE valid bit. 
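[Editor's sketch, not part of the thread] The ordering concern raised above (reading CQE contents before the valid bit has really been observed) can be illustrated with C11 atomics. This is an analogy only: the struct and function names are hypothetical, and C11 acquire/release describes ordering between CPU threads, whereas a real CQ is written by the HCA via DMA and the library would use architecture-specific rmb()/wmb() definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion entry: payload plus a "valid" flag. */
struct cqe_sketch {
    uint32_t   payload;
    atomic_int valid;
};

/* Producer side (stands in for the HCA): write contents, then publish. */
void cqe_publish(struct cqe_sketch *cqe, uint32_t payload)
{
    cqe->payload = payload;
    /* Release: the payload store cannot be reordered after the flag store. */
    atomic_store_explicit(&cqe->valid, 1, memory_order_release);
}

/* Consumer side: returns 1 and fills *out if the entry is valid, else 0. */
int cqe_poll(struct cqe_sketch *cqe, uint32_t *out)
{
    assert(out != NULL);

    /* Acquire: plays the role of rmb() between the flag read and the
     * payload read, so the payload is not read ahead of the flag. */
    if (!atomic_load_explicit(&cqe->valid, memory_order_acquire))
        return 0;

    *out = cqe->payload;
    return 1;
}
```

With only a compiler barrier in place of the acquire load, the compiler would keep the two reads in program order, but a weakly ordered CPU could still satisfy the payload read first, which is the case the message above worries about.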
-- MST From mshefty at ichips.intel.com Tue Oct 17 10:28:57 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 10:28:57 -0700 Subject: [openib-general] [PATCH] Use time_after_eq() instead of time_after() in queue_req() In-Reply-To: <20061017043909.4891.4421.sendpatchset@localhost.localdomain> References: <20061017043909.4891.4421.sendpatchset@localhost.localdomain> Message-ID: <453512D9.5030703@ichips.intel.com> Acked-by: Sean Hefty Roland, this looks good for 2.6.20. How would you like to handle pulling in patches like these? Once OFA has git up, would it be easier to pull them into my git tree, then request that you pull from there, or does this work okay? > In queue_req(), use time_after_eq() instead of time_after() > for following reasons : > > - Improves insert time if multiple entries with same time are > present. > - set_timeout need not be called if entry with same time > is added to the list (and that happens to be the entry > with the smallest time), saving atomic/locking operations. > - Earlier entries with same time are deleted first (fifo). > > Signed-off-by: Krishna Kumar > -------- > diff -ruNp org/drivers/infiniband/core/addr.c new/drivers/infiniband/core/addr.c > --- org/drivers/infiniband/core/addr.c 2006-10-09 16:54:37.000000000 +0530 > +++ new/drivers/infiniband/core/addr.c 2006-10-09 16:55:36.000000000 +0530 > @@ -118,7 +118,7 @@ static void queue_req(struct addr_req *r > > mutex_lock(&lock); > list_for_each_entry_reverse(temp_req, &req_list, list) { > - if (time_after(req->timeout, temp_req->timeout)) > + if (time_after_eq(req->timeout, temp_req->timeout)) > break; > } From mst at mellanox.co.il Tue Oct 17 10:34:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 19:34:03 +0200 Subject: [openib-general] ethtool support for ipoib In-Reply-To: References: Message-ID: <20061017173403.GI23922@mellanox.co.il> Quoting r. Roland Dreier : > No, it's never a good idea to turn off TCP or IP checksums. 
> That leads to possibilities of silent data corruption too easily.

"never" is probably too strong a word - hardware checksum offloading turns off checksumming in software, moving that to hardware. Some people dislike that, too, but it's not a universal thing.

Another example is the loopback interface, which sets NETIF_F_NO_CSUM. But this might be a Linux-only thing. Right?

-- MST

From akepner at sgi.com Tue Oct 17 10:18:40 2006
From: akepner at sgi.com (akepner at sgi.com)
Date: Tue, 17 Oct 2006 10:18:40 -0700 (PDT)
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061017172337.GH23922@mellanox.co.il>
References: <20061017172337.GH23922@mellanox.co.il>
Message-ID:

On Tue, 17 Oct 2006, Michael S. Tsirkin wrote:

> Quoting r. Roland Dreier :
>> Subject: Re: [PATCH] use mmiowb after doorbell ring
>>
>> Michael> BTW, something like this will be needed for userspace too?
>>
>> Ugh, I forgot about that.
>>
>> I don't think an mmiowb() equivalent is available from userspace.
>
> Isn't this just an asm() command?
>

Depends on the architecture, but on sn2, it's not. (Actually, on most architectures, it's a no-op. See arch/ia64/sn/kernel/iomv.c:__sn_mmiowb() for the sn2 version.)

-- Arthur

From xma at us.ibm.com Tue Oct 17 10:38:43 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 17 Oct 2006 10:38:43 -0700
Subject: [openib-general] ethtool support for ipoib
In-Reply-To: <20061017163650.GA23922@mellanox.co.il>
Message-ID:

What I suggested here is when it's connected mode with large MTU, set ib interface flag to CHECKSUM_UNNECESSARY. But this only works on packets not being routed off-net at the TCP layer.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mshefty at ichips.intel.com Tue Oct 17 10:44:47 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 17 Oct 2006 10:44:47 -0700
Subject: [openib-general] [PATCH] Fix some cancellation problems in process_req().
In-Reply-To: <20061017043918.4891.5249.sendpatchset@localhost.localdomain>
References: <20061017043918.4891.5249.sendpatchset@localhost.localdomain>
Message-ID: <4535168F.9050901@ichips.intel.com>

Krishna Kumar wrote:
>       mutex_lock(&lock);
>       list_for_each_entry_safe(req, temp_req, &req_list, list) {
> -             if (req->status) {
> +             if (req->status && req->status != -ECANCELED) {

I think we just need:

        if (req->status == -ENODATA) {

>                       src_in = (struct sockaddr_in *) &req->src_addr;
>                       dst_in = (struct sockaddr_in *) &req->dst_addr;
>                       req->status = addr_resolve_remote(src_in, dst_in,
>                                                         req->addr);
> +                     if (req->status && time_after_eq(jiffies, req->timeout))
> +                             req->status = -ETIMEDOUT;
> +                     else if (req->status == -ENODATA)
> +                             continue;
>               }
> -             if (req->status && time_after(jiffies, req->timeout))
> -                     req->status = -ETIMEDOUT;
> -             else if (req->status == -ENODATA)
> -                     continue;
> -

The other changes look fine. But note that if req->status == -ECANCELED and time_after() is true, then it seems like a toss up as to which one can be reported to the user.

- Sean

From mshefty at ichips.intel.com Tue Oct 17 10:56:46 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 17 Oct 2006 10:56:46 -0700
Subject: [openib-general] [PATCH] Rewrite cma_req_handler() to encapsulate common code.
In-Reply-To: <20061017043911.4891.28143.sendpatchset@localhost.localdomain>
References: <20061017043911.4891.28143.sendpatchset@localhost.localdomain>
Message-ID: <4535195E.5060601@ichips.intel.com>

Acked-by: Sean Hefty

Let me see how Roland would like to handle merging the patches going forward, but this one looks fine.
From mshefty at ichips.intel.com Tue Oct 17 11:01:33 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 17 Oct 2006 11:01:33 -0700
Subject: [openib-general] [PATCH] Re-send ARP as prev ARP request could have got dropped.
In-Reply-To: <20061017043923.4891.37021.sendpatchset@localhost.localdomain>
References: <20061017043923.4891.37021.sendpatchset@localhost.localdomain>
Message-ID: <45351A7D.2060909@ichips.intel.com>

Krishna Kumar wrote:
> Re-send ARP, since earlier ARP request could have got
> dropped/lost. This should be done in addr_resolve_remote()
> as doing it in rdma_resolve_ip() means sending ARP only
> once.

This was intentional. Users can call rdma_resolve_ip() again to retry a timed out request. In any case, we want to avoid resending an ARP until after the first request has timed out. addr_resolve_remote() can be called multiple times for the same destination within the specified time out window.

- Sean

From rdreier at cisco.com Tue Oct 17 11:03:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 11:03:22 -0700
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061017172337.GH23922@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 17 Oct 2006 19:23:37 +0200")
References: <20061017172337.GH23922@mellanox.co.il>
Message-ID:

> > I don't think an mmiowb() equivalent is available from userspace.
>
> Isn't this just an asm() command?

Nope, look at the kernel source, specifically arch/ia64/sn/kernel/iomv.c

> BTW, I think we really should implement proper rmb/wmb in arch.h.
> Last time I looked we only had compiler barriers here, and
> this means, I think, that a read from e.g. CQE contents could bypass
> the read of the CQE valid bit.

I'm not absolutely sure everything there is correct but I did my best, for example

#elif defined(__ia64__)

#define mb() asm volatile("mf" ::: "memory")

Do you know of any specific archs that are broken?

- R.
From mst at mellanox.co.il Tue Oct 17 11:16:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 20:16:38 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To:
References: <20061017172337.GH23922@mellanox.co.il>
Message-ID: <20061017181638.GK23922@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] use mmiowb after doorbell ring
>
> > > I don't think an mmiowb() equivalent is available from userspace.
> >
> > Isn't this just an asm() command?
>
> Nope, look at the kernel source, specifically arch/ia64/sn/kernel/iomv.c
>
> > BTW, I think we really should implement proper rmb/wmb in arch.h.
> > Last time I looked we only had compiler barriers here, and
> > this means, I think, that a read from e.g. CQE contents could bypass
> > the read of the CQE valid bit.
>
> I'm not absolutely sure everything there is correct but I did my best,
> for example
>
> #elif defined(__ia64__)
>
> #define mb() asm volatile("mf" ::: "memory")
>
> Do you know of any specific archs that are broken?

Look e.g. on mthca/cq.c

        cqe = next_cqe_sw(cq);
        if (!cqe)
                return CQ_EMPTY;

        /*
         * Make sure we read CQ entry contents after we've checked the
         * ownership bit.
         */
        mb();

        qpn = ntohl(cqe->my_qpn);

kernel code does rmb rather than mb there.
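
For illustration only, here is a minimal sketch of the ordering in question, assuming C11 atomics in place of the arch.h macros. The struct layout and all demo_* names are hypothetical and are not the mthca definitions. C11 fences also say nothing about device DMA, so this only models the CPU-side rule: check the ownership flag first, then issue a read barrier, then read the entry contents.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical CQE layout for illustration; not the real mthca structure. */
struct demo_cqe {
	uint32_t my_qpn;        /* entry contents */
	_Atomic uint8_t owner;  /* ownership flag */
};

#define DEMO_OWNER_SW 0
#define DEMO_OWNER_HW 1

/*
 * Producer side (stands in for the HCA in this sketch): write the
 * contents first, then hand ownership to software with release ordering.
 */
static void demo_produce(struct demo_cqe *cqe, uint32_t qpn)
{
	cqe->my_qpn = qpn;
	atomic_store_explicit(&cqe->owner, DEMO_OWNER_SW,
			      memory_order_release);
}

/*
 * Consumer side: returns 1 and stores the QP number in *qpn if the
 * entry is owned by software, otherwise returns 0.
 */
static int demo_poll(struct demo_cqe *cqe, uint32_t *qpn)
{
	if (atomic_load_explicit(&cqe->owner, memory_order_relaxed) !=
	    DEMO_OWNER_SW)
		return 0;

	/*
	 * Read barrier: the contents must not be read before the
	 * ownership check. An acquire fence is enough here; a full
	 * barrier is not required for this read-after-read ordering.
	 */
	atomic_thread_fence(memory_order_acquire);

	*qpn = cqe->my_qpn;
	return 1;
}
```

The acquire fence plays the role of rmb() here: it orders the two reads without paying for a full mb().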
-- MST From sashak at voltaire.com Tue Oct 17 11:28:56 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 17 Oct 2006 20:28:56 +0200 Subject: [openib-general] [PATCH] opensm: misc fixes in lft dump file parser Message-ID: <20061017182856.GA26498@sashak.voltaire.com> There are misc small fixes for lft dump parser: - merge ERROR and SYS logging in single osm_log() call - more strict strtoul() results checking - fix potential bugs with invalid dump files - break too long lines Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_file.c | 69 +++++++++++++++++++++++-------------------- 1 files changed, 37 insertions(+), 32 deletions(-) diff --git a/osm/opensm/osm_ucast_file.c b/osm/opensm/osm_ucast_file.c index da39d1a..446c243 100644 --- a/osm/opensm/osm_ucast_file.c +++ b/osm/opensm/osm_ucast_file.c @@ -132,21 +132,19 @@ static int do_ucast_file_load(void *cont file_name = p_osm->subn.opt.ucast_dump_file; if (!file_name) { - osm_log(&p_osm->log, OSM_LOG_SYS, - "ucast dump file name is not defined; using default routing algorithm\n"); - osm_log(&p_osm->log, OSM_LOG_ERROR, + osm_log(&p_osm->log, OSM_LOG_ERROR|OSM_LOG_SYS, "do_ucast_file_load: ERR 6301: " - "ucast dump file name is not defined; using default routing algorithm\n"); + "ucast dump file name is not defined; " + "using default routing algorithm\n"); return -1; } file = fopen(file_name, "r"); if (!file) { - osm_log(&p_osm->log, OSM_LOG_SYS, - "Cannot open ucast dump file \'%s\'; using default routing algorithm\n", file_name); - osm_log(&p_osm->log, OSM_LOG_ERROR, + osm_log(&p_osm->log, OSM_LOG_ERROR|OSM_LOG_SYS, "do_ucast_file_load: ERR 6302: " - "cannot open ucast dump file \'%s\'; using default routing algorithm\n", file_name); + "cannot open ucast dump file \'%s\'; " + "using default routing algorithm\n", file_name); return -1; } @@ -167,25 +165,25 @@ static int do_ucast_file_load(void *cont continue; if (!strncmp(p, "Multicast mlids", 15)) { - osm_log(&p_osm->log, OSM_LOG_SYS, - "Multicast 
dump file detected; " - "skipping parsing. Using default routing algorithm\n"); - osm_log(&p_osm->log, OSM_LOG_ERROR, + osm_log(&p_osm->log, OSM_LOG_ERROR|OSM_LOG_SYS, "do_ucast_file_load: ERR 6303: " "Multicast dump file detected; " - "skipping parsing. Using default routing algorithm\n"); + "skipping parsing. Using default " + "routing algorithm\n"); } else if (!strncmp(p, "Unicast lids", 12)) { q = strstr(p, " guid 0x"); if (!q) { - osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " + osm_log(&p_osm->log, OSM_LOG_ERROR, + "PARSE ERROR: %s:%u: " "cannot parse switch definition\n", file_name, lineno); return -1; } - p = q + 6; + p = q + 8; sw_guid = strtoull(p, &q, 16); - if (q && !isspace(*q)) { - osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " + if (q == p || !isspace(*q)) { + osm_log(&p_osm->log, OSM_LOG_ERROR, + "PARSE ERROR: %s:%u: " "cannot parse switch guid: \'%s\'\n", file_name, lineno, p); return -1; @@ -204,39 +202,46 @@ static int do_ucast_file_load(void *cont continue; } } else if (p_sw && !strncmp(p, "0x", 2)) { + p += 2; lid = (uint16_t)strtoul(p, &q, 16); - if (q && !isspace(*q)) { - osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " - "cannot parse lid: \'%s\'\n", file_name, lineno, p); + if (q == p || !isspace(*q)) { + osm_log(&p_osm->log, OSM_LOG_ERROR, + "PARSE ERROR: %s:%u: " + "cannot parse lid: \'%s\'\n", + file_name, lineno, p); return -1; } p = q; while (isspace(*p)) p++; port_num = (uint8_t)strtoul(p, &q, 10); - if (q && !isspace(*q)) { - osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " - "cannot parse port: \'%s\'\n", file_name, lineno, p); + if (q == p || !isspace(*q)) { + osm_log(&p_osm->log, OSM_LOG_ERROR, + "PARSE ERROR: %s:%u: " + "cannot parse port: \'%s\'\n", + file_name, lineno, p); return -1; } p = q; /* additionally try to exract guid */ q = strstr(p, " portguid 0x"); if (!q) { - osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u: " + osm_log(&p_osm->log, OSM_LOG_VERBOSE, + "PARSE 
WARNING: %s:%u: " "cannot find port guid " - "(maybe broken dump): \'%s\'\n", file_name, lineno, p); + "(maybe broken dump): \'%s\'\n", + file_name, lineno, p); port_guid = 0; } - else - { - p = q + 10; + else { + p = q + 12; port_guid = strtoull(p, &q, 16); - if (!q && !isspace(*q) && *q != ':') { - osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u: " + if (q == p || (!isspace(*q) && *q != ':')) { + osm_log(&p_osm->log, OSM_LOG_VERBOSE, + "PARSE WARNING: %s:%u: " "cannot parse port guid " - "(maybe broken dump): " - "\'%s\'\n", file_name, lineno, p); + "(maybe broken dump): \'%s\'\n", + file_name, lineno, p); port_guid = 0; } } -- 1.4.2.3 From mshefty at ichips.intel.com Tue Oct 17 11:34:16 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 17 Oct 2006 11:34:16 -0700 Subject: [openib-general] [ucma] executing the ucmatose with local IPoIB IP address of port 2 fails In-Reply-To: <4534AE9B.4030907@dev.mellanox.co.il> References: <4534AE9B.4030907@dev.mellanox.co.il> Message-ID: <45352228.5050208@ichips.intel.com> > scenario 2: fails > SM was executed on port 2 > i executed ucmatose server and ucmatose client with IPoIB IP address > of port 2 > > here is the output of the client: > ucmatose: starting client > ucmatose: connecting > ucmatose: event: 3, error: 0 > receiving data transfers > sending replies > data transfers complete > test complete > return status 0 This is a ROUTE_ERROR (path record query fails). Are the IP addresses on different subnets? Are you having ucmatose bind to the port 2 ip address. > It seems that when using the IPoIB IP address of port 2 in the client > side and there is an SM only on port 2 the test fails but if i add an SM > on port 1 the test passes. > > Did you notice this behavior before? I have not tested this configuration. 
- Sean

From halr at voltaire.com Tue Oct 17 11:39:41 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 17 Oct 2006 14:39:41 -0400
Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c
In-Reply-To: <4534FFA7.3000402@dev.mellanox.co.il>
References: <4534FFA7.3000402@dev.mellanox.co.il>
Message-ID: <1161110368.32093.410624.camel@hal.voltaire.com>

On Tue, 2006-10-17 at 12:07, Yevgeny Kliteynik wrote:
> Hi Hal
>
> Fixing more things in the multicast test flow.
>
> Still have things to do in case when multicast group removal
> fails, and have to add some cleanup (as we've discussed previously).
> --
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Looks good. One question below.

-- Hal

> Index: osmtest/osmt_multicast.c
> ===================================================================
> --- osmtest/osmt_multicast.c (revision 9856)
> +++ osmtest/osmt_multicast.c (working copy)

[snip...]

> @@ -3261,6 +3306,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
>  &mc_req_rec,
>  comp_mask,
>  &res_sa_mad );
> + status = IB_SUCCESS;

This doesn't look right to me.

>  if (status != IB_SUCCESS)
>  {
>  osm_log( &p_osmt->log, OSM_LOG_ERROR,
> @@ -3274,6 +3320,10 @@ osmt_run_mcast_flow( IN osmtest_t * cons
>  fail_to_delete_mcg++;
>  }
>  }
> + else
> + {
> + end_ipoib_cnt++;
> + }
>  p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item );
>  }
>

[snip...]

From mshefty at ichips.intel.com Tue Oct 17 11:46:40 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 17 Oct 2006 11:46:40 -0700
Subject: [openib-general] [ucma] executing the ucmatose with local IPoIB IP address of port 2 fails
In-Reply-To: <45352228.5050208@ichips.intel.com>
References: <4534AE9B.4030907@dev.mellanox.co.il> <45352228.5050208@ichips.intel.com>
Message-ID: <45352510.7020701@ichips.intel.com>

Sean Hefty wrote:
> This is a ROUTE_ERROR (path record query fails). Are the IP addresses on
> different subnets? Are you having ucmatose bind to the port 2 ip address.
Another thing to check is what port ucmatose binds to after calling rdma_resolve_addr().

- Sean

From pw at osc.edu Tue Oct 17 11:53:30 2006
From: pw at osc.edu (Pete Wyckoff)
Date: Tue, 17 Oct 2006 14:53:30 -0400
Subject: [openib-general] client-server small message performance issues
Message-ID: <20061017185330.GA2450@quasar.osc.edu>

I'm trying to understand some performance variation in an Openib application, and wrote a small test program to simulate its behavior. Attached are the code and a plot of some results. Each dot in the plot shows the time for a single iteration in the code explained below.

One client communicates with some number of servers. In the plot, anywhere from 1 to 10 servers. There is one RC QP from the client to each server, and no QPs between servers. Everything is set up in advance using TCP to exchange lid/key information.

Look for the function "multiping" to see the client do the following:

    start timer
    foreach s in server:
        post recv to QP[s]
    foreach s in server:
        post send to QP[s], 200 bytes
    wait for 2 * numserver completions (one for each send and recv)
    stop timer

Each server meanwhile has a preposted number of receives, 20 is plenty. Their loop is:

    wait for receive completion
    post send to client, 200 bytes
    post recv to client QP
    wait for send completion

The results for 1000 iterations (and 20 untimed warmups), invoked as

    mpiexec -comm=none -pernode -np 11 multiping -s 200 -n 1000 ib30 > x
    egrep '^#' x

look like:

    # +/- median , all in usec
    #  1   24.744 +/-  2.117 median  24.080 us
    #  2   30.352 +/-  2.241 median  30.041 us
    #  3   36.202 +/-  2.774 median  35.048 us
    #  4   45.475 +/-  2.347 median  45.061 us
    #  5   51.843 +/-  2.598 median  51.022 us
    #  6   58.552 +/-  2.407 median  57.936 us
    #  7   97.751 +/- 16.427 median  95.129 us
    #  8  114.346 +/- 16.568 median 113.010 us
    #  9  188.962 +/- 52.061 median 192.881 us
    # 10  230.065 +/- 48.299 median 215.054 us

Basic ping pong is 25 us. That's fine as this is not a particularly optimal way to communicate.
Each additional server adds 6 us. That seems like a lot of overhead just to do another pair of posts and polls, but not my major complaint.

Look at the jump from 6 to 7 servers, 41 us. Beyond that, too. And the standard deviation becomes huge. A plot of the individual values shows a large spread, not just a few outliers.

I was hoping to see each additional server add a fixed amount of overhead to the overall time. The same application on ethernet starts slower, but scales much better as the number of servers is increased.

The hardware is all Mellanox MT25204, with 18-port MT47396 switches. I tried 11 hosts all connected to the same switch, and another 11 hosts to a different switch. Also mixing hosts across switches. No perceptible changes to the results. Also played around with QP attr timeout and retry_count to discover that retries do happen, so the retry_count must be at least 2, but that a timeout from 2 to 10 doesn't have an effect.

Software is stock kernel 2.6.17.6 and libibverbs-1.0.3-1.fc4, libmthca-1.0.2-1.fc4.

Any suggestions on how to avoid these big jumps? Explanations as to the cause?

-- Pete
-------------- next part --------------
/*
 * Test completion time for lots of small conversations. Task 0 of this
 * parallel code is the "client", who does a number of iterations of a
 * test involving small transactions with "servers". At each iteration,
 * the client pre-posts a receive on each QP, posts a small send on each,
 * then polls until all sends and receives are completed. Each server
 * keeps a constant number of receives posted, waits for a message to
 * arrive and immediately responds.
 *
 * Copyright (C) Pete Wyckoff, 2006.
* * Built like this: * gcc -O3 -c -o multiping.o multiping.c * gcc -o multiping multiping.o -libverbs -lm * * Somehow get it started on many nodes, pointing them all to one * which is designated as the master, e.g.: * * for i in piv002 piv004 piv006 ; do rsh -n $i multiping piv002 & done * * Mpiexec users inside a PBS job: * * mpiexec --comm=none -pernode -nostdin multiping $(hostname) * * Bproc users could do: * * bpsh 3-31 ./multiping n3 * * Run the code with no args to see the usage() message. Numbers * can be given with suffix "k", "m", or "g" to scale by 10^3, 6, or 9, * e.g.: multiping -n 1k -s 1m $(hostname) * * Two environment variables adjust QP settings, e.g.: * ARDMA_RETRY_COUNT=2 * ARDMA_TIMEOUT=4 * */ #include #include #include #include #include #include #include #include #include #include #include #include /* * Debugging support. */ #if 0 #define DEBUG_LEVEL 2 #define debug(lvl,fmt,args...) \ do { \ if (lvl <= DEBUG_LEVEL) \ info(fmt,##args); \ } while (0) #define assert(cond,fmt,args...) \ do { \ if (__builtin_expect(!(cond),0)) \ error(fmt,##args); \ } while (0) #else /* no debug version */ # define debug(lvl,cond,fmt,...) # define assert(cond,fmt,...) #endif /* * Handy macros. */ #define ptr_from_int64(p) (void *)(unsigned long)(p) #define int64_from_ptr(p) (u_int64_t)(unsigned long)(p) /* * Some shared variables. */ const char *progname; int myid, numproc; char *myhostname; unsigned long pagesize; unsigned long bufsize = 4096 * 16; int numiter = 10; static void __attribute__((noreturn)) usage(void) { fprintf(stderr, "Usage: %s [-n ] [-s ] \n", progname); exit(1); } #ifdef DEBUG_LEVEL static void __attribute__((format(printf,1,2))) info(const char *fmt, ...) { char s[2048]; va_list ap; va_start(ap, fmt); vsprintf(s, fmt, ap); va_end(ap); fprintf(stderr, "[%d/%d %s]: %s.\n", myid, numproc, myhostname, s); } #endif /* * Warning, non-fatal. */ static void warning(const char *fmt, ...) 
{ va_list ap; fprintf(stderr, "[%d/%d %s]: %s: Warning: ", myid, numproc, myhostname, progname); va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); fprintf(stderr, ".\n"); } /* * Error, fatal. */ static void error(const char *fmt, ...) { va_list ap; fprintf(stderr, "[%d/%d %s]: %s: Error: ", myid, numproc, myhostname, progname); va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); fprintf(stderr, ".\n"); exit(1); } /* * Error, fatal, with the errno message. */ static void error_errno(const char *fmt, ...) { va_list ap; fprintf(stderr, "[%d/%d %s]: %s: Error: ", myid, numproc, myhostname, progname); va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); fprintf(stderr, ": %s.\n", strerror(errno)); exit(1); } /* * Error-checking malloc. */ static void * Malloc(unsigned long n) { void *x; if (n == 0) error("%s: called on zero bytes", __func__); x = malloc(n); if (!x) error("%s: couldn't get %lu bytes", __func__, n); return x; } /* * For reading from a pipe, can't always get the full buf in one chunk. 
*/ static ssize_t saferead(int fd, void *buf, size_t num) { int i, offset = 0; int total = num; while (num > 0) { i = read(fd, (char *)buf + offset, num); if (i < 0) error_errno("%s: %d bytes", __func__, num); if (i == 0) { if (offset == 0) return 0; /* end of file on a block boundary */ error("EOF in saferead, only %d of %d bytes", offset, total); } num -= i; offset += i; } return total; } static ssize_t safewrite(int fd, const void *buf, size_t num) { int i, offset = 0; int total = num; while (num > 0) { i = write(fd, (const char *)buf + offset, num); if (i < 0) error_errno("%s: %d bytes", __func__, num); if (i == 0) error("EOF in safewrite, only %d of %d bytes", offset, total); num -= i; offset += i; } return total; } static unsigned long parse_number(const char *cp) { unsigned long v; char *cq; v = strtoul(cp, &cq, 0); if (*cq) { if (!strcasecmp(cq, "k")) v *= 1000; else if (!strcasecmp(cq, "m")) v *= 1000000; else if (!strcasecmp(cq, "g")) v *= 1000000000; else usage(); } return v; } /* * Find the next power of two above or equal to this number. 32-bit. * Fails on n<=0. */ static int higher_power_of_2(int n) { int x = n; x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x = (x >> 1) + 1; if (x != n) x <<= 1; return x; } /* * Set the program name, first statement of code usually. 
*/ static void set_progname(int argc __attribute__ ((unused)), char *const argv[]) { const char *cp; char s[1024]; for (cp=progname=argv[0]; *cp; cp++) if (*cp == '/') progname = cp+1; if (gethostname(s, sizeof(s)) < 0) error_errno("%s: gethostname", __func__); s[sizeof(s)-1] = '\0'; myhostname = strdup(s); } /* constants used to initialize infiniband device */ static const int IB_PORT = 1; static const unsigned int IB_NUM_CQ_ENTRIES = 30; static const enum ibv_mtu IB_MTU = IBV_MTU_1024; /* default mtu */ /* IB device vars */ static struct ibv_context *nic_handle; /* NIC reference */ static struct ibv_pd *nic_pd; /* single protection domain for all memory/QP */ static struct ibv_cq *nic_cq; /* single completion queue for all QPs */ static uint16_t nic_lid; /* TCP connection management */ static const unsigned short int port = 5207; static int fd; /* non-master only */ static int *fds; /* master only */ /* barrier code */ static int numproc_higher_power_of_2; typedef struct { int id; /* "myid" assigned to the other end of this qp */ uint16_t lid; uint32_t qp_num; void * recv_buf; uint32_t rkey; } remote_info_t; typedef struct { struct ibv_qp * qp; uint32_t qp_num; void * send_buf; void * recv_buf; uint32_t send_lkey; uint32_t recv_lkey; uint32_t recv_rkey; remote_info_t remote_info; } qp_t; static qp_t *qps; static struct { char hostname[16]; uint16_t lid; } *host; static const char *masterhost = 0; static void barrier(void); static void ib_init(void); static void ib_build_qp(qp_t *qp); static void ib_bringup_qp(qp_t *qp); static void full_connect_socket(void); static void multiping(int np); static void rdma_send(qp_t *to, void *buf, unsigned int len, int offset); static void post_send(qp_t *to); static void post_recv(qp_t *from); static void reap_completion(void); int main(int argc, char **argv) { int i; set_progname(argc, argv); pagesize = getpagesize(); myid = -1; numproc = 0; if (strlen(myhostname) + 1 > sizeof(*host)) error("%s: my hostname too big for static 
structure", __func__); while (++argv, --argc > 0) { const char *cp; if (**argv == '-') switch ((*argv)[1]) { case 'n': cp = *argv + 2; for (i = 1; *cp && *cp == "numiter"[i]; ++cp, ++i) ; if (*cp) usage(); if (++argv, --argc == 0) usage(); numiter = parse_number(*argv); break; case 's': cp = *argv + 2; for (i = 1; *cp && *cp == "size"[i]; ++cp, ++i) ; if (*cp) usage(); if (++argv, --argc == 0) usage(); bufsize = parse_number(*argv); break; } else { if (masterhost) usage(); masterhost = *argv; } } if (!masterhost) usage(); if (bufsize < 4 * sizeof(int)) { if (myid == 0) error("%s: bufsize must be at least %d for barrier to work", __func__, 4 * sizeof(int)); else return 1; } if (!strcmp(masterhost, myhostname)) myid = 0; full_connect_socket(); /* for barrier */ numproc_higher_power_of_2 = higher_power_of_2(numproc); barrier(); for (i=2; i<=numproc; i++) multiping(i); return 0; } /* * Use TCP to glue everybody together, with a * QP between each pair of hosts. */ static void full_connect_socket(void) { struct hostent *hp; struct sockaddr_in skin; socklen_t sin_len = sizeof(skin); struct timeval tv, now, delta; int i; hp = gethostbyname(masterhost); if (!hp) error("host \"%s\" not resolvable", masterhost); memset(&skin, 0, sin_len); skin.sin_family = hp->h_addrtype; memcpy(&skin.sin_addr, hp->h_addr_list[0], hp->h_length); skin.sin_port = htons(port); fd = socket(PF_INET, SOCK_STREAM, 0); if (fd < 0) error_errno("%s: socket", __func__); ib_init(); if (myid == 0) { int maxproc; int flags; flags = 1; if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &flags, sizeof(flags)) < 0) error_errno("setsockopt reuseaddr"); if (bind(fd, (struct sockaddr *)&skin, sin_len) < 0) error_errno("bind"); if (listen(fd, 1024) < 0) error_errno("listen"); flags = fcntl(fd, F_GETFL); if (flags < 0) error_errno("%s: get listen socket flags", __func__); if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) error_errno("%s: set listen socket nonblocking", __func__); maxproc = 10; fds = Malloc(maxproc * 
sizeof(*fds)); fds[0] = -1; /* talk to myself, no */ numproc = 1; gettimeofday(&tv, 0); for (;;) { int t = accept(fd, 0, 0); if (t < 0) { if (errno == EAGAIN || errno == EINTR) { usleep(100); gettimeofday(&now, 0); timersub(&now, &tv, &delta); if (delta.tv_sec >= 3) /* wait 3sec for all to connect */ break; continue; } else error_errno("accept"); } if (numproc == maxproc) { void *x = fds; maxproc += 10; fds = Malloc(maxproc * sizeof(*fds)); if (x) { memcpy(fds, x, numproc * sizeof(*fds)); free(x); } } fds[numproc] = t; safewrite(t, &numproc, sizeof(numproc)); ++numproc; } close(fd); if (numproc == 1) error("no one connected to master"); debug(2, "%d procs", numproc); for (i=1; i= 30) error("failed to connect to master"); } else error_errno("connect"); } saferead(fd, &myid, sizeof(myid)); saferead(fd, &numproc, sizeof(numproc)); } host = Malloc(numproc * sizeof(*host)); qps = Malloc(numproc * sizeof(*qps)); /* allbroadcast hostname/lid */ strcpy(host[myid].hostname, myhostname); host[myid].lid = nic_lid; if (myid == 0) { for (i=1; i= 8 sleep(1); #else usleep(250); #endif } stride <<= 1; } ++generation; debug(2, "%s: done", __func__); } static inline double Wtime(void) { struct timeval tv; gettimeofday(&tv, NULL); return (double) tv.tv_sec + (double) tv.tv_usec / 1000000; } static inline int double_compare(const void *x1, const void *x2) { const double *d1 = x1; const double *d2 = x2; double diff = *d1 - *d2; return (diff < 0. ? -1 : (diff > 0. ? 1 : 0.)); } /* * Core test, np is number of servers. Task 0 is always client. 
*/ static void multiping(int np) { int i, j; int numpost_left = 0; const int recv_depth = 20, warmup = 10; double *v = NULL, *w; double avg = 0., stddev = 0., median = 0.; if (myid == 0) v = Malloc(numiter * sizeof(*v)); /* prepost */ if (myid > 0 && myid < np) { numpost_left = numiter + warmup; for (i=0; i= warmup) v[j-warmup] = end - start; } if (myid > 0 && myid < np) { /* one of many "servers" */ reap_completion(); /* wait for client to send to me */ post_send(&qps[0]); if (numpost_left > 0) { post_recv(&qps[0]); /* refill prepost while waiting... */ --numpost_left; } reap_completion(); /* wait for completion of my send */ } } barrier(); if (myid == 0) { if (numiter > 0) { for (i=0; i 1) { for (i=0; i rkey %x", __func__, to->remote_info.id, buf, len, offset, to->remote_info.rkey); if (buf) memcpy(to->send_buf, buf, len); sg.addr = int64_from_ptr(to->send_buf); sg.length = buf ? len : bufsize; sg.lkey = to->send_lkey; memset(&sr, 0, sizeof(sr)); sr.wr_id = int64_from_ptr(to); sr.opcode = IBV_WR_RDMA_WRITE; sr.send_flags = IBV_SEND_SIGNALED; sr.wr.rdma.remote_addr = int64_from_ptr(to->remote_info.recv_buf) + offset; sr.wr.rdma.rkey = to->remote_info.rkey; sr.sg_list = &sg; sr.num_sge = 1; sr.next = NULL; debug(4, "%s: lkey %x rkey %x rbuf %llx", __func__, sg.lkey, sr.wr.rdma.rkey, (unsigned long long) sr.wr.rdma.remote_addr); ret = ibv_post_send(to->qp, &sr, &bad_wr); if (ret < 0) error("%s: ibv_post_send", __func__); } static void post_send(qp_t *to) { struct ibv_sge sg; struct ibv_send_wr sr, *bad_wr; int ret; debug(2, "%s: to %d", __func__, to->remote_info.id); sg.addr = int64_from_ptr(to->send_buf); sg.length = bufsize; sg.lkey = to->send_lkey; memset(&sr, 0, sizeof(sr)); sr.wr_id = int64_from_ptr(to); sr.opcode = IBV_WR_SEND; sr.send_flags = IBV_SEND_SIGNALED; sr.sg_list = &sg; sr.num_sge = 1; sr.next = NULL; ret = ibv_post_send(to->qp, &sr, &bad_wr); if (ret < 0) error("%s: ibv_post_send", __func__); } static void post_recv(qp_t *from) { struct ibv_sge sg; 
struct ibv_recv_wr rr, *bad_wr; int ret; debug(2, "%s: from %d", __func__, from->remote_info.id); sg.addr = int64_from_ptr(from->recv_buf); sg.length = bufsize; sg.lkey = from->recv_lkey; memset(&rr, 0, sizeof(rr)); rr.wr_id = int64_from_ptr(from); rr.sg_list = &sg; rr.num_sge = 1; rr.next = NULL; ret = ibv_post_recv(from->qp, &rr, &bad_wr); if (ret < 0) error("%s: ibv_post_recv", __func__); } /* * Return string form of work completion status field. */ #define CASE(e) case e: s = #e; break static const char *openib_wc_status_string(int status) { const char *s = "(UNKNOWN)"; switch (status) { CASE(IBV_WC_SUCCESS); CASE(IBV_WC_LOC_LEN_ERR); CASE(IBV_WC_LOC_QP_OP_ERR); CASE(IBV_WC_LOC_EEC_OP_ERR); CASE(IBV_WC_LOC_PROT_ERR); CASE(IBV_WC_WR_FLUSH_ERR); CASE(IBV_WC_MW_BIND_ERR); CASE(IBV_WC_BAD_RESP_ERR); CASE(IBV_WC_LOC_ACCESS_ERR); CASE(IBV_WC_REM_INV_REQ_ERR); CASE(IBV_WC_REM_ACCESS_ERR); CASE(IBV_WC_REM_OP_ERR); CASE(IBV_WC_RETRY_EXC_ERR); CASE(IBV_WC_RNR_RETRY_EXC_ERR); CASE(IBV_WC_LOC_RDD_VIOL_ERR); CASE(IBV_WC_REM_INV_RD_REQ_ERR); CASE(IBV_WC_REM_ABORT_ERR); CASE(IBV_WC_INV_EECN_ERR); CASE(IBV_WC_INV_EEC_STATE_ERR); CASE(IBV_WC_FATAL_ERR); CASE(IBV_WC_GENERAL_ERR); } return s; } /* * Spin until one completion arrives on the CQ. */ static void reap_completion(void) { struct ibv_wc desc; qp_t *sender; for (;;) { int vret = ibv_poll_cq(nic_cq, 1, &desc); if (vret < 0) error_errno("%s: ibv_poll_cq (%d)", __func__, vret); if (vret > 0) break; } sender = (qp_t *) ptr_from_int64(desc.wr_id); if (desc.status != IBV_WC_SUCCESS) error("%s: entry id 0x%Lx (%s) opcode %d status %d = %s", __func__, desc.wr_id, numproc ? 
host[sender->remote_info.id].hostname : "host unknown", desc.opcode, desc.status, openib_wc_status_string(desc.status)); if (desc.opcode == IBV_WC_RDMA_WRITE) { debug(4, "%s: rdma write to %d complete", __func__, sender->remote_info.id); } else if (desc.opcode == IBV_WC_SEND) { debug(4, "%s: send to %d complete", __func__, sender->remote_info.id); } else if (desc.opcode == IBV_WC_RECV) { debug(4, "%s: recv data from %d complete", __func__, sender->remote_info.id); } else { error("%s: cq entry id %d opcode %d unexpected", __func__, sender->remote_info.id, desc.opcode); } } static void ib_init(void) { int ret; struct ibv_port_attr nic_port_props; struct ibv_device_attr device_attr; int cqe_num; struct ibv_device **devs; int numdevs; devs = ibv_get_device_list(&numdevs); if (numdevs != 1) error("%s: expecting 1 device, not %d", __func__, numdevs); nic_handle = ibv_open_device(devs[0]); if (!nic_handle) error("%s: ibv_open_device", __func__); ibv_free_device_list(devs); /* connect an asynchronous event handler to look for weirdness */ /* XXX: how? 
*/ /* get my lid */ ret = ibv_query_port(nic_handle, IB_PORT, &nic_port_props); if (ret < 0) error_errno("%s: ibv_query_port", __func__); nic_lid = nic_port_props.lid; /* build a protection domain */ nic_pd = ibv_alloc_pd(nic_handle); if (!nic_pd) error("%s: ibv_alloc_pd", __func__); debug(2, "%s: built pd %p", __func__, nic_pd); /* see how many cq entries we are allowed to have */ ret = ibv_query_device(nic_handle, &device_attr); if (ret < 0) error_errno("%s: ibv_query_device", __func__); debug(4, "%s: max %d completion queue entries", __func__, device_attr.max_cqe); cqe_num = IB_NUM_CQ_ENTRIES; if (device_attr.max_cqe < cqe_num) { cqe_num = device_attr.max_cqe; warning("%s: hardly enough completion queue entries %d, hoping for %d", __func__, device_attr.max_cqe, cqe_num); } /* build a CQ (ignore actual number returned) */ debug(4, "%s: asking for %d completion queue entries", __func__, cqe_num); nic_cq = ibv_create_cq(nic_handle, cqe_num, NULL, NULL, 0); if (!nic_cq) error("%s: ibv_create_cq", __func__); } #define page_round(x) \ (void *)(((unsigned long)x + pagesize - 1) & ~(pagesize - 1)); static void ib_build_qp(qp_t *qp) { void *x; int ret; struct ibv_mr *mr; struct ibv_qp_init_attr qp_init_attr; struct ibv_qp_attr attr; enum ibv_qp_attr_mask mask; /* register memory region, recv and recv buf for barriers */ x = Malloc(bufsize + pagesize + pagesize); x = page_round(x); qp->recv_buf = x; mr = ibv_reg_mr(nic_pd, x, bufsize + pagesize, IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE); if (!mr) error("%s: ibv_reg_mr recv", __func__); qp->recv_lkey = mr->lkey; qp->recv_rkey = mr->rkey; /* init for barriers */ memset((char *) qp->recv_buf + bufsize, 0xff, pagesize); /* register memory region, send */ x = Malloc(bufsize + pagesize); x = page_round(x); memset(x, myid & 255, bufsize); qp->send_buf = x; mr = ibv_reg_mr(nic_pd, x, bufsize, 0); if (!mr) error("%s: ibv_reg_mr send", __func__); qp->send_lkey = mr->lkey; /* build qp */ memset(&qp_init_attr, 0, 
sizeof(qp_init_attr)); /* wire both send and recv to the same CQ */ qp_init_attr.send_cq = nic_cq; qp_init_attr.recv_cq = nic_cq; qp_init_attr.cap.max_send_wr = 5; /* outstanding WQEs */ qp_init_attr.cap.max_recv_wr = 20; qp_init_attr.cap.max_send_sge = 1; /* scatter/gather entries */ qp_init_attr.cap.max_recv_sge = 1; qp_init_attr.qp_type = IBV_QPT_RC; /* only generate completion queue entries if requested */ qp_init_attr.sq_sig_all = 0; qp->qp = ibv_create_qp(nic_pd, &qp_init_attr); if (!qp->qp) error("%s: ibv_create_qp", __func__); qp->qp_num = qp->qp->qp_num; /* see HCA/vip/qpm/qp_xition.h for important settings */ /* transition qp to init */ mask = IBV_QP_STATE | IBV_QP_ACCESS_FLAGS | IBV_QP_PKEY_INDEX | IBV_QP_PORT; memset(&attr, 0, sizeof(attr)); attr.qp_state = IBV_QPS_INIT; attr.qp_access_flags = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ; attr.pkey_index = 0; attr.port_num = IB_PORT; ret = ibv_modify_qp(qp->qp, &attr, mask); if (ret < 0) error("%s: ibv_modify_qp RST -> INIT", __func__); } static unsigned long hasenv(const char *env, unsigned long v) { const char *cp = getenv(env); if (cp) v = parse_number(cp); return v; } static void ib_bringup_qp(qp_t *qp) { int ret; struct ibv_qp_attr attr; enum ibv_qp_attr_mask mask; /* transition qp to ready-to-receive */ mask = IBV_QP_STATE | IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_AV | IBV_QP_PATH_MTU | IBV_QP_RQ_PSN | IBV_QP_DEST_QPN | IBV_QP_MIN_RNR_TIMER; memset(&attr, 0, sizeof(attr)); attr.qp_state = IBV_QPS_RTR; attr.max_dest_rd_atomic = 1; attr.ah_attr.dlid = qp->remote_info.lid; attr.ah_attr.port_num = IB_PORT; attr.path_mtu = IB_MTU; attr.rq_psn = 0; attr.dest_qp_num = qp->remote_info.qp_num; attr.min_rnr_timer = 31; /* rnr never happens */ ret = ibv_modify_qp(qp->qp, &attr, mask); if (ret < 0) error("%s: ibv_modify_qp INIT -> RTR", __func__); /* transition qp to ready-to-send */ mask = IBV_QP_STATE | IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC | IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT | 
IBV_QP_RNR_RETRY; memset(&attr, 0, sizeof(attr)); attr.qp_state = IBV_QPS_RTS; attr.sq_psn = 0; attr.max_rd_atomic = 1; attr.timeout = hasenv("ARDMA_TIMEOUT", 10); /* 4.096us * 2^10 = 4 ms */ attr.retry_cnt = hasenv("ARDMA_RETRY_COUNT", 7); attr.rnr_retry = 20; /* RNR never happens */ ret = ibv_modify_qp(qp->qp, &attr, mask); if (ret < 0) error("%s: ibv_modify_qp RTR -> RTS", __func__); } -------------- next part -------------- A non-text attachment was scrubbed... Name: res.png Type: image/png Size: 10916 bytes Desc: not available URL: From tziporet at mellanox.co.il Tue Oct 17 12:09:17 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 17 Oct 2006 21:09:17 +0200 Subject: [openib-general] OFED-1.1-pre1 is ready Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACECB@mtlexch01.mtl.com> Hi All, OFED 1.1-pre1 is available: URL: https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-pre1.tgz According to the 1.1 release schedule I published yesterday and got all partners approval (Qlogic have not answered so I assumed its OK with them too). Each company has 3 days for basic "dead or alive tests" and making sure that no blocker issues are still open. If everything goes well we will do the release at the end of this Thursday. Components owners: Please remember to update the release notes till Wednesday. Documents should be the only component that will be changed from this pre-release to the official release. 
Tziporet & Vlad ======================================================================== ======== Release details: BUILD_ID: OFED-1.1-pre1 openib-1.1 (REV=9854) # User space https://openib.org/svn/gen2/branches/1.1/src/userspace Git: ref: refs/heads/ofed_1_1 commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 # MPI mpi_osu-0.9.7-mlx2.2.0.tgz openmpi-1.1.1-1.src.rpm mpitests-2.0-0.src.rpm Fixed bugs: BUG 273: OFED 1.1 rc7 does not work with Cisco FC Gateway BUG 274: OFED 1.1: RENICE_IB_MAD=yes hangs dual-HCA system with dual-port HCAs BUG 277: OFED 1.1 rc7: uninitialized value during IPoIB failover in ipoib_ha.pl BUG 278: OFED 1.1: two copies of openib.spec in openib-1.1.tgz Other changes from OFED-1.1-rc7: - Fix in ibdiagnet to support SM on a switch - Activate scaling code of ehca as default in the install - Documentation update - Dapl: removed SCM from the configuration file dat.conf. From akepner at sgi.com Tue Oct 17 11:52:12 2006 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 17 Oct 2006 11:52:12 -0700 (PDT) Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: On Tue, 17 Oct 2006, Roland Dreier wrote: > OK, here's what I actually put in my tree. Can you eyeball this and > maybe give it a quick test? If it looks good to you, I'll send it on > to the stable team for 2.6.18.x. > Yep, looks fine, and it works on my Altix. Thanks, Roland. -- Arthur From rdreier at cisco.com Tue Oct 17 12:25:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 12:25:24 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061017181638.GK23922@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 17 Oct 2006 20:16:38 +0200") References: <20061017172337.GH23922@mellanox.co.il> <20061017181638.GK23922@mellanox.co.il> Message-ID: Michael> kernel code does rmb rather than mb there. OK, but that's an optimization rather than a correctness issue: mb is stronger than rmb. 
The reason I did it that way was because I wasn't sure it was worth
defining mb, rmb and wmb for userspace.

 - R.

From mst at mellanox.co.il Tue Oct 17 12:33:07 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 21:33:07 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: 
Message-ID: <20061017193307.GM23922@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] use mmiowb after doorbell ring
>
>     Michael> kernel code does rmb rather than mb there.
>
> OK, but that's an optimization rather than a correctness issue: mb is
> stronger than rmb.

Very strange. Let's consider amd64. libibverbs has

#elif defined(__x86_64__)

#define mb() asm volatile("" ::: "memory")

So it's just a compiler barrier there.

While Linux has, in asm-x86_64/system.h:

#define rmb() asm volatile("lfence":::"memory")

So rmb seems to be stronger than mb: it will prevent the CPU from
reordering reads while mb won't.

Hmm?

-- 
MST

From Chris.Dennett at texmemsys.com Tue Oct 17 12:46:18 2006
From: Chris.Dennett at texmemsys.com (Chris Dennett)
Date: Tue, 17 Oct 2006 14:46:18 -0500
Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10
Message-ID: <200610171446.18009.Chris.Dennett@texmemsys.com>

I've been trying to install OFED 1.1 RC7 on an x86 server with a fresh
install of SLES10 (32-bit). It errors out when trying to build the
kernel modules. I've included what I think are the relevant log
messages below. I've tried installing everything (minus iser and
tvflash) or just the modules needed for SRP. I've installed 1.1 RC7
successfully on other RedHat servers without any problems. I am
installing as root.

Any help would be appreciated. Thanks.
-Chris ============================================== + make kernel Building kernel modules Kernel version: 2.6.16.21-0.8-smp Modules directory: //lib/modules/2.6.16.21-0.8-smp Kernel sources: /lib/modules/2.6.16.21-0.8-smp/build env EXTRA_CFLAGS=" -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib \ -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug" \ make -C /lib/modules/2.6.16.21-0.8-smp/build SUBDIRS="/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband" KERNELRELEASE=2.6.16.21-0.8-smp \ EXTRAVERSION=.21-0.8-smp V=1 \ CONFIG_INFINIBAND=m \ CONFIG_INFINIBAND_IPOIB=m \ CONFIG_INFINIBAND_SDP= \ CONFIG_INFINIBAND_SRP=m \ CONFIG_INFINIBAND_USER_MAD=m \ CONFIG_INFINIBAND_USER_ACCESS=m \ CONFIG_INFINIBAND_ADDR_TRANS=y \ CONFIG_INFINIBAND_MTHCA=m \ CONFIG_INFINIBAND_IPOIB_DEBUG=y \ CONFIG_INFINIBAND_ISER= \ CONFIG_INFINIBAND_EHCA= \ CONFIG_INFINIBAND_RDS= \ CONFIG_INFINIBAND_RDS_DEBUG= \ CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ CONFIG_INFINIBAND_SDP_DEBUG= \ CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ CONFIG_INFINIBAND_IPATH= \ CONFIG_INFINIBAND_MTHCA_DEBUG=y \ CONFIG_INFINIBAND_MADEYE= \ LINUXINCLUDE='-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include \ -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ -Iinclude \ $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ -include include/linux/autoconf.h \ -include /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h \ ' \ modules make[1]: Entering directory `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' make[1]: *** No rule to make target `modules'. Stop. 
make[1]: Leaving directory `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.92052 (%install) RPM build errors: user vlad does not exist - using root group mtl does not exist - using root user vlad does not exist - using root group mtl does not exist - using root Bad exit status from /var/tmp/rpm-tmp.92052 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libmthca --with-opensm --with-librdmacm --with-openib-diags --with-srptools --with-mstflint --with-perftest --with-ipoib-mod --with-mthca-mod --with-srp-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define 'KVERSION 2.6.16.21-0.8-smp' --define 'KSRC /lib/modules/2.6.16.21-0.8-smp/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 0' --define 'NETWORK_CONF_DIR /etc/sysconfig/network' --define 'modprobe_update 1' --define 'include_ipoib_conf 0' --define 'build_32bit 0' /root/OFED-1.1-rc7/SRPMS/openib-1.1-0.src.rpm" =================================================== smx32:~ # uname -a Linux linux-yeez 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux smx32:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp Module.symvers -- Chris Dennett Design Engineer Texas Memory Systems, Inc. 713-266-3200 x430 Chris.Dennett at texmemsys.com From parks at lanl.gov Tue Oct 17 13:12:48 2006 From: parks at lanl.gov (Parks Fields) Date: Tue, 17 Oct 2006 14:12:48 -0600 Subject: [openib-general] ethtool support for ipoib In-Reply-To: References: Message-ID: <7.0.1.0.2.20061017141218.025718f0@lanl.gov> > >No, it's never a good idea to turn off TCP or IP checksums. That >leads to possibilities of silent data corruption too easily. 
I totally agree... ***** Correspondence ***** This email contains no programmatic content that requires independent ADC review From dledford at redhat.com Tue Oct 17 13:16:06 2006 From: dledford at redhat.com (Doug Ledford) Date: Tue, 17 Oct 2006 16:16:06 -0400 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061017150957.GB22531@mellanox.co.il> References: <1161097275.2917.407.camel@fc6.xsintricity.com> <20061017150957.GB22531@mellanox.co.il> Message-ID: <1161116166.2917.434.camel@fc6.xsintricity.com> On Tue, 2006-10-17 at 17:09 +0200, Michael S. Tsirkin wrote: > > Yeah, this is the rolling updates thing I was telling you about. The > > Beta1 kernel was 2.6.17+several git repos and patches. We've since > > updated to 2.6.18, but that was released as an update to the Beta1 isos > > and trees via RHN. So, I don't think you'll see the kernel unless you > > either 1) use up2date to refresh the beta system > > Will that get me the sources too? Evidently, I was mistaken and rhn is still populated with the beta1 rpms. So, I've made the latest kernel available on my web page as referenced below (amongst other rpms as well). However, it may still be a while before the rpms are fully populated as I've had to request an increase to my quota limit on that web server in order to hold the kernel rpms tree. > > or 2) download later > > iso images and look at the kernel present. The current kernel version > > is 2.6.18-1.2717.el5. > > So, I'd like to help, but how can one get the updated kernel source? > Are the iso's with updated sources available somewhere? > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From kliteyn at dev.mellanox.co.il Tue Oct 17 13:21:49 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 17 Oct 2006 22:21:49 +0200 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c In-Reply-To: <1161110368.32093.410624.camel@hal.voltaire.com> References: <4534FFA7.3000402@dev.mellanox.co.il> <1161110368.32093.410624.camel@hal.voltaire.com> Message-ID: <45353B5D.3020806@dev.mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2006-10-17 at 12:07, Yevgeny Kliteynik wrote: >> Hi Hal >> >> Fixing more things in the multicast test flow. >> >> Still have things to do in case when multicast group removal >> fails, and have to add some cleanup (as we've discussed previously). >> -- >> Yevgeny >> >> Signed-off-by: Yevgeny Kliteynik > > Looks good. One question below. > > -- Hal > >> Index: osmtest/osmt_multicast.c >> =================================================================== >> --- osmtest/osmt_multicast.c (revision 9856) >> +++ osmtest/osmt_multicast.c (working copy) > > [snip...] > >> @@ -3261,6 +3306,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons >> &mc_req_rec, >> comp_mask, >> &res_sa_mad ); >> + status = IB_SUCCESS; > > This doesn't look right to me. Right, this must be some cut-and-paste bug. This line shouldn't be there. Good catch. Thanks. -- Yevgeny. >> if (status != IB_SUCCESS) >> { >> osm_log( &p_osmt->log, OSM_LOG_ERROR, >> @@ -3274,6 +3320,10 @@ osmt_run_mcast_flow( IN osmtest_t * cons >> fail_to_delete_mcg++; >> } >> } >> + else >> + { >> + end_ipoib_cnt++; >> + } >> p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); >> } >> > > [snip...] > > From mst at mellanox.co.il Tue Oct 17 13:23:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 22:23:25 +0200 Subject: [openib-general] RHEL5 and OFED ... 
In-Reply-To: <1161116166.2917.434.camel@fc6.xsintricity.com> References: <1161116166.2917.434.camel@fc6.xsintricity.com> Message-ID: <20061017202325.GQ23922@mellanox.co.il> Quoting Doug Ledford : > Evidently, I was mistaken and rhn is still populated with the beta1 > rpms. So, I've made the latest kernel available on my web page as > referenced below (amongst other rpms as well). However, it may still be > a while before the rpms are fully populated as I've had to request an > increase to my quota limit on that web server in order to hold the > kernel rpms tree. When available, I gather they will be here: http://people.redhat.com/dledford/Infiniband/kernel/2.6.18/1.2729.el5/src/ is that right? -- MST From mst at mellanox.co.il Tue Oct 17 13:28:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 17 Oct 2006 22:28:18 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1161116166.2917.434.camel@fc6.xsintricity.com> References: <1161116166.2917.434.camel@fc6.xsintricity.com> Message-ID: <20061017202818.GR23922@mellanox.co.il> On a tangent, is there a way to set up a cross-build environment that will build kernel modules for e.g. RHEL amd64 kernel on a 32 bit machine? I'm doing this now with gcc and kernel.org kernel I built myself from source. I guess I mostly need to get gcc and binutils SRPMs to generate cross-compiling tools - has anyone done that? 
-- MST From halr at voltaire.com Tue Oct 17 13:31:20 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 16:31:20 -0400 Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c In-Reply-To: <45353B5D.3020806@dev.mellanox.co.il> References: <4534FFA7.3000402@dev.mellanox.co.il> <1161110368.32093.410624.camel@hal.voltaire.com> <45353B5D.3020806@dev.mellanox.co.il> Message-ID: <1161117027.32093.414606.camel@hal.voltaire.com> On Tue, 2006-10-17 at 16:21, Yevgeny Kliteynik wrote: > Hal Rosenstock wrote: > > On Tue, 2006-10-17 at 12:07, Yevgeny Kliteynik wrote: > >> Hi Hal > >> > >> Fixing more things in the multicast test flow. > >> > >> Still have things to do in case when multicast group removal > >> fails, and have to add some cleanup (as we've discussed previously). > >> -- > >> Yevgeny > >> > >> Signed-off-by: Yevgeny Kliteynik > > > > Looks good. One question below. > > > > -- Hal > > > >> Index: osmtest/osmt_multicast.c > >> =================================================================== > >> --- osmtest/osmt_multicast.c (revision 9856) > >> +++ osmtest/osmt_multicast.c (working copy) > > > > [snip...] > > > >> @@ -3261,6 +3306,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons > >> &mc_req_rec, > >> comp_mask, > >> &res_sa_mad ); > >> + status = IB_SUCCESS; > > > > This doesn't look right to me. > > Right, this must be some cut-and-paste bug. > This line shouldn't be there. Good catch. > Thanks. Thanks. Applied with some cosmetic changes. -- Hal > -- > Yevgeny. > > >> if (status != IB_SUCCESS) > >> { > >> osm_log( &p_osmt->log, OSM_LOG_ERROR, > >> @@ -3274,6 +3320,10 @@ osmt_run_mcast_flow( IN osmtest_t * cons > >> fail_to_delete_mcg++; > >> } > >> } > >> + else > >> + { > >> + end_ipoib_cnt++; > >> + } > >> p_mgrp = (osmtest_mgrp_t*)cl_qmap_next( &p_mgrp->map_item ); > >> } > >> > > > > [snip...] 
> > > > From dledford at redhat.com Tue Oct 17 13:39:31 2006 From: dledford at redhat.com (Doug Ledford) Date: Tue, 17 Oct 2006 16:39:31 -0400 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061017202325.GQ23922@mellanox.co.il> References: <1161116166.2917.434.camel@fc6.xsintricity.com> <20061017202325.GQ23922@mellanox.co.il> Message-ID: <1161117571.2917.440.camel@fc6.xsintricity.com> On Tue, 2006-10-17 at 22:23 +0200, Michael S. Tsirkin wrote: > Quoting Doug Ledford : > > Evidently, I was mistaken and rhn is still populated with the beta1 > > rpms. So, I've made the latest kernel available on my web page as > > referenced below (amongst other rpms as well). However, it may still be > > a while before the rpms are fully populated as I've had to request an > > increase to my quota limit on that web server in order to hold the > > kernel rpms tree. > > When available, I gather they will be here: > http://people.redhat.com/dledford/Infiniband/kernel/2.6.18/1.2729.el5/src/ > is that right? > Yep. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From xma at us.ibm.com Tue Oct 17 13:41:32 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 17 Oct 2006 13:41:32 -0700 Subject: [openib-general] ethtool support for ipoib In-Reply-To: <7.0.1.0.2.20061017141218.025718f0@lanl.gov> Message-ID: Parks Fields wrote on 10/17/2006 01:12:48 PM: > > > > >No, it's never a good idea to turn off TCP or IP checksums. That > >leads to possibilities of silent data corruption too easily. > > I totally agree... Have we ever seen silent data corruption in CHECKSUM_HW? Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Tue Oct 17 14:12:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 14:12:09 -0700 Subject: [openib-general] ethtool support for ipoib In-Reply-To: (Shirley Ma's message of "Tue, 17 Oct 2006 13:41:32 -0700") References: Message-ID: Shirley> Have we ever seen silent data corruption in CHECKSUM_HW? Well, a quick web search finds stuff like http://my.adsm.org/modules.php?op=modload&name=phpBB_14&file=index&action=viewtopic&topic=2362&0 But what I was really talking about was the risk of sending IP packets without a checksum. It may be fine within the IB fabric, because the ICRC protects traffic end-to-end. But if you have a gateway/router between the IPoIB-connected-mode fabric and some other IP network, then that router would have to generate the TCP checksums, which means that router potentially becomes a source of silent corruption. There are interesting surveys such as http://portal.acm.org/citation.cfm?id=347561 about this. - R. From rdreier at cisco.com Tue Oct 17 14:22:04 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 14:22:04 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061017193307.GM23922@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 17 Oct 2006 21:33:07 +0200") References: <20061017193307.GM23922@mellanox.co.il> Message-ID: > Very strange. Let's consider amd64: libibverbs has > > #elif defined(__x86_64__) > > #define mb() asm volatile("" ::: "memory") > > So its just a compiler barrier there. > > While linux has asm-x86_64/system.h > > #define rmb() asm volatile("lfence":::"memory") > > So rmb seems to be stronger than mb: it will prevent the CPU from reordering > reads while mb won't. OK, that's a difference between the kernel and libibverbs -- and it may be a bug. 
I have a faint memory of deciding when I wrote the code that
mfence/lfence were only needed for dealing with non-temporal stores,
but looking at asm-x86_64 I see

/*
 * Force strict CPU ordering.
 * And yes, this is required on UP too when we're talking
 * to devices.
 */
#define mb() asm volatile("mfence":::"memory")
#define rmb() asm volatile("lfence":::"memory")

so maybe this is wrong. I know that x86 can do loads speculatively
and out of order, so perhaps we are living dangerously.

Another confusing thing is that asm-i386 defines mb() and rmb() just
to be compiler barriers, but I would think that the same ordering
issues apply in 32-bit mode. But of course not all x86 processors
support lfence/mfence which leads to some ugly issues of how to handle
this -- runtime detection seems important but I don't know a good way
to do that. Probably the best thing would be just to do "lock; addl
$0,0(%%esp)" by default and add a special compile flag or something to
enable mfence.

 - R.

From rdreier at cisco.com Tue Oct 17 14:24:38 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 14:24:38 -0700
Subject: [openib-general] client-server small message performance issues
In-Reply-To: <20061017185330.GA2450@quasar.osc.edu> (Pete Wyckoff's message of "Tue, 17 Oct 2006 14:53:30 -0400")
References: <20061017185330.GA2450@quasar.osc.edu>
Message-ID: 

 > Basic ping pong is 25 us. That's fine as this is not a particularly
 > optimal way to communicate. Each additional server adds 6 us. That
 > seems like a lot of overhead just to do another pair of posts and
 > polls, but not my major complaint. Look at the jump from 6 to 7
 > servers, 41 us. Beyond that, too. And the standard deviation
 > becomes huge. A plot of the individual values shows a large spread,
 > not just a few outliers.

 > The hardware is all Mellanox MT25204

I would guess you are seeing the effect of exceeding the size of some
internal HCA cache, maybe the QP state cache.
But I don't know enough details of the HCA internals to know if this
is true and if so which limit you're hitting.

 - R.

From mst at mellanox.co.il Tue Oct 17 14:31:21 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 23:31:21 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: 
Message-ID: <20061017213121.GS23922@mellanox.co.il>

Quoting r. Roland Dreier :
> But of course not all x86 processors support lfence/mfence

True, but I don't think anyone is still running libibverbs on
processors that don't. What happens on an older processor when you
call lfence?

-- 
MST

From rdreier at cisco.com Tue Oct 17 14:32:39 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 14:32:39 -0700
Subject: [openib-general] [PATCH] Use time_after_eq() instead of time_after() in queue_req()
In-Reply-To: <453512D9.5030703@ichips.intel.com> (Sean Hefty's message of "Tue, 17 Oct 2006 10:28:57 -0700")
References: <20061017043909.4891.4421.sendpatchset@localhost.localdomain> <453512D9.5030703@ichips.intel.com>
Message-ID: 

 > Roland, this looks good for 2.6.20. How would you like to handle
 > pulling in patches like these? Once OFA has git up, would it be
 > easier to pull them into my git tree, then request that you pull from
 > there, or does this work okay?

Git pulls are definitely the easiest, but I'm fine with applying
patches from email too (git has good tools for that).

However it does make my life easier if the patch applies cleanly. In
this case I had the following problems (I applied it to for-2.6.20
anyway):

 > [PATCH] Use time_after_eq() instead of time_after() in queue_req()

Please add something like "RDMA/addr: " before the "Use" there, so
that someone skimming the kernel log knows what subsystem/specific
area the patch touches. (I added that by hand.)

 > --------

Git just wants three -s like "---" between changelog entry and actual
patch.
> diff -ruNp org/drivers/infiniband/core/addr.c new/drivers/infiniband/core/addr.c > --- org/drivers/infiniband/core/addr.c 2006-10-09 16:54:37.000000000 +0530 > +++ new/drivers/infiniband/core/addr.c 2006-10-09 16:55:36.000000000 +0530 > @@ -118,7 +118,7 @@ static void queue_req(struct addr_req *r > > mutex_lock(&lock); > list_for_each_entry_reverse(temp_req, &req_list, list) { > - if (time_after(req->timeout, temp_req->timeout)) > + if (time_after_eq(req->timeout, temp_req->timeout)) > break; > } > the last line in the original mail was blank, when it should have a single space. This makes git complain (correctly) about a corrupt patch. Please make sure your mailer doesn't corrupt whitespace. - R. From rdreier at cisco.com Tue Oct 17 14:34:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 14:34:39 -0700 Subject: [openib-general] [PATCH] Rewrite cma_req_handler() to encapsulate common code. In-Reply-To: <4535195E.5060601@ichips.intel.com> (Sean Hefty's message of "Tue, 17 Oct 2006 10:56:46 -0700") References: <20061017043911.4891.28143.sendpatchset@localhost.localdomain> <4535195E.5060601@ichips.intel.com> Message-ID: OK, queued for 2.6.20 From dledford at redhat.com Tue Oct 17 14:34:13 2006 From: dledford at redhat.com (Doug Ledford) Date: Tue, 17 Oct 2006 17:34:13 -0400 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061017202818.GR23922@mellanox.co.il> References: <1161116166.2917.434.camel@fc6.xsintricity.com> <20061017202818.GR23922@mellanox.co.il> Message-ID: <1161120853.2917.475.camel@fc6.xsintricity.com> On Tue, 2006-10-17 at 22:28 +0200, Michael S. Tsirkin wrote: > On a tangent, is there a way to set up a cross-build environment that will > build kernel modules for e.g. RHEL amd64 kernel on a 32 bit machine? > I'm doing this now with gcc and kernel.org kernel I built myself from source. > I guess I mostly need to get gcc and binutils SRPMs to generate > cross-compiling tools - has anyone done that? 
At least for Red Hat, rpm already mostly supports this with only a few
examples of breakage (apps needing gfortran like openmpi are an example
that might break depending on usage). This is one of the reasons I
totally ignore the install.sh script in the OFED releases. For any
arch that supports multiple run time variants, the default installation
installs compilers for all supported run time variants, but not
necessarily for other variants (i.e., on x86_64 gcc will support x86_64
or ia32, but ia32 won't necessarily support x86_64, and neither will
necessarily support building ppc or ppc64 or ia64 or s390(x)). You can
call rpmbuild with the --target option to specify the mode you want the
package built as, and in the process that automatically changes all of
the configure options present as part of the %configure macro of rpm to
the right paths (which is why I also strip out all of the %_libdir and
friends settings from the spec file; rpm gets this right itself) and
changes the CFLAGS and CPPFLAGS environment variables to force
compiling in the right mode.

Now, that being said, kernel modules in particular are a different
beast. I originally had a module build kit for the 2.4 kernels that
would cross build kernel modules. That's long since deprecated though.
Nowadays, the rule is to build a proper kernel-looking source tree,
install the kernel-devel package, then the build command is basically
something like:

cd /lib/modules/`uname -r`/build
make SUBDIRS=

In order to cross compile, you would likely need to pass ARCH= to the
make command line. In addition, for a cross compile you very well may
need to install the kernel-devel from that arch instead of the native
one. However, given the limitations I listed above about cross
compiling, it's likely that you can only cross compile from certain
arches to certain other arches.
All that being said, a kernel-ib spec file could include something like
this:

BuildConflicts: kernel-devel
(I'm not sure build conflicts is a proper tag, if not, you might have to
script a little more carefully in %setup)

%prep
%setup -q
rpm -i /usr/src/redhat/RPMS/%{arch}/kernel-devel-`uname -r`.%{arch}.rpm

%build
cd /lib/modules/`uname -r`/build
make ARCH=%{arch} SUBDIRS=${RPM_BUILD_DIR}/%{name}-%{version} modules

%install
cd /lib/modules/`uname -r`/build
make ARCH=%{arch} SUBDIRS=${RPM_BUILD_DIR}/%{name}-%{version} INSTALL_MOD_PATH=${RPM_BUILD_ROOT} modules_install

%clean
rm -fr ${RPM_BUILD_ROOT}
rpm -e kernel-devel

Then, to cross compile, you would simply run rpmbuild multiple times:

for i in i686 x86_64; do
	rpmbuild --ba --target=$i kernel-ib.spec
done

BTW, all the rpms are live on my site now.

-- 
Doug Ledford
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From mst at mellanox.co.il Tue Oct 17 14:37:06 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 23:37:06 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: 
Message-ID: <20061017213706.GT23922@mellanox.co.il>

Quoting r. Roland Dreier :
> Another confusing thing is that asm-i386 defines mb() and rmb() just
> to be compiler barriers,

I see:
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)

As for mb(), I don't think our kernel code uses that, so I think
userspace should switch to wmb as well. wmb is just a compiler barrier
on most architectures.
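
For readers following the barrier discussion, here is a minimal sketch of the three flavours being compared. The `sketch_*` names and the `sketch_cqe` structure are hypothetical; they are not the libibverbs or kernel definitions. The x86-64 `mfence`/`lfence` forms follow the asm-x86_64 macros quoted earlier in this thread, the compiler-only write barrier follows the suggestion above and assumes ordinary write-back memory, and the fallback to GCC's `__sync_synchronize()` on other architectures is an assumption made only so the sketch builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical names; the thread itself talks about mb(), rmb(), wmb(). */
#if defined(__x86_64__)
/* CPU fences, as in the asm-x86_64/system.h macros quoted in the thread. */
#define sketch_mb()  __asm__ __volatile__("mfence" ::: "memory")
#define sketch_rmb() __asm__ __volatile__("lfence" ::: "memory")
/* Compiler-only barrier: blocks compiler reordering, emits no instruction. */
#define sketch_wmb() __asm__ __volatile__("" ::: "memory")
#else
/* Assumption: use GCC's full-barrier builtin on every other architecture. */
#define sketch_mb()  __sync_synchronize()
#define sketch_rmb() __sync_synchronize()
#define sketch_wmb() __sync_synchronize()
#endif

/* Hypothetical stand-in for a queue entry with an ownership/valid flag. */
struct sketch_cqe {
    uint32_t payload;
    volatile uint32_t valid;
};

/* Producer side: write the payload first, then publish it via the flag. */
static void sketch_publish(struct sketch_cqe *e, uint32_t value)
{
    e->payload = value;
    sketch_wmb();           /* payload store must not move after the flag */
    e->valid = 1;
}

/* Consumer side: returns 1 and copies the payload once the flag is set. */
static int sketch_consume(struct sketch_cqe *e, uint32_t *out)
{
    if (!e->valid)
        return 0;
    sketch_rmb();           /* payload load must not move before the flag */
    *out = e->payload;
    return 1;
}
```

The consumer is the case the thread is worried about: with only a compiler barrier in place of `sketch_rmb()`, nothing stops the CPU from loading the payload before the flag.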

-- 
MST

From rdreier at cisco.com Tue Oct 17 14:36:01 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 14:36:01 -0700
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061017213121.GS23922@mellanox.co.il> (Michael S. Tsirkin's
	message of "Tue, 17 Oct 2006 23:31:21 +0200")
References: <20061017213121.GS23922@mellanox.co.il>
Message-ID: 

    Michael> True, but I don't think anyone is still running libibverbs
    Michael> on processors that don't. What happens on older
    Michael> processors when you call lfence?

You get an illegal instruction signal and the process dies, I guess.

From rdreier at cisco.com Tue Oct 17 14:40:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 14:40:13 -0700
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061017213706.GT23922@mellanox.co.il> (Michael S. Tsirkin's
	message of "Tue, 17 Oct 2006 23:37:06 +0200")
References: <20061017213706.GT23922@mellanox.co.il>
Message-ID: 

 > > Another confusing thing is that asm-i386 defines mb() and rmb() just
 > > to be compiler barriers,
 >
 > I see:
 > #define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)

Oops, you're right. I misread that file.

OK, we probably want mb() to be more than a compiler barrier on i386
and x86-64. I'll fix up the libibverbs code.

 > as for mb() - I don't think our kernel code uses that so I think userspace
 > should switch to wmb as well. wmb is just a compiler barrier on most
 > architectures.

I'm not sure it's worth the trouble to split up the two cases at this
point. Does it make a big performance difference?

 - R.

From mst at mellanox.co.il Tue Oct 17 14:42:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 23:42:18 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: 
Message-ID: <20061017214218.GU23922@mellanox.co.il>

Quoting r.
Roland Dreier :
> But of course not all x86 processors
> support lfence/mfence which leads to some ugly issues of how to handle
> this

lfence seems to be part of SSE2, and I don't think we really need
sfence/mfence. We can just require SSE2 support:
http://en.wikipedia.org/wiki/SSE2#CPUs_supporting_SSE2

>
> -- runtime detection seems important but I don't know a good way
>    to do that.

Well, at startup we can read /proc/cpuinfo and look for sse2 in the flags:
line. Seems simple enough.

> Probably the best thing would be just to do "lock; addl
> $0,0(%%esp)" by default and add a special compile flag or something to
> enable mfence.

I hope we can do something without compile flags - most people
don't know enough to turn them on, and distros commonly
compile for least common denominator.

-- 
MST

From rdreier at cisco.com Tue Oct 17 14:44:19 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 14:44:19 -0700
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: 

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This includes various fixes found since 2.6.19-rc2:

Adrian Bunk:
      RDMA/amso1100: Fix a NULL dereference in error path

Arthur Kepner:
      IB/mthca: Use mmiowb after doorbell ring

Henrik Kretzschmar:
      RDMA/amso1100: pci_module_init() conversion

Robert Walsh:
      IB/ipath: Initialize diagpkt file on device init only

 drivers/infiniband/hw/amso1100/c2.c        |    2 -
 drivers/infiniband/hw/amso1100/c2_rnic.c   |    4 +-
 drivers/infiniband/hw/ipath/ipath_diag.c   |   65 ++++++++++++++++------------
 drivers/infiniband/hw/ipath/ipath_driver.c |   10 ----
 drivers/infiniband/hw/ipath/ipath_kernel.h |    3 -
 drivers/infiniband/hw/mthca/mthca_cq.c     |    7 +++
 drivers/infiniband/hw/mthca/mthca_qp.c     |   19 ++++++++
 drivers/infiniband/hw/mthca/mthca_srq.c    |    8 +++
 8 files changed, 75 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2.c b/drivers/infiniband/hw/amso1100/c2.c
index dc1ebea..9e7bd94 100644
--- a/drivers/infiniband/hw/amso1100/c2.c
+++ b/drivers/infiniband/hw/amso1100/c2.c
@@ -1243,7 +1243,7 @@ static struct pci_driver c2_pci_driver =
 static int __init c2_init_module(void)
 {
-	return pci_module_init(&c2_pci_driver);
+	return pci_register_driver(&c2_pci_driver);
 }
 static void __exit c2_exit_module(void)
diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c
index e37c568..30409e1 100644
--- a/drivers/infiniband/hw/amso1100/c2_rnic.c
+++ b/drivers/infiniband/hw/amso1100/c2_rnic.c
@@ -150,8 +150,8 @@ static int c2_rnic_query(struct c2_dev *
 	    (struct c2wr_rnic_query_rep *) (unsigned long) (vq_req->reply_msg);
 	if (!reply)
 		err = -ENOMEM;
-
-	err = c2_errno(reply);
+	else
+		err = c2_errno(reply);
 	if (err)
 		goto bail2;
diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c
index 29958b6..28c087b 100644
--- a/drivers/infiniband/hw/ipath/ipath_diag.c
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c
@@ -67,19 +67,54 @@ static struct file_operations diag_file_
 	.release = ipath_diag_release
 };
+static ssize_t ipath_diagpkt_write(struct file *fp,
+				   const char __user *data,
+				   size_t count, loff_t *off);
+
+static struct file_operations diagpkt_file_ops = {
+	.owner = THIS_MODULE,
+	.write = ipath_diagpkt_write,
+};
+
+static atomic_t diagpkt_count = ATOMIC_INIT(0);
+static struct cdev *diagpkt_cdev;
+static struct class_device *diagpkt_class_dev;
+
 int ipath_diag_add(struct ipath_devdata *dd)
 {
 	char name[16];
+	int ret = 0;
+
+	if (atomic_inc_return(&diagpkt_count) == 1) {
+		ret = ipath_cdev_init(IPATH_DIAGPKT_MINOR,
+				      "ipath_diagpkt", &diagpkt_file_ops,
+				      &diagpkt_cdev, &diagpkt_class_dev);
+
+		if (ret) {
+			ipath_dev_err(dd, "Couldn't create ipath_diagpkt "
+				      "device: %d", ret);
+			goto done;
+		}
+	}
 	snprintf(name, sizeof(name), "ipath_diag%d", dd->ipath_unit);
-	return ipath_cdev_init(IPATH_DIAG_MINOR_BASE + dd->ipath_unit, name,
-			       &diag_file_ops, &dd->diag_cdev,
-			       &dd->diag_class_dev);
+	ret = ipath_cdev_init(IPATH_DIAG_MINOR_BASE + dd->ipath_unit, name,
+			      &diag_file_ops, &dd->diag_cdev,
+			      &dd->diag_class_dev);
+	if (ret)
+		ipath_dev_err(dd, "Couldn't create %s device: %d",
+			      name, ret);
+
+done:
+	return ret;
 }
 void ipath_diag_remove(struct ipath_devdata *dd)
 {
+	if (atomic_dec_and_test(&diagpkt_count))
+		ipath_cdev_cleanup(&diagpkt_cdev, &diagpkt_class_dev);
+
 	ipath_cdev_cleanup(&dd->diag_cdev, &dd->diag_class_dev);
 }
@@ -275,30 +310,6 @@ bail:
 	return ret;
 }
-static ssize_t ipath_diagpkt_write(struct file *fp,
-				   const char __user *data,
-				   size_t count, loff_t *off);
-
-static struct file_operations diagpkt_file_ops = {
-	.owner = THIS_MODULE,
-	.write = ipath_diagpkt_write,
-};
-
-static struct cdev *diagpkt_cdev;
-static struct class_device *diagpkt_class_dev;
-
-int __init ipath_diagpkt_add(void)
-{
-	return ipath_cdev_init(IPATH_DIAGPKT_MINOR,
-			       "ipath_diagpkt", &diagpkt_file_ops,
-			       &diagpkt_cdev, &diagpkt_class_dev);
-}
-
-void __exit ipath_diagpkt_remove(void)
-{
-	ipath_cdev_cleanup(&diagpkt_cdev, &diagpkt_class_dev);
-}
-
 /**
  * ipath_diagpkt_write - write an IB packet
  * @fp: the diag data device file pointer
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 12cefa6..b4ffaa7 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -2005,18 +2005,8 @@ static int __init infinipath_init(void)
 		goto bail_group;
 	}
-	ret = ipath_diagpkt_add();
-	if (ret < 0) {
-		printk(KERN_ERR IPATH_DRV_NAME ": Unable to create "
-		       "diag data device: error %d\n", -ret);
-		goto bail_ipathfs;
-	}
-
 	goto bail;
-bail_ipathfs:
-	ipath_exit_ipathfs();
-
 bail_group:
 	ipath_driver_remove_group(&ipath_driver.driver);
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index 7c43669..06d5020 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -869,9 +869,6 @@ int ipath_device_create_group(struct dev
 void ipath_device_remove_group(struct device *, struct ipath_devdata *);
 int ipath_expose_reset(struct device *);
-int ipath_diagpkt_add(void);
-void ipath_diagpkt_remove(void);
-
 int ipath_init_ipathfs(void);
 void ipath_exit_ipathfs(void);
 int ipathfs_add_device(struct ipath_devdata *);
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index e393681..149b369 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -39,6 +39,8 @@
 #include
 #include
+#include
+
 #include
 #include "mthca_dev.h"
@@ -210,6 +212,11 @@ static inline void update_cons_index(str
 		mthca_write64(doorbell,
 			      dev->kar + MTHCA_CQ_DOORBELL,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+		/*
+		 * Make sure doorbells don't leak out of CQ spinlock
+		 * and reach the HCA out of order:
+		 */
+		mmiowb();
 	}
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 5e5c58b..6a7822e 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -39,6 +39,8 @@
 #include
 #include
 #include
+#include
+
 #include
 #include
 #include
@@ -1732,6 +1734,11 @@ out:
 		mthca_write64(doorbell,
 			      dev->kar + MTHCA_SEND_DOORBELL,
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+		/*
+		 * Make sure doorbells don't leak out of SQ spinlock
+		 * and reach the HCA out of order:
+		 */
+		mmiowb();
 	}
 	qp->sq.next_ind = ind;
@@ -1851,6 +1858,12 @@ out:
 	qp->rq.next_ind = ind;
 	qp->rq.head += nreq;
+	/*
+	 * Make sure doorbells don't leak out of RQ spinlock and reach
+	 * the HCA out of order:
+	 */
+	mmiowb();
+
 	spin_unlock_irqrestore(&qp->rq.lock, flags);
 	return err;
 }
@@ -2112,6 +2125,12 @@ out:
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
+	/*
+	 * Make sure doorbells don't leak out of SQ spinlock and reach
+	 * the HCA out of order:
+	 */
+	mmiowb();
+
 	spin_unlock_irqrestore(&qp->sq.lock, flags);
 	return err;
 }
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
index 92a72f5..f5d7677 100644
--- a/drivers/infiniband/hw/mthca/mthca_srq.c
+++ b/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -35,6 +35,8 @@
 #include
 #include
+#include
+
 #include "mthca_dev.h"
 #include "mthca_cmd.h"
 #include "mthca_memfree.h"
@@ -595,6 +597,12 @@ int mthca_tavor_post_srq_recv(struct ib_
 			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
+	/*
+	 * Make sure doorbells don't leak out of SRQ spinlock and
+	 * reach the HCA out of order:
+	 */
+	mmiowb();
+
 	spin_unlock_irqrestore(&srq->lock, flags);
 	return err;
 }

From mst at mellanox.co.il Tue Oct 17 14:48:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 23:48:28 +0200
Subject: [openib-general] RHEL5 and OFED ...
In-Reply-To: <1161120853.2917.475.camel@fc6.xsintricity.com>
References: <1161120853.2917.475.camel@fc6.xsintricity.com>
Message-ID: <20061017214828.GV23922@mellanox.co.il>

Quoting r. Doug Ledford :
> Subject: Re: RHEL5 and OFED ...
>
> On Tue, 2006-10-17 at 22:28 +0200, Michael S. Tsirkin wrote:
> > On a tangent, is there a way to set up a cross-build environment that will
> > build kernel modules for e.g. RHEL amd64 kernel on a 32 bit machine?
> > I'm doing this now with gcc and kernel.org kernel I built myself from source.
> > I guess I mostly need to get gcc and binutils SRPMs to generate
> > cross-compiling tools - has anyone done that?
>
> At least for Red Hat, rpm already mostly supports this with only a few
> examples of breakage (apps needing gfortran like openmpi are an example
> that might break depending on usage).

So, you are saying I should rpmbuild binutils and gcc rpms?
Hmm, I'll give it a try.

> You can
> call rpmbuild with the --target option to specify the mode you want the
> package built as

Hmm, no, I really want to take a srpm from amd64 and get a 32 bit
gcc executable that will build 64 bit binaries that match those
built on a native amd64 system exactly.

-- 
MST

From rdreier at cisco.com Tue Oct 17 14:47:41 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 14:47:41 -0700
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061017214218.GU23922@mellanox.co.il> (Michael S. Tsirkin's
	message of "Tue, 17 Oct 2006 23:42:18 +0200")
References: <20061017214218.GU23922@mellanox.co.il>
Message-ID: 

 > Well, at startup we can read /proc/cpuinfo and look for sse2 in the flags: line.
 > Seems simple enough.

Detecting SSE2 is easy -- we could just do the cpuid ourselves if we
wanted to. The problem is what do you do when you see that the CPU
does or doesn't have the instruction? The runtime patching that the
kernel does is way too complicated, and if you're going to move mb()
out of line then just doing a regular serializing instruction is
probably just as good.

 > I hope we can do something without compile flags - most people
 > don't know enough to turn them on, and distros commonly
 > compile for least common denominator.

And rightfully so -- the default compile of libibverbs/libmthca should
run on a least common denominator CPU.

 - R.

From mst at mellanox.co.il Tue Oct 17 14:50:31 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 23:50:31 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: 
Message-ID: <20061017215031.GW23922@mellanox.co.il>

Quoting r. Roland Dreier :
> > as for mb() - I don't think our kernel code uses that so I think userspace
> > should switch to wmb as well. wmb is just a compiler barrier on most
> > architectures.
>
> I'm not sure it's worth the trouble to split up the two cases at this
> point.

Shouldn't be hard - just look at the kernel code and make userspace match.

> Does it make a big performance difference?

I imagine it does, latency-wise.

-- 
MST

From mst at mellanox.co.il Tue Oct 17 14:58:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 17 Oct 2006 23:58:24 +0200
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: 
Message-ID: <20061017215824.GX23922@mellanox.co.il>

Quoting r. Roland Dreier :
> Detecting SSE2 is easy -- we could just do the cpuid ourselves if we
> wanted to. The problem is what do you do when you see that the CPU
> does or doesn't have the instruction? The runtime patching that the
> kernel does is way too complicated, and if you're going to move mb()
> out of line then just doing a regular serializing instruction is
> probably just as good.

Maybe just do the test, print a warning and exit for now?
I don't think anyone is gonna run libibverbs on Pentium III.

-- 
MST

From HNGUYEN at de.ibm.com Tue Oct 17 15:13:43 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 18 Oct 2006 00:13:43 +0200
Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca?
In-Reply-To: 
Message-ID: 

Hi Troy!
> I am running PVFS2 on OpenIB, with IBM's ehca.
> When we start writing/reading large files, either with the NetPIPE
> PVFS module we have or a modified GAMESS executable that uses
> libpvfs2 directly, the 'ibv_reg_mr' function fails, and we get an error.
> This is also correlated with kernel log messages like this:
> Oct 16 11:14:45 p5l8 kernel: PU0003 000e0091:ehca_hcall_7arg_7ret
> HCAD_ERROR opcode=160 ret=fffffffffffffff7 arg1=1000000003000004 arg2=5
> arg3=14f0ebc8 arg4=10000 arg5=e0000000000000 arg6=e3e9f200 arg7=0
> out1=0 out2=0 out3=0 out4=0 out5=0 out6=0 out7=0

Return code f7 from firmware/hvcall means H_NO_MEM.
I'm wondering if you could provide me with some history of this
problem. Is this a permanent problem?
If yes, could you give me more info on your test case or scenario, e.g.
large file size, NetPIPE options?
Which version of ehca are you using? And which kernel version?
Thanks!
Hoang-Nam Nguyen

From parks at lanl.gov Tue Oct 17 15:35:13 2006
From: parks at lanl.gov (Parks Fields)
Date: Tue, 17 Oct 2006 16:35:13 -0600
Subject: [openib-general] ethtool support for ipoib
In-Reply-To: 
References: <7.0.1.0.2.20061017141218.025718f0@lanl.gov>
Message-ID: <7.0.1.0.2.20061017163416.024b4d88@lanl.gov>

> Have we ever seen silent data corruption in CHECKSUM_HW?

Here at LANL we have seen silent corruption on other types of networks,
but not on IB yet that we know of. So we are a little gun shy...

From johnip at sgi.com Tue Oct 17 15:38:41 2006
From: johnip at sgi.com (John Partridge)
Date: Tue, 17 Oct 2006 17:38:41 -0500
Subject: [openib-general] Race in mthca_cmd_post()
In-Reply-To: 
References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil>
Message-ID: <45355B71.8020606@sgi.com>

I'm going back and comparing analyzer traces with the fix and without
and the machine doing an MCA.

John

Roland Dreier wrote:
>     chas> i would guess the read to the mmio region is flushing the
>     chas> writes to the config register but the read happens "too
>     chas> soon" after those writes. on a more mundane computer, the
>     chas> write/write/read probably wouldn't be batched together.
>
> config writes can't be posted though, so that doesn't make sense.
>
>  - R.

-- 
John Partridge

Silicon Graphics Inc
Tel: 651-683-3428
Vnet: 233-3428
E-Mail: johnip at sgi.com

From sweitzen at cisco.com Tue Oct 17 15:45:42 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 17 Oct 2006 15:45:42 -0700
Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10
Message-ID: 

You need the kernel-source RPM, I guess the OFED install.sh should check
for that RPM.

svbu-qa-opteron-1:~ # uname -a
Linux svbu-qa-opteron-1 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 athlon i386 GNU/Linux
svbu-qa-opteron-1:~ # rpm -qa | fgrep kernel
kernel-source-2.6.16.21-0.8
kernel-smp-2.6.16.21-0.8
kernel-ib-1.1-2.6.16.21_0.8_smp
kernel-ib-devel-1.1-2.6.16.21_0.8_smp
svbu-qa-opteron-1:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp
.config         Makefile        arch     include2
.kernelrelease  Module.symvers  include  scripts

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Chris Dennett
> Sent: Tuesday, October 17, 2006 12:46 PM
> To: openfabrics-ewg at openib.org; openib-general at openib.org
> Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10
>
> I've been trying to install OFED 1.1 RC7 on an x86 server with a fresh
> install of SLES10 (32-bit). It errors out when trying to build the
> kernel modules. I've included what I think are the relevant log
> messages below. I've tried installing everything (minus iser and
> tvflash) or just the modules needed for SRP. I've installed 1.1 RC7
> successfully on other RedHat servers without any problems. I am
> installing as root. Any help would be appreciated.
>
> Thanks.
>
> -Chris
>
> ==============================================
> + make kernel
> Building kernel modules
> Kernel version: 2.6.16.21-0.8-smp
> Modules directory: //lib/modules/2.6.16.21-0.8-smp
> Kernel sources: /lib/modules/2.6.16.21-0.8-smp/build
> env EXTRA_CFLAGS=" -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib \
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug" \
> make -C /lib/modules/2.6.16.21-0.8-smp/build
> SUBDIRS="/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband"
> KERNELRELEASE=2.6.16.21-0.8-smp \
> EXTRAVERSION=.21-0.8-smp V=1 \
> CONFIG_INFINIBAND=m \
> CONFIG_INFINIBAND_IPOIB=m \
> CONFIG_INFINIBAND_SDP= \
> CONFIG_INFINIBAND_SRP=m \
> CONFIG_INFINIBAND_USER_MAD=m \
> CONFIG_INFINIBAND_USER_ACCESS=m \
> CONFIG_INFINIBAND_ADDR_TRANS=y \
> CONFIG_INFINIBAND_MTHCA=m \
> CONFIG_INFINIBAND_IPOIB_DEBUG=y \
> CONFIG_INFINIBAND_ISER= \
> CONFIG_INFINIBAND_EHCA= \
> CONFIG_INFINIBAND_RDS= \
> CONFIG_INFINIBAND_RDS_DEBUG= \
> CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \
> CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \
> CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \
> CONFIG_INFINIBAND_SDP_DEBUG= \
> CONFIG_INFINIBAND_SDP_DEBUG_DATA= \
> CONFIG_INFINIBAND_IPATH= \
> CONFIG_INFINIBAND_MTHCA_DEBUG=y \
> CONFIG_INFINIBAND_MADEYE= \
> LINUXINCLUDE='-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include \
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \
> -Iinclude \
> $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \
> -include include/linux/autoconf.h \
> -include
> /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h \
> ' \
> modules
> make[1]: Entering directory
> `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp'
> make[1]: *** No rule to make target `modules'. Stop.
> make[1]: Leaving directory `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp'
> make: *** [kernel] Error 2
> error: Bad exit status from /var/tmp/rpm-tmp.92052 (%install)
>
>
> RPM build errors:
> user vlad does not exist - using root
> group mtl does not exist - using root
> user vlad does not exist - using root
> group mtl does not exist - using root
> Bad exit status from /var/tmp/rpm-tmp.92052 (%install)
> ERROR: Failed executing "rpmbuild --rebuild --define '_topdir
> /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define
> 'build_root
> /var/tmp/OFED' --define 'configure_options --with-libibcommon
> --with-libibmad
> --with-libibumad --with-libibverbs --with-libmthca --with-opensm
> --with-librdmacm --with-openib-diags --with-srptools --with-mstflint
> --with-perftest --with-ipoib-mod --with-mthca-mod --with-srp-mod
> --with-core-mod --with-user_mad-mod --with-user_access-mod
> --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define
> 'KVERSION 2.6.16.21-0.8-smp' --define 'KSRC
> /lib/modules/2.6.16.21-0.8-smp/build' --define
> 'build_kernel_ib 1' --define
> 'build_kernel_ib_devel 0' --define 'NETWORK_CONF_DIR
> /etc/sysconfig/network'
> --define 'modprobe_update 1' --define 'include_ipoib_conf 0' --define
> 'build_32bit 0' /root/OFED-1.1-rc7/SRPMS/openib-1.1-0.src.rpm"
>
> ===================================================
>
> smx32:~ # uname -a
> Linux linux-yeez 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux
> smx32:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp
> Module.symvers
>
>
>
> --
> Chris Dennett
> Design Engineer
> Texas Memory Systems, Inc.
> 713-266-3200 x430
> Chris.Dennett at texmemsys.com

From dledford at redhat.com Tue Oct 17 16:12:52 2006
From: dledford at redhat.com (Doug Ledford)
Date: Tue, 17 Oct 2006 19:12:52 -0400
Subject: [openib-general] RHEL5 and OFED ...
In-Reply-To: <20061017214828.GV23922@mellanox.co.il>
References: <1161120853.2917.475.camel@fc6.xsintricity.com>
	<20061017214828.GV23922@mellanox.co.il>
Message-ID: <1161126773.2917.494.camel@fc6.xsintricity.com>

On Tue, 2006-10-17 at 23:48 +0200, Michael S. Tsirkin wrote:
> Quoting r. Doug Ledford :
> > Subject: Re: RHEL5 and OFED ...
> >
> > On Tue, 2006-10-17 at 22:28 +0200, Michael S. Tsirkin wrote:
> > > On a tangent, is there a way to set up a cross-build environment that will
> > > build kernel modules for e.g. RHEL amd64 kernel on a 32 bit machine?
> > > I'm doing this now with gcc and kernel.org kernel I built myself from source.
> > > I guess I mostly need to get gcc and binutils SRPMs to generate
> > > cross-compiling tools - has anyone done that?
> >
> > At least for Red Hat, rpm already mostly supports this with only a few
> > examples of breakage (apps needing gfortran like openmpi are an example
> > that might break depending on usage).
>
> So, you are saying I should rpmbuild binutils and gcc rpms?
> Hmm, I'll give it a try.

No. To build an i686 binary on an x86_64 only requires passing -m32 to
the compiler. It will then build the 32bit variant instead of the 64bit.
I'm not saying rpmbuild gcc and binutils, because the variants installed
will already do what you want (assuming building 32bit on 64bit is what
you want, but I see further down that isn't the case).
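
A minimal sketch in C of a way to check the -m32 point above: the file below reports the pointer width of whatever mode it was compiled in. The names target_bits and print_target are made up for this illustration and are not part of gcc, rpm or OFED. It assumes a multilib gcc on the x86_64 host (32-bit runtime libraries installed), which may not be true of every installation.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper for illustration only (not part of rpm, gcc or OFED):
 * returns the pointer width, in bits, of the mode this file was compiled in.
 */
int target_bits(void)
{
	return (int)(sizeof(void *) * CHAR_BIT);
}

/* Print the result; call it from main() as print_target(stdout). */
void print_target(FILE *out)
{
	fprintf(out, "built as a %d-bit binary\n", target_bits());
}
```

Calling print_target(stdout) from a main() and building the file once with plain gcc and once with gcc -m32 on an x86_64 host should show the two variants coming out of the same installed compiler.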

> > You can
> > call rpmbuild with the --target option to specify the mode you want the
> > package built as
>
> Hmm, no, I really want to take a srpm from amd64 and get a 32 bit
> gcc executable that will build 64 bit binaries that match those
> built on a native amd64 system exactly.

Between just i386 and x86_64, you might be able to do that. However, in
general, byte for byte identical cross compiling can't be done. That's
one of the reasons we build all the packages on the arch they are being
built for, so if a user rebuilds a package on the arch then it will most
likely match the package we built. If we cross compiled, that would be
false more often than true. Minor things like variance in how CPUs
handle floating point math and precision of said math affect gcc
optimization decisions and change the generated byte code.

In any case, I'm certainly no gcc build expert, so I don't know the
magic incantations to get the gcc sources to spit out a 32bit binary
that builds 64bit code. Far easier would be to go the other way around,
run on x86_64 and build for i386, in which case gcc supports that out of
the box.

-- 
Doug Ledford
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From rdreier at cisco.com Tue Oct 17 16:31:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 17 Oct 2006 16:31:00 -0700
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: (Roland Dreier's message of "Tue, 17 Oct 2006 14:47:41 -0700")
References: <20061017214218.GU23922@mellanox.co.il>
Message-ID: 

OK, you convinced me to add rmb()/wmb() and use it in libmthca. I
just checked a bunch of changes to do that into svn.
Please survey the wreckage of libibverbs/libmthca and let me know if you
see where I broke anything.

For now I just used lock; addl %0 to implement rmb on i386. I'm
really not comfortable making libmthca depend on sse2, and I don't see
a good way to detect and use sse2 at runtime.

Thanks,
  Roland

From xma at us.ibm.com Tue Oct 17 16:48:16 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 17 Oct 2006 16:48:16 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()
In-Reply-To: 
Message-ID: 

Hi, Roland,

There were a couple of errors and a warning when I applied this patch to
OFED-1.1-rc7.

1. ehca_req_notify_cq() in ehca_iverbs.h is not updated.
2. *maybe_missed_event = ipz_qeit_is_valid(my_cq->ipz_queue)
   should be
   = ipz_qeit_is_valid(&my_cq->ipz_queue)
3. a compile warning on this line:
   return cqe_flags >> 7 == queue->toggle_state & 1;

Thanks
Shirley Ma

From kimbrr at melbourne.sgi.com Tue Oct 17 17:10:16 2006
From: kimbrr at melbourne.sgi.com (Michael Newton)
Date: Wed, 18 Oct 2006 10:10:16 +1000
Subject: [openib-general] sysfs exposure of port counters useless?
In-Reply-To: <1161093586.32093.399737.camel@hal.voltaire.com>
References: <1161093586.32093.399737.camel@hal.voltaire.com>
Message-ID: 

On Tue, 17 Oct 2006, Hal Rosenstock wrote:
> On Tue, 2006-10-17 at 09:55, Rimmer, Todd wrote:
> > > From: Michael Newton
> > > Sent: Tuesday, October 17, 2006 3:02 AM
> > > To: openib-general at openib.org
> > > Subject: [openib-general] sysfs exposure of port counters useless?
> > >
> > >
> > > These are 32 bit counters. The rcv/xmit_data counters count 32-bit
> > > blocks. Also, these counts do not wrap: they peg at all 1s.
> > > At infiniband speeds, these counts can peg out very quickly indeed,
> > > to the point they can really only be of use if they can be reset each
> > > time they're read.
> > > Now if anyone who wants to use them has to go to the CLI to reset
> > > them, and there's little point in reading them without reset, why would
> > > anyone read them via sysfs? so why have them?
> > >
> >
> > We have found that while your comment is true for the data movement
> > counters, the error counters should not peg quickly, hence it is valid

it's true I overstated the case just a little ;) .. yes, error counters
should be fine and it's mainly the data counters that are problematic
(though now I'm not sure I haven't seen the packet counters freeze when
the data ones peg out)..

> > to read them without resetting. However it is also useful to have an
> > ability to reset them. Of course if there are other CLI commands which
> > do this easily, the sysfs info is of less value.
>
> There are diag tools for this.

that's where we started.. the point I'm making is that exposing the data
counters in sysfs is of little use, because if you have to go to other
tools to reset, why wouldn't you use them to read as well?

I was looking at exposing infiniband stats via PCP
(http://oss.sgi.com/projects/pcp/). This would be useful for folk doing IB
performance testing. It's very easy to just feed in the sysfs values..
unfortunately they turn out to be of little value. Life would be so much
easier if there were 64 bit counters available. Instead I will probably
need to have an additional daemon to construct them.

From jgunthorpe at obsidianresearch.com Tue Oct 17 17:10:05 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 17 Oct 2006 18:10:05 -0600
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: 
References: <20061017214218.GU23922@mellanox.co.il>
Message-ID: <20061018001005.GQ4054@obsidianresearch.com>

On Tue, Oct 17, 2006 at 04:31:00PM -0700, Roland Dreier wrote:

> For now I just used lock; addl %0 to implement rmb on i386.
> I'm really not comfortable making libmthca depend on sse2, and I don't
> see a good way to detect and use sse2 at runtime.

I think the typical way this is done would be to use ld.so's 'hwcap'
handling and stick an optimized library in /usr/lib/sse2.

Jason

From sweitzen at cisco.com Tue Oct 17 17:18:34 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 17 Oct 2006 17:18:34 -0700
Subject: [openib-general] sysfs exposure of port counters useless?
Message-ID: 

I agree the 32-bit byte and packet counters are useless as they get
pegged in a few seconds on a busy IB network. I thought there was an
effort in IBTA to fix this.

For IB counters in a Cisco switch, we read and reset the 32-bit counters
once per second and keep 64-bit counters internally. This would be
possible in OF too, right?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Michael Newton
> Sent: Tuesday, October 17, 2006 5:10 PM
> To: Hal Rosenstock
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] sysfs exposure of port counters useless?
>
> On Tue, 17 Oct 2006, Hal Rosenstock wrote:
> > On Tue, 2006-10-17 at 09:55, Rimmer, Todd wrote:
> > > > From: Michael Newton
> > > > Sent: Tuesday, October 17, 2006 3:02 AM
> > > > To: openib-general at openib.org
> > > > Subject: [openib-general] sysfs exposure of port counters useless?
> > > >
> > > >
> > > > These are 32 bit counters. The rcv/xmit_data counters count 32-bit
> > > > blocks. Also, these counts do not wrap: they peg at all 1s.
> > > > At infiniband speeds, these counts can peg out very quickly indeed,
> > > > to the point they can really only be of use if they can be reset each
> > > > time they're read.
Now if anyone who wants to use them has to > go the CLI to > > > reset > > > > them, and theres little point in reading them without > reset, why would > > > > anyone read them via sysfs? so why have them? > > > > > > > > > > We have found that while your comment is true for the > data movement > > > counters, the error counters should not peg quickly, > hence it is valid > > its true i overstated the case just a little;) .. yes error counters > should be fine and its mainly the data counters that are problematic > (tho now im not sure i havent seen the packet counters freeze when the > data ones peg out).. > > > > to read them without resetting. However it is also > useful to have an > > > ability to reset them. Of course if there are other CLI > commands which > > > do this easily, the sysfs info is of less value. > > > > There are diag tools for this. > > thats where we started.. the point im making is that exposing the data > counters in sysfs is of little use, because if you have to go to other > tools to reset, why wouldnt you use them to read as well? > > i was looking at exposing infiniband stats via PCP > (http://oss.sgi.com/projects/pcp/). This would be useful for > folk doing IB > performance testing. Its very easy to just feed in the sysfs values.. > unfortunately they turn out to be of little value. Life would > be so much > easier if there were 64 bit counters available. Instead I > will probably > need to have an additional daemon to construct them. > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From greg.lindahl at qlogic.com Tue Oct 17 18:08:21 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Tue, 17 Oct 2006 18:08:21 -0700 Subject: [openib-general] sysfs exposure of port counters useless? 
In-Reply-To: References: Message-ID: <20061018010821.GA5567@greglaptop.rchland.ibm.com> On Tue, Oct 17, 2006 at 05:18:34PM -0700, Scott Weitzenkamp (sweitzen) wrote: > I agree the 32-bit byte and packet counters are useless as they get > pegged in a few seconds on a busy IB networks. I thought there was an > effort in IBTA to fix this. Yes, it's in the management working group. > For IB counters in a Cisco switch, we read and reset the 32-bit counters > once per second and keep 64-bit counters internally. This would be > possible in OF too, right? Yep. We keep 64 bit counters internally and dumb them down as required to meet the standard. -- greg From rdreier at cisco.com Tue Oct 17 20:41:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 20:41:59 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: (Shirley Ma's message of "Tue, 17 Oct 2006 16:48:16 -0700") References: Message-ID: Sorry, I just noticed my cross-compilation test setup was messed up, so I never actually built the modified ehca, even though I thought I did. Anyway, the patch below on top of what I sent out should fix everything up. I've also merged this into my ipoib-napi branch, so what's there should be OK for ehca now. Anyway, I'm eagerly awaiting your NAPI results with ehca. Thanks, Roland From halr at voltaire.com Tue Oct 17 20:38:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 23:38:42 -0400 Subject: [openib-general] sysfs exposure of port counters useless? 
In-Reply-To: References: <1161093586.32093.399737.camel@hal.voltaire.com> Message-ID: <1161142710.32093.432016.camel@hal.voltaire.com> On Tue, 2006-10-17 at 20:10, Michael Newton wrote: > On Tue, 17 Oct 2006, Hal Rosenstock wrote: > > On Tue, 2006-10-17 at 09:55, Rimmer, Todd wrote: > > > > From: Michael Newton > > > > Sent: Tuesday, October 17, 2006 3:02 AM > > > > To: openib-general at openib.org > > > > Subject: [openib-general] sysfs exposure of port counters useless? > > > > > > > > > > > > These are 32 bit counters. The rcv/xmit_data counters count 32-bit > > > > blocks. Also, these counts do not wrap: they peg at all 1s. > > > > At infiniband speeds, these counts can peg out very quickly indeed, > > > > to the point they can really only be of use if they can be reset each > > > time > > > > there read. Now if anyone who wants to use them has to go the CLI to > > > reset > > > > them, and theres little point in reading them without reset, why would > > > > anyone read them via sysfs? so why have them? > > > > > > > > > > We have found that while your comment is true for the data movement > > > counters, the error counters should not peg quickly, hence it is valid > > its true i overstated the case just a little;) .. yes error counters > should be fine and its mainly the data counters that are problematic > (tho now im not sure i havent seen the packet counters freeze when the > data ones peg out).. > > > > to read them without resetting. However it is also useful to have an > > > ability to reset them. Of course if there are other CLI commands which > > > do this easily, the sysfs info is of less value. > > > > There are diag tools for this. > > thats where we started.. Guess I missed that. > the point im making is that exposing the data > counters in sysfs is of little use, because if you have to go to other > tools to reset, why wouldnt you use them to read as well? You can. They support this. 
> i was looking at exposing infiniband stats via PCP > (http://oss.sgi.com/projects/pcp/). This would be useful for folk doing IB > performance testing. Its very easy to just feed in the sysfs values.. > unfortunately they turn out to be of little value. Life would be so much > easier if there were 64 bit counters available. Instead I will probably > need to have an additional daemon to construct them. Depends on what you mean by available. They are defined in the IB spec (PortCountersExtended) but are optional and not available in all PMAs. -- Hal From rdreier at cisco.com Tue Oct 17 20:44:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 20:44:34 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061018001005.GQ4054@obsidianresearch.com> (Jason Gunthorpe's message of "Tue, 17 Oct 2006 18:10:05 -0600") References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> Message-ID: Roland> For now I just used lock; addl %0 to implement rmb on Roland> i386. I'm really not comfortable making libmthca depend Roland> on sse2, and I don't see a good way to detect and use sse2 Roland> at runtime. Jason> I think the typical way this is done would be to use Jason> ld.so's 'hwcap' handling and stick an optimized library in Jason> /usr/lib/sse2. It's a good suggestion, but the problem is that the CPU-dependent code is in the mthca.so driver-dependent plugin, which libibverbs dlopen()s at runtime. Do you know how to use the hwcap stuff with dlopen()? I'm not thrilled about creating an sse2 special case in libibverbs just to handle libmthca on i386. - R. From halr at voltaire.com Tue Oct 17 20:49:13 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Oct 2006 23:49:13 -0400 Subject: [openib-general] sysfs exposure of port counters useless? 
In-Reply-To: References: Message-ID: <1161143268.32093.432393.camel@hal.voltaire.com> On Tue, 2006-10-17 at 20:18, Scott Weitzenkamp (sweitzen) wrote: > I agree the 32-bit byte and packet counters are useless as they get > pegged in a few seconds on a busy IB networks. I thought there was an > effort in IBTA to fix this. The fix at least in terms of the spec has been there for a while. PortCountersExtended are in the 1.2 spec but not all hardware/PMA supports these (they are optional). > For IB counters in a Cisco switch, we read and reset the 32-bit counters > once per second and keep 64-bit counters internally. 32 bit byte counters can be pegged in only 16 seconds on a 4x SDR link and there are 4x DDR links now (8 seconds) and 12x links (5 seconds) so that strategy is inaccurate on busy networks. > This would be possible in OF too, right? This is part of a performance manager (which is part of fabric management) and is not standardized (specific to each fabric management offering). Most offer this manager as part of their solution. OpenSM will be adding a performance manager in the not distant future. An RFC will initially be published on this list so I look forward to comments since this seems to be an area of interest. -- Hal > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: openib-general-bounces at openib.org > > [mailto:openib-general-bounces at openib.org] On Behalf Of Michael Newton > > Sent: Tuesday, October 17, 2006 5:10 PM > > To: Hal Rosenstock > > Cc: openib-general at openib.org > > Subject: Re: [openib-general] sysfs exposure of port counters useless? > > > > On Tue, 17 Oct 2006, Hal Rosenstock wrote: > > > On Tue, 2006-10-17 at 09:55, Rimmer, Todd wrote: > > > > > From: Michael Newton > > > > > Sent: Tuesday, October 17, 2006 3:02 AM > > > > > To: openib-general at openib.org > > > > > Subject: [openib-general] sysfs exposure of port > > counters useless? 
> > > > > > > > > > > > > > > These are 32 bit counters. The rcv/xmit_data counters > > count 32-bit > > > > > blocks. Also, these counts do not wrap: they peg at all 1s. > > > > > At infiniband speeds, these counts can peg out very > > quickly indeed, > > > > > to the point they can really only be of use if they can > > be reset each > > > > time > > > > > there read. Now if anyone who wants to use them has to > > go the CLI to > > > > reset > > > > > them, and theres little point in reading them without > > reset, why would > > > > > anyone read them via sysfs? so why have them? > > > > > > > > > > > > > We have found that while your comment is true for the > > data movement > > > > counters, the error counters should not peg quickly, > > hence it is valid > > > > its true i overstated the case just a little;) .. yes error counters > > should be fine and its mainly the data counters that are problematic > > (tho now im not sure i havent seen the packet counters freeze when the > > data ones peg out).. > > > > > > to read them without resetting. However it is also > > useful to have an > > > > ability to reset them. Of course if there are other CLI > > commands which > > > > do this easily, the sysfs info is of less value. > > > > > > There are diag tools for this. > > > > thats where we started.. the point im making is that exposing the data > > counters in sysfs is of little use, because if you have to go to other > > tools to reset, why wouldnt you use them to read as well? > > > > i was looking at exposing infiniband stats via PCP > > (http://oss.sgi.com/projects/pcp/). This would be useful for > > folk doing IB > > performance testing. Its very easy to just feed in the sysfs values.. > > unfortunately they turn out to be of little value. Life would > > be so much > > easier if there were 64 bit counters available. Instead I > > will probably > > need to have an additional daemon to construct them. 
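For what it's worth, the read-and-reset scheme discussed in this thread can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct counter64, counter64_add_sample() and seconds_to_peg() are made-up names, not part of any OpenIB/OFED API, and how a sample is actually read and reset (diag tools, PMA queries, a daemon) is deliberately left out. It relies only on what is said above: the data counters are 32 bits wide, count 32-bit words, and peg at all 1s instead of wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit software counter built from read-and-reset samples. */
struct counter64 {
        uint64_t total;      /* sum of all samples folded in so far */
        unsigned saturated;  /* samples that read all 1s, i.e. traffic was lost */
};

/*
 * Fold one 32-bit sample (taken just before the hardware counter was reset)
 * into the 64-bit total. A sample of all 1s means the counter pegged, so the
 * true value is unknown and the total is only a lower bound.
 */
static void counter64_add_sample(struct counter64 *c, uint32_t sample)
{
        assert(c != 0);
        if (sample == UINT32_MAX)
                c->saturated++;
        c->total += sample;
}

/*
 * Seconds until a 32-bit counter of 32-bit words pegs, given the data rate
 * of the link in bits per second (after encoding overhead).
 */
static double seconds_to_peg(double data_bits_per_sec)
{
        assert(data_bits_per_sec > 0.0);
        return ((double)UINT32_MAX * 32.0) / data_bits_per_sec;
}
```

With a 4x SDR link carrying 8 Gbit/s of data, seconds_to_peg(8e9) comes out a little over 17 seconds, which is in line with the rough figure quoted elsewhere in this thread. The saturated field is there so a consumer can tell when the polling interval was too long for the link speed and the total can no longer be trusted.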
> > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > From mst at mellanox.co.il Tue Oct 17 21:01:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 06:01:41 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1161126773.2917.494.camel@fc6.xsintricity.com> References: <1161126773.2917.494.camel@fc6.xsintricity.com> Message-ID: <20061018040141.GA24394@mellanox.co.il> Quoting r. Doug Ledford : > Far easier would be to go the other way around, > run on x86_64 and build for i386, in which case gcc supports that out of > the box. All that's left is to convince Lenovo there's a market for x86_64 thinkpads. -- MST From mst at mellanox.co.il Tue Oct 17 21:15:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 06:15:56 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> Message-ID: <20061018041555.GD24394@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] use mmiowb after doorbell ring > > Roland> For now I just used lock; addl %0 to implement rmb on > Roland> i386. I'm really not comfortable making libmthca depend > Roland> on sse2, and I don't see a good way to detect and use sse2 > Roland> at runtime. > > Jason> I think the typical way this is done would be to use > Jason> ld.so's 'hwcap' handling and stick an optimized library in > Jason> /usr/lib/sse2. > > It's a good suggestion, but the problem is that the CPU-dependent code > is in the mthca.so driver-dependent plugin, which libibverbs dlopen()s > at runtime. Do you know how to use the hwcap stuff with dlopen()? 
> I'm not thrilled about creating an sse2 special case in libibverbs
> just to handle libmthca on i386.

Off the top of my head, an easy way seems to be to split the sse2-dependent
code into a separate library, which can then be installed on the ld path,
and have mthca pull that in.

--
MST

From jgunthorpe at obsidianresearch.com Tue Oct 17 21:20:03 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 17 Oct 2006 22:20:03 -0600
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To:
References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com>
Message-ID: <20061018042003.GA25360@obsidianresearch.com>

On Tue, Oct 17, 2006 at 08:44:34PM -0700, Roland Dreier wrote:

> Jason> I think the typical way this is done would be to use
> Jason> ld.so's 'hwcap' handling and stick an optimized library in
> Jason> /usr/lib/sse2.

> It's a good suggestion, but the problem is that the CPU-dependent code
> is in the mthca.so driver-dependent plugin, which libibverbs dlopen()s
> at runtime. Do you know how to use the hwcap stuff with dlopen()?
> I'm not thrilled about creating an sse2 special case in libibverbs
> just to handle libmthca on i386.

It is automatic, I just double-checked to be sure:

$ cat t.c
#include <dlfcn.h>
int main(int argc, const char *argv[])
{
        dlopen(argv[1],RTLD_NOW);
}
$ find /usr/lib -name "libcrypto.so.0.9.8"
./i486/libcrypto.so.0.9.8
./libcrypto.so.0.9.8
./i586/libcrypto.so.0.9.8
./i686/cmov/libcrypto.so.0.9.8
$ strace ./t libcrypto.so.0.9.8
[..]
open("/usr/lib/i686/cmov/libcrypto.so.0.9.8", O_RDONLY) = 3
$ mv libcrypto.so.0.9.8 /tmp/libcrypto.so.0.9.8.x
$ strace ./t libcrypto.so.0.9.8
[.. 34 occurrences of open /usr/..../libcrypto.so.0.9.8 ..]
open("/usr/lib/libcrypto.so.0.9.8", O_RDONLY) = 3
$ ldconfig
$ strace ./t libcrypto.so.0.9.8
[..]
open("/usr/lib/libcrypto.so.0.9.8", O_RDONLY) = 3

Undocumented, but it does something very close to what you'd want..
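As a complement to the strace run, the loader can also be asked directly which file it picked for a bare soname. The following is a minimal sketch, assuming glibc's dlinfo() extension with RTLD_DI_LINKMAP is available; the helper name resolve_library() is made up for illustration, and this is not what libibverbs does. Older toolchains may need -ldl at link time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

/*
 * Hypothetical helper: dlopen() a library by bare soname, so the normal
 * search rules (ld.so.cache, hwcap subdirectories, ...) apply, and copy the
 * path the dynamic loader actually mapped into buf.
 * Returns 0 on success, -1 if the library could not be loaded or resolved.
 */
static int resolve_library(const char *soname, char *buf, size_t len)
{
        struct link_map *map = NULL;
        void *handle;
        int ret = -1;

        assert(soname != NULL && buf != NULL && len > 0);

        handle = dlopen(soname, RTLD_NOW);
        if (!handle) {
                fprintf(stderr, "dlopen(%s): %s\n", soname, dlerror());
                return -1;
        }

        /* The link map entry records the file name the loader settled on. */
        if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map && map->l_name) {
                snprintf(buf, len, "%s", map->l_name);
                ret = 0;
        }

        dlclose(handle);
        return ret;
}
```

Calling resolve_library() with a soname and a buffer, then printing the buffer, shows which copy of the library was chosen without having to read through strace output. Passing a name containing a slash bypasses the search entirely, which is the behaviour the plugin discussion in this thread turns on.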
ldconfig caches the soname mapping in /etc/ld.so.cache so you have to be careful when experimenting. Several packages that get big gains with specific optimizations use this already. Strace on the above test program with a non-existing library shows the search path and all permutations. It is also probably worth benchmarking a full cmov+i686+sse2 build of everything and look at always providing it if it is faster, like is often done for glibc. Jason From rdreier at cisco.com Tue Oct 17 21:27:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 17 Oct 2006 21:27:05 -0700 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061018040141.GA24394@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 18 Oct 2006 06:01:41 +0200") References: <1161126773.2917.494.camel@fc6.xsintricity.com> <20061018040141.GA24394@mellanox.co.il> Message-ID: Michael> All that's left is to convince Lenovo there's a market Michael> for x86_64 thinkpads. Actually you just have to wait a few months -- Core 2 (Merom) is 64-bit capable so once Lenovo catches up to everyone else, you'll be able to get a 64-bit thinkpad. - R. From mst at mellanox.co.il Tue Oct 17 21:43:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 06:43:54 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061018042003.GA25360@obsidianresearch.com> References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> <20061018042003.GA25360@obsidianresearch.com> Message-ID: <20061018044353.GA24817@mellanox.co.il> Quoting r. Jason Gunthorpe : > Subject: Re: [PATCH] use mmiowb after doorbell ring > > On Tue, Oct 17, 2006 at 08:44:34PM -0700, Roland Dreier wrote: > > Jason> I think the typical way this is done would be to use > > Jason> ld.so's 'hwcap' handling and stick an optimized library in > > Jason> /usr/lib/sse2. 
> > > It's a good suggestion, but the problem is that the CPU-dependent code > > is in the mthca.so driver-dependent plugin, which libibverbs dlopen()s > > at runtime. Do you know how to use the hwcap stuff with dlopen()? > > I'm not thrilled about creating an sse2 special case in libibverbs > > just to handle libmthca on i386. > > It is automatic, I just doubled checked to be sure: The difference here is that libibverbs insists on putting all plugins in a separate directory and passing full path to dlopen, which of course breaks this. Roland, I've been looking at changing the way we handle plugins and this might be a good reason to finally do this before 1.1: rather than look for plugins in a pre-configured path, let's just have a config file (or files) and ask users to put the list of plugins there. As it is, it is already painful to keep both 32 and 64 bit libibverbs on the same system - we have to invent a methodology for where to put 64/32 bit libraries. And when I have to keep several library versions around for testing it's much easier (at least, for me) to just use LD_LIBRARY_PATH for everything and stick each version in a separate directory, than to remember playing with special environment that was invented just for libibverbs. Does this make sense? -- MST From krkumar2 at in.ibm.com Tue Oct 17 22:08:53 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Wed, 18 Oct 2006 10:38:53 +0530 Subject: [openib-general] [PATCH] Use time_after_eq() instead of time_after() in queue_req() In-Reply-To: Message-ID: > Please add something like "RDMA/addr: " before the "Use" there, so > that someone skimming the kernel log knows what subsystem/specific > area the patch touches. (I added that by hand) > Git just wants three -s like "---" between changelog entry and actual patch. > the last line in the original mail was blank, when it should have a > single space. This makes git complain (correctly) about a corrupt > patch. 
Please make sure your mailer doesn't corrupt whitespace.

OK, all points noted. Sorry for the extra work :)

- KK

From krkumar2 at in.ibm.com Tue Oct 17 22:15:32 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Wed, 18 Oct 2006 10:45:32 +0530
Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count
In-Reply-To: <45350A1F.3030602@ichips.intel.com>
Message-ID:

> Something similar to:
>
> if (cma_any_addr...) {
>         ret = rdma_translate_ip(..);
>         if (ret)
>                 goto err1;
>
>         mutex_lock
>         ret = cma_acquire_dev
>         mutex_unlock
>         if (ret)
>                 goto err2;
> }
>
> should work fine.

Actually that will not work, since the undo operation is for when the next
operation (cma_get_port()) fails after we did an acquire_dev, and in that
case the refcount needs to be dropped. So I am not able to avoid using an
extra flag to indicate that a ref was taken earlier, and drop it in the
error path. I will send that out now.

Thanks,

- KK

From jgunthorpe at obsidianresearch.com Tue Oct 17 22:14:58 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 17 Oct 2006 23:14:58 -0600
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061018044353.GA24817@mellanox.co.il>
References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> <20061018042003.GA25360@obsidianresearch.com> <20061018044353.GA24817@mellanox.co.il>
Message-ID: <20061018051458.GB25360@obsidianresearch.com>

On Wed, Oct 18, 2006 at 06:43:54AM +0200, Michael S. Tsirkin wrote:

> The difference here is that libibverbs insists on putting all plugins
> in a separate directory and passing full path to dlopen, which of course
> breaks this.

Yeah, plugins in a separate dir are not well supported by all the fancy
things that dl does behind the scenes..
Unfortunately dlopen will not permute a relative path with the search
parameters, so there is no way to make it work other than fiddling
LD_LIBRARY_PATH prior to calling dlopen:

  old = setenv("LD_LIBRARY_PATH",dirname(foo));
  dlopen(basename(foo));
  setenv("LD_LIBRARY_PATH",old);

I just took a quick look at the directory setup, and if you are changing
things I'd say you should also arrange to have the libibverbs soname
stamped into the plugin path and soname. Something like
libmthca-libibverbs.2.so.0. Once you do that it is pretty safe to put it
in /usr/lib*

For libraries it is always best to design in support for multiple major
versions being installed at once, since invariably someone will need to do
that down the road.

Jason

From krkumar2 at in.ibm.com Tue Oct 17 22:16:00 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Wed, 18 Oct 2006 10:46:00 +0530
Subject: [openib-general] [PATCH] RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count
Message-ID: <20061018051600.6523.29836.sendpatchset@localhost.localdomain>

rdma_bind_addr() leaks a cma_dev reference count in failure case.
Signed-off-by: Krishna Kumar --- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 17:13:41.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 19:42:31.000000000 +0530 @@ -1749,6 +1749,7 @@ static int cma_get_port(struct rdma_id_p int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; + int did_acquire_dev = 0; int ret; if (addr->sa_family != AF_INET) @@ -1767,6 +1768,7 @@ int rdma_bind_addr(struct rdma_cm_id *id } if (ret) goto err; + did_acquire_dev = 1; } memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); @@ -1776,6 +1778,8 @@ int rdma_bind_addr(struct rdma_cm_id *id return 0; err: + if (did_acquire_dev) + cma_detach_from_dev(id_priv); cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); return ret; } From dledford at redhat.com Tue Oct 17 22:25:38 2006 From: dledford at redhat.com (Doug Ledford) Date: Wed, 18 Oct 2006 01:25:38 -0400 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <20061018040141.GA24394@mellanox.co.il> References: <1161126773.2917.494.camel@fc6.xsintricity.com> <20061018040141.GA24394@mellanox.co.il> Message-ID: <1161149139.2917.499.camel@fc6.xsintricity.com> On Wed, 2006-10-18 at 06:01 +0200, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > Far easier would be to go the other way around, > > run on x86_64 and build for i386, in which case gcc supports that out of > > the box. > > All that's left is to convince Lenovo there's a market for x86_64 > thinkpads. I would place that behind convincing them not to ship exploding batteries (thankfully, they actually saw the light on that after some spectacular examples of why they should). -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From krkumar2 at in.ibm.com Tue Oct 17 22:36:37 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Wed, 18 Oct 2006 11:06:37 +0530 Subject: [openib-general] [PATCH] If addr_handler() got error, do not set state as OK In-Reply-To: <45350CED.8020705@ichips.intel.com> Message-ID: Sean Hefty wrote on 10/17/2006 10:33:41 PM: > Can you rework this patch without adding in extra flags to indicate what has or > has not been executed? OK, will fix it accordingly. thanks, - KK > Krishna Kumar wrote: > > diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c > > --- org/drivers/infiniband/core/cma.c 2006-10-10 15:45:27.000000000 +0530 > > +++ new/drivers/infiniband/core/cma.c 2006-10-10 15:59:53.000000000 +0530 > > @@ -1515,6 +1515,8 @@ static void addr_handler(int status, str > > { > > struct rdma_id_private *id_priv = context; > > enum rdma_cm_event_type event; > > + int did_comp_exch = 0; > > + int destroy = 0; > > As a general comment, I really don't think that we need to be overly concerned > about optimizing error handling at the expense of code readability. > > > Thanks, > - Sean From mst at mellanox.co.il Tue Oct 17 22:50:10 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 07:50:10 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1161149139.2917.499.camel@fc6.xsintricity.com> References: <1161126773.2917.494.camel@fc6.xsintricity.com> <20061018040141.GA24394@mellanox.co.il> <1161149139.2917.499.camel@fc6.xsintricity.com> Message-ID: <20061018055010.GC24817@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: RHEL5 and OFED ... > > On Wed, 2006-10-18 at 06:01 +0200, Michael S. Tsirkin wrote: > > Quoting r. Doug Ledford : > > > Far easier would be to go the other way around, > > > run on x86_64 and build for i386, in which case gcc supports that out of > > > the box. 
> >
> > All that's left is to convince Lenovo there's a market for x86_64
> > thinkpads.
>
> I would place that behind convincing them not to ship exploding
> batteries (thankfully, they actually saw the light on that after some
> spectacular examples of why they should).

Sigh. BTW, the utility they supplied to check the battery didn't run under
wine, so I had to open the case and copy the S/N to their web page manually.
And I wondered - what does the utility do on windows? Heats the thing up
and tests whether it explodes?

--
MST

From mlakshmanan at silverstorm.com Tue Oct 17 23:06:09 2006
From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu)
Date: Wed, 18 Oct 2006 02:06:09 -0400
Subject: [openib-general] [PATCH] IB/SRP Userspace: srptools/srp_daemon - Fix connect bug and add support for user specified initiator extension
Message-ID:

The patch addresses 3 issues:

1. Fixes a bug in srp_daemon where, if it is invoked with the '-e' option,
   it fails to connect to the SRP targets because of a newline character
   in the parameter string.
2. Changes the name of the constant 'MAX_TRAGET_CONFIG_STR_STRING' to
   'MAX_TARGET_CONFIG_STR_STRING'.
3. Changes the behavior of the '-n' option to srp_daemon. The earlier
   behavior printed the initiator extension. The new behavior allows the
   user to specify an initiator extension as an argument to the '-n' option.
Signed-off-by: Madhu Lakshmanan --- --- 1.1-orig/src/userspace/srptools/srp_daemon/srp_daemon.c 2006-10-17 06:32:14.000000000 -0400 +++ 1.1/src/userspace/srptools/srp_daemon/srp_daemon.c 2006-10-17 06:10:12.000000000 -0400 @@ -78,7 +78,7 @@ static char *sysfs_path = "/sys"; static void usage(const char *argv0) { - fprintf(stderr, "Usage: %s [-vVceo] [-d | -i [-p ]] [-t ] [-r ] [-R ]\n", argv0); + fprintf(stderr, "Usage: %s [-vVceo] [-d | -i [-p ]] [-n ] [-t ] [-r ] [-R ]\n", argv0); fprintf(stderr, "-v Verbose\n"); fprintf(stderr, "-V debug Verbose\n"); fprintf(stderr, "-c prints connection Commands\n"); @@ -91,7 +91,7 @@ static void usage(const char *argv0) fprintf(stderr, "-R perform complete Rescan every seconds\n"); fprintf(stderr, "-t Timeout for mad response in milisec \n"); fprintf(stderr, "-r number of send Retries for each mad\n"); - fprintf(stderr, "-n New print - prints also initiator extention\n"); + fprintf(stderr, "-n New: use initiator extension\n"); fprintf(stderr, "\nExample: srp_daemon -e -i mthca0 -p 1 -R 60\n"); } @@ -114,7 +114,7 @@ void pr_cmd(char *target_str, int not_co int ret; if (config->cmd) - printf("%s", target_str); + printf("%s\n", target_str); if (config->execute && not_connected) { int fd = open(config->add_target_file, O_WRONLY); @@ -122,6 +122,7 @@ void pr_cmd(char *target_str, int not_co pr_err("unable to open %s, maybe ib_srp is not loaded\n", config->add_target_file); return; } + pr_debug("Add target str: %s\n", target_str); ret = write(fd, target_str, strlen(target_str)); pr_debug("Adding target returned %d\n", ret); close(fd); @@ -174,8 +175,8 @@ static void add_non_exist_traget(char *i char *subdir_name_ptr; int prefix_len; uint8_t dgid_val[16]; - const int MAX_TRAGET_CONFIG_STR_STRING = 255; - char target_config_str[MAX_TRAGET_CONFIG_STR_STRING]; + const int MAX_TARGET_CONFIG_STR_STRING = 255; + char target_config_str[MAX_TARGET_CONFIG_STR_STRING]; int len, len_left; int not_connected = 1; @@ -190,8 +191,7 @@ static void 
add_non_exist_traget(char *i prefix_len = strlen(scsi_host_dir); subdir_name_ptr = scsi_host_dir + prefix_len; - subdir = (void *) 1; /* Dummy value to enter the loop */ - while (subdir) { + do { subdir = readdir(dir); if (!subdir) @@ -237,9 +237,9 @@ static void add_non_exist_traget(char *i return; - } + } while (subdir); - len = snprintf(target_config_str, MAX_TRAGET_CONFIG_STR_STRING, "id_ext=%s," + len = snprintf(target_config_str, MAX_TARGET_CONFIG_STR_STRING, "id_ext=%s," "ioc_guid=%016llx," "dgid=%016llx%016llx," "pkey=ffff," @@ -249,41 +249,40 @@ static void add_non_exist_traget(char *i (unsigned long long) subnet_prefix, (unsigned long long) h_guid, (unsigned long long) h_service_id); - if (len >= MAX_TRAGET_CONFIG_STR_STRING) { + if (len >= MAX_TARGET_CONFIG_STR_STRING) { pr_err("Target conifg string is too long, ignoring target\n"); closedir(dir); return; } if (ioc_prof.io_class != htons(SRP_REV16A_IB_IO_CLASS)) { - len_left = MAX_TRAGET_CONFIG_STR_STRING - len; + len_left = MAX_TARGET_CONFIG_STR_STRING - len; len += snprintf(target_config_str+len, - MAX_TRAGET_CONFIG_STR_STRING - len, + MAX_TARGET_CONFIG_STR_STRING - len, ",io_class=%04hx", ntohs(ioc_prof.io_class)); - if (len >= MAX_TRAGET_CONFIG_STR_STRING) { + if (len >= MAX_TARGET_CONFIG_STR_STRING) { pr_err("Target conifg string is too long, ignoring target\n"); closedir(dir); return; } } - if (config->print_initiator_ext) { - len_left = MAX_TRAGET_CONFIG_STR_STRING - len; + if (config->initiator_ext) { + len_left = MAX_TARGET_CONFIG_STR_STRING - len; len += snprintf(target_config_str+len, - MAX_TRAGET_CONFIG_STR_STRING - len, + MAX_TARGET_CONFIG_STR_STRING - len, ",initiator_ext=%016llx", - (unsigned long long) ntohll(h_guid)); + (unsigned long long) config->initiator_ext); - if (len >= MAX_TRAGET_CONFIG_STR_STRING) { + if (len >= MAX_TARGET_CONFIG_STR_STRING) { pr_err("Target conifg string is too long, ignoring target\n"); closedir(dir); return; } } - target_config_str[len] = '\n'; - 
target_config_str[len+1] = '\0'; + target_config_str[len] = '\0'; pr_cmd(target_config_str, not_connected); @@ -860,10 +859,6 @@ static void print_config(struct config_t printf(" Executes add target command : %d\n", conf->execute); printf(" Print also connected targets : %d\n", conf->all); printf(" Report current tragets and stop : %d\n", conf->once); - if (conf->print_initiator_ext) - printf(" Print initiator_ext\n"); - else - printf(" Do not print initiator_ext\n"); if (conf->recalc_time) printf(" Performs full target rescan every %d seconds\n", conf->recalc_time); else @@ -892,12 +887,12 @@ static int get_config(struct config_t *c conf->mad_retries = 3; conf->recalc_time = 0; conf->add_target_file = NULL; - conf->print_initiator_ext = 0; + conf->initiator_ext = 0; while (1) { int c; - c = getopt(argc, argv, "caveod:i:p:t:r:R:Vhn"); + c = getopt(argc, argv, "caveod:i:n:p:t:r:R:Vh"); if (c == -1) break; @@ -940,7 +935,8 @@ static int get_config(struct config_t *c ++conf->debug_verbose; break; case 'n': - ++conf->print_initiator_ext; + conf->initiator_ext = (uint64_t) + strtoull(optarg, NULL, 16); break; case 't': conf->timeout = atoi(optarg); --- 1.1-orig/src/userspace/srptools/srp_daemon/srp_daemon.h 2006-10-17 06:32:14.000000000 -0400 +++ 1.1/src/userspace/srptools/srp_daemon/srp_daemon.h 2006-10-13 01:39:50.000000000 -0400 @@ -288,7 +288,7 @@ struct config_t { int debug_verbose; int timeout; int recalc_time; - int print_initiator_ext; + uint64_t initiator_ext; }; extern struct config_t *config; From erezz at voltaire.com Tue Oct 17 23:19:20 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 18 Oct 2006 08:19:20 +0200 Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10 In-Reply-To: References: Message-ID: <4535C768.4020005@voltaire.com> I reported the same problem last week: http://openib.org/pipermail/openfabrics-ewg/2006-October/001714.html -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software 
Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com Scott Weitzenkamp (sweitzen) wrote: > You need the kernel-source RPM, I guess the OFED install.sh should check > for that RPM. > > svbu-qa-opteron-1:~ # uname -a > Linux svbu-qa-opteron-1 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC > 2006 i68 > 6 athlon i386 GNU/Linux > svbu-qa-opteron-1:~ # rpm -qa | fgrep kernel > kernel-source-2.6.16.21-0.8 > kernel-smp-2.6.16.21-0.8 > kernel-ib-1.1-2.6.16.21_0.8_smp > kernel-ib-devel-1.1-2.6.16.21_0.8_smp > svbu-qa-opteron-1:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp > .config Makefile arch include2 > .kernelrelease Module.symvers include scripts > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > >> -----Original Message----- >> From: openib-general-bounces at openib.org >> [mailto:openib-general-bounces at openib.org] On Behalf Of Chris Dennett >> Sent: Tuesday, October 17, 2006 12:46 PM >> To: openfabrics-ewg at openib.org; openib-general at openib.org >> Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10 >> >> I've been trying to install OFED 1.1 RC7 on an x86 server >> with a fresh install >> of SLES10 (32-bit). It errors out when trying to build the >> kernel modules. >> I've included what I think are the relevant log messages >> below. I've tried >> installing everything (minus iser and tvflash) or just the >> modules needed for >> SRP. I've installed 1.1 RC7 successfully on other RedHat >> servers without any >> problems. I am installing as root. Any help would be appreciated. >> >> Thanks. 
>> >> -Chris >> >> ============================================== >> + make kernel >> Building kernel modules >> Kernel version: 2.6.16.21-0.8-smp >> Modules directory: //lib/modules/2.6.16.21-0.8-smp >> Kernel sources: /lib/modules/2.6.16.21-0.8-smp/build >> env EXTRA_CFLAGS=" -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include >> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ >> >> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib \ >> >> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug" \ >> make -C /lib/modules/2.6.16.21-0.8-smp/build >> SUBDIRS="/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband" >> KERNELRELEASE=2.6.16.21-0.8-smp \ >> EXTRAVERSION=.21-0.8-smp V=1 \ >> CONFIG_INFINIBAND=m \ >> CONFIG_INFINIBAND_IPOIB=m \ >> CONFIG_INFINIBAND_SDP= \ >> CONFIG_INFINIBAND_SRP=m \ >> CONFIG_INFINIBAND_USER_MAD=m \ >> CONFIG_INFINIBAND_USER_ACCESS=m \ >> CONFIG_INFINIBAND_ADDR_TRANS=y \ >> CONFIG_INFINIBAND_MTHCA=m \ >> CONFIG_INFINIBAND_IPOIB_DEBUG=y \ >> CONFIG_INFINIBAND_ISER= \ >> CONFIG_INFINIBAND_EHCA= \ >> CONFIG_INFINIBAND_RDS= \ >> CONFIG_INFINIBAND_RDS_DEBUG= \ >> CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ >> CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ >> CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ >> CONFIG_INFINIBAND_SDP_DEBUG= \ >> CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ >> CONFIG_INFINIBAND_IPATH= \ >> CONFIG_INFINIBAND_MTHCA_DEBUG=y \ >> CONFIG_INFINIBAND_MADEYE= \ >> LINUXINCLUDE='-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include \ >> >> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ >> -Iinclude \ >> $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ >> -include include/linux/autoconf.h \ >> -include >> /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h \ >> ' \ >> modules >> make[1]: Entering directory >> `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' >> make[1]: *** No rule to make target `modules'. Stop. 
>> make[1]: Leaving directory `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' >> make: *** [kernel] Error 2 >> error: Bad exit status from /var/tmp/rpm-tmp.92052 (%install) >> >> >> RPM build errors: >> user vlad does not exist - using root >> group mtl does not exist - using root >> user vlad does not exist - using root >> group mtl does not exist - using root >> Bad exit status from /var/tmp/rpm-tmp.92052 (%install) >> ERROR: Failed executing "rpmbuild --rebuild --define '_topdir >> /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define >> 'build_root >> /var/tmp/OFED' --define 'configure_options --with-libibcommon >> --with-libibmad >> --with-libibumad --with-libibverbs --with-libmthca --with-opensm >> --with-librdmacm --with-openib-diags --with-srptools --with-mstflint >> --with-perftest --with-ipoib-mod --with-mthca-mod --with-srp-mod >> --with-core-mod --with-user_mad-mod --with-user_access-mod >> --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define >> 'KVERSION 2.6.16.21-0.8-smp' --define 'KSRC >> /lib/modules/2.6.16.21-0.8-smp/build' --define >> 'build_kernel_ib 1' --define >> 'build_kernel_ib_devel 0' --define 'NETWORK_CONF_DIR >> /etc/sysconfig/network' >> --define 'modprobe_update 1' --define 'include_ipoib_conf 0' --define >> 'build_32bit 0' /root/OFED-1.1-rc7/SRPMS/openib-1.1-0.src.rpm" >> >> =================================================== >> >> smx32:~ # uname -a >> Linux linux-yeez 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 >> UTC 2006 i686 >> i686 i386 GNU/Linux >> smx32:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp >> Module.symvers >> >> >> >> -- >> Chris Dennett >> Design Engineer >> Texas Memory Systems, Inc. 
>> 713-266-3200 x430 >> Chris.Dennett at texmemsys.com >> >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From ogerlitz at voltaire.com Tue Oct 17 23:22:36 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 18 Oct 2006 08:22:36 +0200 Subject: [openib-general] OFED-1.1-pre1 is ready In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACECB@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C92ACECB@mtlexch01.mtl.com> Message-ID: <4535C82C.9080601@voltaire.com> Tziporet Koren wrote: > OFED 1.1-pre1 is available: > URL: > https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-pre1.tgz > Release details: > > BUILD_ID: > OFED-1.1-pre1 > > openib-1.1 (REV=9854) > # User space > https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: > ref: refs/heads/ofed_1_1 > commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 Hi Tziporet, I have asked this Michael few days ago and did not get a reply yet: can you clarify where is the version of the OFED IB ***kernel*** drivers stated? I understand they are typically based on some tag of Linus GIT tree (for example OFED1.1 uses 2.6.18 - correct?) but i could not find any notice for that in the docs nor in the per rc emails. Or. From mst at mellanox.co.il Tue Oct 17 23:46:56 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 18 Oct 2006 08:46:56 +0200 Subject: [openib-general] OFED-1.1-pre1 is ready In-Reply-To: <4535C82C.9080601@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C92ACECB@mtlexch01.mtl.com> <4535C82C.9080601@voltaire.com> Message-ID: <20061018064656.GE24817@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: OFED-1.1-pre1 is ready > > Tziporet Koren wrote: > > OFED 1.1-pre1 is available: > > URL: > > https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-pre1.tgz > > Release details: > > > > BUILD_ID: > > OFED-1.1-pre1 > > > > openib-1.1 (REV=9854) > > # User space > > https://openib.org/svn/gen2/branches/1.1/src/userspace > > Git: > > ref: refs/heads/ofed_1_1 > > commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 > > Hi Tziporet, > > I have asked this Michael few days ago and did not get a reply yet: can > you clarify where is the version of the OFED IB ***kernel*** drivers > stated? That's the commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 part. > I understand they are typically based on some tag of Linus GIT tree (for > example OFED1.1 uses 2.6.18 - correct?) but i could not find any notice > for that in the docs nor in the per rc emails. > > Or. OFED1.1 was last rebased against 2.6.18-rc6 + a couple of small patches touching cma + adding scripts out of kernel modules backports etc. 2.6.18 wasn't out by code freeze time, but all fixes in 2.6.18 are also in OFED 1.1. Try something like git log v2.6.18-rc6..936b9fc0bd1411b52826213a5d89e2ceb4f52a78 to get the list of OFED changes against v2.6.18-rc6. -- MST From dledford at redhat.com Tue Oct 17 23:49:58 2006 From: dledford at redhat.com (Doug Ledford) Date: Wed, 18 Oct 2006 02:49:58 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... 
In-Reply-To: <1160928780.2917.383.camel@fc6.xsintricity.com> References: <1160870845.2917.334.camel@fc6.xsintricity.com> <20061015155934.GD15055@mellanox.co.il> <1160928780.2917.383.camel@fc6.xsintricity.com> Message-ID: <1161154198.2917.507.camel@fc6.xsintricity.com> On Sun, 2006-10-15 at 12:13 -0400, Doug Ledford wrote: > > Now for userspace - does RHEL5 include at least libibverbs-1.0? > > This has been released a while back, and Roland makes regular bugfix releases. > > It includes the OFED 1.0 libibverbs (which makes openmpi complain about > lack of out of band data support, but otherwise seems to work). I built the OFED-1.1-pre1 user space RPMs for RHEL5. They are available at my web site. Kernel RPMs with the OFED 1.1 code will come a little later. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From mst at mellanox.co.il Tue Oct 17 23:58:21 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 08:58:21 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161154198.2917.507.camel@fc6.xsintricity.com> References: <1160870845.2917.334.camel@fc6.xsintricity.com> <20061015155934.GD15055@mellanox.co.il> <1160928780.2917.383.camel@fc6.xsintricity.com> <1161154198.2917.507.camel@fc6.xsintricity.com> Message-ID: <20061018065821.GG24817@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > On Sun, 2006-10-15 at 12:13 -0400, Doug Ledford wrote: > > > > Now for userspace - does RHEL5 include at least libibverbs-1.0? > > > This has been released a while back, and Roland makes regular bugfix releases. 
> > > > It includes the OFED 1.0 libibverbs (which makes openmpi complain about > > lack of out of band data support, but otherwise seems to work). What's out of band data BTW? > I built the OFED-1.1-pre1 user space RPMs for RHEL5. They are available > at my web site. Thanks! > Kernel RPMs with the OFED 1.1 code will come a little > later. From our discussion, it seems we should be able to just push the small number of missing bits into RHEL5 directly. That would be nicer of course. -- MST From mlakshmanan at silverstorm.com Wed Oct 18 00:04:04 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Wed, 18 Oct 2006 03:04:04 -0400 Subject: [openib-general] srp trouble on RHEL4 U4 In-Reply-To: <015e01c6f1fb$2a3fafe0$05cab4d5@ld.yandex.ru> Message-ID: Which SRP target are you using? Could you also give some more details on the fabric setup; i.e. what IB switch / gateway your host is connected to, and what kind of storage you wish to access? The full command that you used (echo -n .... > ..../add_target) to configure the SRP target would be very useful as well. The Silverstorm 7000 is an HCA (Host Channel Adapter). By itself, it should in most cases not be the primary reason for the error code you are seeing. The issue is more likely to be due to the SRP target you are attempting to connect to. Thanks, Madhu Lakshmanan Silverstorm Technologies, Inc. mlakshmanan at silverstorm.com > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of > Mirochnick Natalia > Sent: Tuesday, October 17, 2006 10:47 AM > To: openib-general at openib.org > Subject: [openib-general] srp trouble on RHEL4 U4 > > Hello, > > I'm trying to setup SRP connection (SRP in OFED 1.0). > IB card is Silverstorm 7000. 
> > ib_srp module is loaded, but after attempt to to create an > SRP device (as it > was described in manual srp_release_notes.txt) the error appears in > /var/log/messages: > kernel: REJ reason 0x0 > > What's wrong? > -- > Thanks in advance, > Mirochnick Natalia > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From dledford at redhat.com Wed Oct 18 00:08:50 2006 From: dledford at redhat.com (Doug Ledford) Date: Wed, 18 Oct 2006 03:08:50 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061018065821.GG24817@mellanox.co.il> References: <1160870845.2917.334.camel@fc6.xsintricity.com> <20061015155934.GD15055@mellanox.co.il> <1160928780.2917.383.camel@fc6.xsintricity.com> <1161154198.2917.507.camel@fc6.xsintricity.com> <20061018065821.GG24817@mellanox.co.il> Message-ID: <1161155330.2917.511.camel@fc6.xsintricity.com> On Wed, 2006-10-18 at 08:58 +0200, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > > > On Sun, 2006-10-15 at 12:13 -0400, Doug Ledford wrote: > > > > > > Now for userspace - does RHEL5 include at least libibverbs-1.0? > > > > This has been released a while back, and Roland makes regular bugfix releases. > > > > > > It includes the OFED 1.0 libibverbs (which makes openmpi complain about > > > lack of out of band data support, but otherwise seems to work). > > What's out of band data BTW? 
Probably just me misremembering the error message. Here is the actual message: [0,1,1][btl_openib_endpoint.c:945:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data [0,1,1][btl_openib_endpoint.c:945:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data > > I built the OFED-1.1-pre1 user space RPMs for RHEL5. They are available > > at my web site. > > Thanks! > > > Kernel RPMs with the OFED 1.1 code will come a little > > later. > > From our discussion, it seems we should be able to just push the > small number of missing bits into RHEL5 directly. That would be > nicer of course. It depends. If there's lots of individual changes, it might be easier to push the OFED 1.1 change. But, that depends on when the final OFED 1.1 comes out and how much it varies from the existing RPMs. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband From ogerlitz at voltaire.com Wed Oct 18 00:27:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 18 Oct 2006 09:27:22 +0200 Subject: [openib-general] OFED-1.1-pre1 is ready In-Reply-To: <20061018064656.GE24817@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C92ACECB@mtlexch01.mtl.com> <4535C82C.9080601@voltaire.com> <20061018064656.GE24817@mellanox.co.il> Message-ID: <4535D75A.8040001@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. 
Or Gerlitz : >> Subject: Re: OFED-1.1-pre1 is ready >> >> Tziporet Koren wrote: >>> OFED 1.1-pre1 is available: >>> URL: >>> https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-pre1.tgz >>> Release details: >>> >>> BUILD_ID: >>> OFED-1.1-pre1 >>> >>> openib-1.1 (REV=9854) >>> # User space >>> https://openib.org/svn/gen2/branches/1.1/src/userspace >>> Git: >>> ref: refs/heads/ofed_1_1 >>> commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 >> Hi Tziporet, >> >> I have asked this Michael few days ago and did not get a reply yet: can >> you clarify where is the version of the OFED IB ***kernel*** drivers >> stated? > > That's the commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 part. I see. >> I understand they are typically based on some tag of Linus GIT tree (for >> example OFED1.1 uses 2.6.18 - correct?) but i could not find any notice >> for that in the docs nor in the per rc emails. > OFED1.1 was last rebased against 2.6.18-rc6 + a couple of small patches touching > cma + adding scripts out of kernel modules backports etc. 2.6.18 wasn't out > by code freeze time, but all fixes in 2.6.18 are also in OFED 1.1. > Try something like > git log v2.6.18-rc6..936b9fc0bd1411b52826213a5d89e2ceb4f52a78 > to get the list of OFED changes against v2.6.18-rc6. thanks for all the info, however i think the OFED docs must state on what upstream version are the OFED kernel IB drivers based (ie in this case 2.6.18-rc6 tag of linus tree) so one is able to determine that from reading the docs only (ie without using GIT). Or. From mst at mellanox.co.il Wed Oct 18 00:29:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 09:29:04 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161155330.2917.511.camel@fc6.xsintricity.com> References: <1161155330.2917.511.camel@fc6.xsintricity.com> Message-ID: <20061018072904.GA26507@mellanox.co.il> Quoting r. 
Doug Ledford : > > From our discussion, it seems we should be able to just push the > > small number of missing bits into RHEL5 directly. That would be > > nicer of course. > > It depends. If there's lots of individual changes, it might be easier > to push the OFED 1.1 change. But, that depends on when the final OFED > 1.1 comes out and how much it varies from the existing RPMs. OFED is in deep freeze, so you can already look at it to estimate the amount of changes against 2.6.18. Could you look at the diff please so that I know whether it's worth it to invest in building the minimal patch set for pushing into RHEL5, or whether you'll push OFED 1.1 into RHEL kernel as is? -- MST From mst at mellanox.co.il Wed Oct 18 00:32:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 09:32:44 +0200 Subject: [openib-general] OFED-1.1-pre1 is ready In-Reply-To: <4535D75A.8040001@voltaire.com> References: <4535D75A.8040001@voltaire.com> Message-ID: <20061018073244.GA26658@mellanox.co.il> Quoting r. Or Gerlitz : > > Try something like > > git log v2.6.18-rc6..936b9fc0bd1411b52826213a5d89e2ceb4f52a78 > > to get the list of OFED changes against v2.6.18-rc6. > > thanks for all the info, however i think the OFED docs must state on > what upstream version are the OFED kernel IB drivers based (ie in this > case 2.6.18-rc6 tag of linus tree) so one is able to determine that from > reading the docs only (ie without using GIT). Makes sense. Care to formulate the appropriate wording? Which document should this go into? -- MST From mst at mellanox.co.il Wed Oct 18 00:38:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 09:38:28 +0200 Subject: [openib-general] RHEL5 and OFED ... 
In-Reply-To: <1161126773.2917.494.camel@fc6.xsintricity.com> References: <1161120853.2917.475.camel@fc6.xsintricity.com> <20061017214828.GV23922@mellanox.co.il> <1161126773.2917.494.camel@fc6.xsintricity.com> Message-ID: <20061018073828.GA14609@mellanox.co.il> Quoting r. Doug Ledford : > > Hmm, no, I really want to take a srpm from amd64 and get a 32 bit > > gcc executable that will build 64 bit binaries that match these > > built on native amd64 system exactly. > > Between just i386 and x86_64, you might be able to do that. I guess what I would like is for redhat to enable -m64 in gcc/binutils on the 32 bit distribution. Then once I have a 64 bit machine, I could boot a 32 bit distro but change just the kernel to 64 bit. -- MST From liakhovitch at mail.ru Wed Oct 18 02:29:29 2006 From: liakhovitch at mail.ru (Mirochnick Natalia) Date: Wed, 18 Oct 2006 13:29:29 +0400 Subject: [openib-general] srp trouble on RHEL4 U4 References: Message-ID: <005a01c6f297$ea88dfa0$05cab4d5@ld.yandex.ru> 1. Thanks a lot for your attention. 2. Here are the details: IB switch: Silverstorm 5000 Storage: NetApp FAS 320 root@[...]# /usr/ofed/sbin/ibsrpdm -c id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe8000000000000000066a02600001de,pkey=ffff,service_id=0000494353535250,io_class=ff00 id_ext=0000000000000001,ioc_guid=00066a02380001de,dgid=fe8000000000000000066a02600001de,pkey=ffff,service_id=0000494353535250,io_class=ff00 [root at ...]# echo -n id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe8000000000000000066a02600001de,pkey=ffff,service_id=0000494353535250 > /sys/class/infiniband_srp/srp-mthca0-1/add_target Thanks in advance, Natalia Mirochnick ----- Original Message ----- From: "Lakshmanan, Madhu" To: "Mirochnick Natalia" ; Sent: Wednesday, October 18, 2006 11:04 AM Subject: RE: [openib-general] srp trouble on RHEL4 U4 Which SRP target are you using? Could you also give some more details on the fabric setup; i.e. 
what IB switch / gateway your host is connected to, and what kind of storage you wish to access? The full command that you used (echo -n .... > ..../add_target) to configure the SRP target would be very useful as well. The Silverstorm 7000 is an HCA (Host Channel Adapter). By itself, it should in most cases not be the primary reason for the error code you are seeing. The issue is more likely to be due to the SRP target you are attempting to connect to. Thanks, Madhu Lakshmanan Silverstorm Technologies, Inc. mlakshmanan at silverstorm.com > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of > Mirochnick Natalia > Sent: Tuesday, October 17, 2006 10:47 AM > To: openib-general at openib.org > Subject: [openib-general] srp trouble on RHEL4 U4 > > Hello, > > I'm trying to setup SRP connection (SRP in OFED 1.0). > IB card is Silverstorm 7000. > > ib_srp module is loaded, but after attempt to to create an > SRP device (as it > was described in manual srp_release_notes.txt) the error appears in > /var/log/messages: > kernel: REJ reason 0x0 > > What's wrong? > -- > Thanks in advance, > Mirochnick Natalia > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From mlakshmanan at silverstorm.com Wed Oct 18 02:36:50 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Wed, 18 Oct 2006 05:36:50 -0400 Subject: [openib-general] srp trouble on RHEL4 U4 In-Reply-To: <005a01c6f297$ea88dfa0$05cab4d5@ld.yandex.ru> Message-ID: >> Madhu Lakshmanan wrote: >> Which SRP target are you using? Could you also give some more >> details on >> the fabric setup; i.e. 
what IB switch / gateway your host is connected >> to, and what kind of storage you wish to access? The full command that >> you used (echo -n .... > ..../add_target) to configure the SRP target >> would be very useful as well. > > From: Mirochnick Natalia [mailto:liakhovitch at mail.ru] > Subject: Re: [openib-general] srp trouble on RHEL4 U4 > > 2. Here's the details: > > IB switch: Silverstorm 5000 > Storage: NetApp FAS 320 > > root@[...]# /usr/ofed/sbin/ibsrpdm -c > id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=000049435353525 > 0,io_class=ff00 > id_ext=0000000000000001,ioc_guid=00066a02380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=000049435353525 > 0,io_class=ff00 > > [root at ...]# echo -n > id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=0000494353535250 > > /sys/class/infiniband_srp/srp-mthca0-1/add_target > The problem is with the echo string you are giving. The command should be invoked as: echo -n id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 00000000000066a02600001de,pkey=ffff,service_id=0000494353535250, io_class=ff00 > /sys/class/infiniband_srp/srp-mthca0-1/add_target You were missing the 'io_class=ff00' bit. Let me know if it works. Madhu From liakhovitch at mail.ru Wed Oct 18 03:03:08 2006 From: liakhovitch at mail.ru (Mirochnick Natalia) Date: Wed, 18 Oct 2006 14:03:08 +0400 Subject: [openib-general] srp trouble on RHEL4 U4 References: Message-ID: <009001c6f29c$a173f930$05cab4d5@ld.yandex.ru> I've changed the string as you've advised, but it didn't work. The only difference is that string "" was added in /var/log/messages. root@[...] echo -n id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe8000000000000000066a02600001de,pkey=ffff,service_id=0000494353535250,io_class=ff00 > /sys/class/infiniband_srp/srp-mthca0-1/add_target root@[...] 
tail /var/log/messages Oct 18 14:01:26 ... kernel: REJ reason 0x0 Oct 18 14:01:26 ... kernel: ib_srp: Connection failed By the way, in ofed srp_release_notes.txt hasn't been said that io_class is mandatory parameter. Natalia ----- Original Message ----- >> Madhu Lakshmanan wrote: >> Which SRP target are you using? Could you also give some more >> details on >> the fabric setup; i.e. what IB switch / gateway your host is connected >> to, and what kind of storage you wish to access? The full command that >> you used (echo -n .... > ..../add_target) to configure the SRP target >> would be very useful as well. > > From: Mirochnick Natalia [mailto:liakhovitch at mail.ru] > Subject: Re: [openib-general] srp trouble on RHEL4 U4 > > 2. Here's the details: > > IB switch: Silverstorm 5000 > Storage: NetApp FAS 320 > > root@[...]# /usr/ofed/sbin/ibsrpdm -c > id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=000049435353525 > 0,io_class=ff00 > id_ext=0000000000000001,ioc_guid=00066a02380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=000049435353525 > 0,io_class=ff00 > > [root at ...]# echo -n > id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=0000494353535250 > > /sys/class/infiniband_srp/srp-mthca0-1/add_target > The problem is with the echo string you are giving. The command should be invoked as: echo -n id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 00000000000066a02600001de,pkey=ffff,service_id=0000494353535250, io_class=ff00 > /sys/class/infiniband_srp/srp-mthca0-1/add_target You were missing the 'io_class=ff00' bit. Let me know if it works. 
Madhu From mlakshmanan at silverstorm.com Wed Oct 18 03:10:31 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Wed, 18 Oct 2006 06:10:31 -0400 Subject: [openib-general] srp trouble on RHEL4 U4 In-Reply-To: <009001c6f29c$a173f930$05cab4d5@ld.yandex.ru> Message-ID: > From: Mirochnick Natalia [mailto:liakhovitch at mail.ru] > Subject: Re: [openib-general] srp trouble on RHEL4 U4 > > I've changed the string as you've advised, but it didn't > work. The only > difference is that string "" was added in /var/log/messages. > > root@[...] echo -n > id_ext=0000000000000001,ioc_guid=00066a01380001de,dgid=fe80000 > 00000000000066a02600001de,pkey=ffff,service_id=000049435353525 > 0,io_class=ff00 > > /sys/class/infiniband_srp/srp-mthca0-1/add_target > > root@[...] tail /var/log/messages > Oct 18 14:01:26 ... kernel: REJ reason 0x0 > Oct 18 14:01:26 ... kernel: ib_srp: Connection failed > > By the way, in ofed srp_release_notes.txt hasn't been said > that io_class is > mandatory parameter. > > Natalia It is an error in the OFED 1.0 srp_release_notes.txt. For all Silverstorm SRP targets (like the Silverstorm 5000 switch with Fiber Channel IOC), the 'io_class=ff00' parameter is mandatory. Let me investigate a little more, and get back to you. Madhu From vlad at mellanox.co.il Wed Oct 18 03:08:43 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 18 Oct 2006 12:08:43 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1-RC7 build problem on SLES10 In-Reply-To: References: Message-ID: <1161166123.7577.11.camel@vladsk-laptop> Hi, The OFED installation script checks that the directory /lib/modules/`uname -r`/build/ and the file /lib/modules/`uname -r`/build/Makefile exist. It does not check for the kernel-source RPM because it also has to support kernels built from kernel.org sources. -- Regards, Vladimir On Tue, 2006-10-17 at 15:45 -0700, Scott Weitzenkamp (sweitzen) wrote: > You need the kernel-source RPM, I guess the OFED install.sh should check > for that RPM. 
> > svbu-qa-opteron-1:~ # uname -a > Linux svbu-qa-opteron-1 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC > 2006 i68 > 6 athlon i386 GNU/Linux > svbu-qa-opteron-1:~ # rpm -qa | fgrep kernel > kernel-source-2.6.16.21-0.8 > kernel-smp-2.6.16.21-0.8 > kernel-ib-1.1-2.6.16.21_0.8_smp > kernel-ib-devel-1.1-2.6.16.21_0.8_smp > svbu-qa-opteron-1:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp > .config Makefile arch include2 > .kernelrelease Module.symvers include scripts > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: openib-general-bounces at openib.org > > [mailto:openib-general-bounces at openib.org] On Behalf Of Chris Dennett > > Sent: Tuesday, October 17, 2006 12:46 PM > > To: openfabrics-ewg at openib.org; openib-general at openib.org > > Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10 > > > > I've been trying to install OFED 1.1 RC7 on an x86 server > > with a fresh install > > of SLES10 (32-bit). It errors out when trying to build the > > kernel modules. > > I've included what I think are the relevant log messages > > below. I've tried > > installing everything (minus iser and tvflash) or just the > > modules needed for > > SRP. I've installed 1.1 RC7 successfully on other RedHat > > servers without any > > problems. I am installing as root. Any help would be appreciated. > > > > Thanks. 
> > > > -Chris > > > > ============================================== > > + make kernel > > Building kernel modules > > Kernel version: 2.6.16.21-0.8-smp > > Modules directory: //lib/modules/2.6.16.21-0.8-smp > > Kernel sources: /lib/modules/2.6.16.21-0.8-smp/build > > env EXTRA_CFLAGS=" -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ > > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib \ > > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug" \ > > make -C /lib/modules/2.6.16.21-0.8-smp/build > > SUBDIRS="/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband" > > KERNELRELEASE=2.6.16.21-0.8-smp \ > > EXTRAVERSION=.21-0.8-smp V=1 \ > > CONFIG_INFINIBAND=m \ > > CONFIG_INFINIBAND_IPOIB=m \ > > CONFIG_INFINIBAND_SDP= \ > > CONFIG_INFINIBAND_SRP=m \ > > CONFIG_INFINIBAND_USER_MAD=m \ > > CONFIG_INFINIBAND_USER_ACCESS=m \ > > CONFIG_INFINIBAND_ADDR_TRANS=y \ > > CONFIG_INFINIBAND_MTHCA=m \ > > CONFIG_INFINIBAND_IPOIB_DEBUG=y \ > > CONFIG_INFINIBAND_ISER= \ > > CONFIG_INFINIBAND_EHCA= \ > > CONFIG_INFINIBAND_RDS= \ > > CONFIG_INFINIBAND_RDS_DEBUG= \ > > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ > > CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ > > CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ > > CONFIG_INFINIBAND_SDP_DEBUG= \ > > CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ > > CONFIG_INFINIBAND_IPATH= \ > > CONFIG_INFINIBAND_MTHCA_DEBUG=y \ > > CONFIG_INFINIBAND_MADEYE= \ > > LINUXINCLUDE='-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include \ > > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ > > -Iinclude \ > > $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ > > -include include/linux/autoconf.h \ > > -include > > /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h \ > > ' \ > > modules > > make[1]: Entering directory > > `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' > > make[1]: *** No rule to make target `modules'. Stop. 
> > make[1]: Leaving directory `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' > > make: *** [kernel] Error 2 > > error: Bad exit status from /var/tmp/rpm-tmp.92052 (%install) > > > > > > RPM build errors: > > user vlad does not exist - using root > > group mtl does not exist - using root > > user vlad does not exist - using root > > group mtl does not exist - using root > > Bad exit status from /var/tmp/rpm-tmp.92052 (%install) > > ERROR: Failed executing "rpmbuild --rebuild --define '_topdir > > /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define > > 'build_root > > /var/tmp/OFED' --define 'configure_options --with-libibcommon > > --with-libibmad > > --with-libibumad --with-libibverbs --with-libmthca --with-opensm > > --with-librdmacm --with-openib-diags --with-srptools --with-mstflint > > --with-perftest --with-ipoib-mod --with-mthca-mod --with-srp-mod > > --with-core-mod --with-user_mad-mod --with-user_access-mod > > --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define > > 'KVERSION 2.6.16.21-0.8-smp' --define 'KSRC > > /lib/modules/2.6.16.21-0.8-smp/build' --define > > 'build_kernel_ib 1' --define > > 'build_kernel_ib_devel 0' --define 'NETWORK_CONF_DIR > > /etc/sysconfig/network' > > --define 'modprobe_update 1' --define 'include_ipoib_conf 0' --define > > 'build_32bit 0' /root/OFED-1.1-rc7/SRPMS/openib-1.1-0.src.rpm" > > > > =================================================== > > > > smx32:~ # uname -a > > Linux linux-yeez 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 > > UTC 2006 i686 > > i686 i386 GNU/Linux > > smx32:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp > > Module.symvers > > > > > > > > -- > > Chris Dennett > > Design Engineer > > Texas Memory Systems, Inc. 
> > 713-266-3200 x430 > > Chris.Dennett at texmemsys.com > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg From halr at voltaire.com Wed Oct 18 03:32:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Oct 2006 06:32:25 -0400 Subject: [openib-general] sysfs exposure of port counters useless? In-Reply-To: <1161143268.32093.432393.camel@hal.voltaire.com> References: <1161143268.32093.432393.camel@hal.voltaire.com> Message-ID: <1161167516.32093.448654.camel@hal.voltaire.com> On Tue, 2006-10-17 at 23:49, Hal Rosenstock wrote: [snip...] > > For IB counters in a Cisco switch, we read and reset the 32-bit counters > > once per second and keep 64-bit counters internally. > > 32 bit byte counters can be pegged in only 16 seconds on a 4x SDR link > and there are 4x DDR links now (8 seconds) and 12x links (5 seconds) so > that strategy is inaccurate on busy networks. I was a little sleepy... I take back the last part of the comment. 1 sec should be frequent enough. The only issue with this approach is the skew in the reading of the port counters as the interval is not as precise as it could be and that is likely to be good enough for these purposes. 
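
For reference, the saturation times quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the data counters count 4-byte words (as the IB PortXmitData/PortRcvData counters do) and that 8b/10b encoding leaves 80% of the signalling rate for data; the function name is illustrative only.

```python
def seconds_to_saturate(lanes, gbaud_per_lane, counter_bits=32, bytes_per_count=4):
    """Time for a data counter to peg at full line rate (rough estimate)."""
    # 8b/10b encoding: only 8 of every 10 signalled bits carry data.
    data_bits_per_s = lanes * gbaud_per_lane * 1e9 * 8 / 10
    bytes_per_s = data_bits_per_s / 8
    # Assumption: each counter tick represents bytes_per_count bytes.
    capacity_bytes = (2 ** counter_bits) * bytes_per_count
    return capacity_bytes / bytes_per_s


for name, lanes, rate in [("4x SDR", 4, 2.5), ("4x DDR", 4, 5.0), ("12x SDR", 12, 2.5)]:
    print(f"{name}: {seconds_to_saturate(lanes, rate):.1f} s")
```

Under these assumptions the results come out near 17, 9 and 6 seconds, in line with the rough 16/8/5 figures in the thread, so a one-second read-and-reset interval leaves ample headroom.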
-- Hal From halr at voltaire.com Wed Oct 18 04:02:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Oct 2006 07:02:04 -0400 Subject: [openib-general] [PATCH] opensm: misc fixes in lft dump file parser In-Reply-To: <20061017182856.GA26498@sashak.voltaire.com> References: <20061017182856.GA26498@sashak.voltaire.com> Message-ID: <1161169276.32093.449598.camel@hal.voltaire.com> On Tue, 2006-10-17 at 14:28, Sasha Khapyorsky wrote: > There are misc small fixes for lft dump parser: > - merge ERROR and SYS logging in single osm_log() call > - more strict strtoul() results checking > - fix potential bugs with invalid dump files > - break too long lines > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From jsquyres at cisco.com Wed Oct 18 05:10:32 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 18 Oct 2006 08:10:32 -0400 Subject: [openib-general] Tools for development In-Reply-To: <20061017134550.GE20690@mellanox.co.il> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <20061017134550.GE20690@mellanox.co.il> Message-ID: On Oct 17, 2006, at 9:45 AM, Michael S. Tsirkin wrote: >> It seems like trac can integrate with both SVN and git and would also >> provide us with integrated wiki capabilities. > > One feature that bugzilla has (and that seems to be disabled in > openib bugzilla > :() is mail integration, where I can Cc bugzilla and mail contents > will get > attached to bug report. I was hoping that new server will have this > capability. Does trac have this? Good question; I don't know. I'll find out. 
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems From jsquyres at cisco.com Wed Oct 18 05:12:29 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 18 Oct 2006 08:12:29 -0400 Subject: [openib-general] Tools for development In-Reply-To: <20061017162146.GB26226@sashak.voltaire.com> References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <1161096883.30946.2.camel@stevo-desktop> <20061017150442.GA22531@mellanox.co.il> <20061017162146.GB26226@sashak.voltaire.com> Message-ID: <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com> I was not on the call last week, but I understand that there was some discussion about exactly this point (ditch SVN and go 100% git): the decision was to stick with SVN for userspace stuff and stick with git for kernel stuff. However, this is a larger audience than was on the call. Is there a significant movement here from the developers to move to 100% git? (I don't really care) On Oct 17, 2006, at 12:21 PM, Sasha Khapyorsky wrote: > On 17:04 Tue 17 Oct , Michael S. Tsirkin wrote: >> Quoting r. Steve Wise : >>> At the risk of opening a can of worms, is there any reason we >>> don't move >>> the user stuff into its own git tree? This would get rid of svn >>> altogether... >> >> If we do, that should probably be multiple git trees - verbs, >> management, >> tests are all more or less independent and developed mostly by >> different people. > > Reasonable. And generally this should not be too bad. 
> > Sasha > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From jsquyres at cisco.com Wed Oct 18 05:22:24 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 18 Oct 2006 08:22:24 -0400 Subject: [openib-general] New DNS name for openfabrics.org Message-ID: Who runs the DNS for openfabrics.org? Could we get a new DNS A name added: staging.openfabrics.org -- for the new server? 69.55.231.195. Thanks! -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Wed Oct 18 05:39:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 14:39:33 +0200 Subject: [openib-general] Tools for development In-Reply-To: <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com> References: <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com> Message-ID: <20061018123933.GA28308@mellanox.co.il> Quoting r. Jeff Squyres : > However, this is a larger audience than was on the call. Is there a > significant movement here from the developers to move to 100% git? Life would be somewhat easier for me with 100% git. -- MST From erezz at voltaire.com Wed Oct 18 05:40:40 2006 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 18 Oct 2006 14:40:40 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1-RC7 build problem on SLES10 In-Reply-To: <1161166123.7577.11.camel@vladsk-laptop> References: <1161166123.7577.11.camel@vladsk-laptop> Message-ID: <453620C8.8080900@voltaire.com> When I ran the install script without having the kernel sources rpm on SLES 10, I had to wait several minutes before it failed. Shouldn't the script check such dependencies before starting the build process? 
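
A pre-flight check of the kind asked about here might look like the following sketch. It is not the OFED install script's actual logic; the function name and messages are hypothetical, and it only tests for the kernel build directory and its top-level Makefile, which is what was missing in the failing SLES10 tree shown in this thread (only Module.symvers was present).

```python
import os
import platform


def kernel_build_tree_problems(release=None, modules_root="/lib/modules"):
    """Return a list of reasons why external modules cannot be built."""
    release = release or platform.release()
    build_dir = os.path.join(modules_root, release, "build")
    problems = []
    if not os.path.isdir(build_dir):
        problems.append(f"{build_dir} does not exist (kernel sources not installed?)")
    elif not os.path.isfile(os.path.join(build_dir, "Makefile")):
        problems.append(f"{build_dir}/Makefile is missing (incomplete kernel tree)")
    return problems


if __name__ == "__main__":
    for problem in kernel_build_tree_problems():
        print("ERROR:", problem)
```

Running such a check before rpmbuild starts would fail in seconds rather than minutes, and it stays independent of how the kernel tree was installed (distribution RPM or kernel.org sources).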
Erez Vladimir Sokolovsky wrote: > > Hi, > The OFED installation script checks that the directory > /lib/modules/`uname -r`/build/ and the file > /lib/modules/`uname -r`/build/Makefile exist. > It does not check for the kernel-source RPM because kernels from > kernel.org must also be supported. > > > -- > > Regards, > Vladimir > > On Tue, 2006-10-17 at 15:45 -0700, Scott Weitzenkamp (sweitzen) wrote: > > You need the kernel-source RPM; I guess the OFED install.sh should check > > for that RPM. > > > > svbu-qa-opteron-1:~ # uname -a > > Linux svbu-qa-opteron-1 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC > > 2006 i686 athlon i386 GNU/Linux > > svbu-qa-opteron-1:~ # rpm -qa | fgrep kernel > > kernel-source-2.6.16.21-0.8 > > kernel-smp-2.6.16.21-0.8 > > kernel-ib-1.1-2.6.16.21_0.8_smp > > kernel-ib-devel-1.1-2.6.16.21_0.8_smp > > svbu-qa-opteron-1:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp > > .config Makefile arch include2 > > .kernelrelease Module.symvers include scripts > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > > > -----Original Message----- > > > From: openib-general-bounces at openib.org > > > [mailto:openib-general-bounces at openib.org] On Behalf Of Chris Dennett > > > Sent: Tuesday, October 17, 2006 12:46 PM > > > To: openfabrics-ewg at openib.org; openib-general at openib.org > > > Subject: [openib-general] OFED 1.1-RC7 build problem on SLES10 > > > > > > I've been trying to install OFED 1.1 RC7 on an x86 server > > > with a fresh install > > > of SLES10 (32-bit). It errors out when trying to build the > > > kernel modules. > > > I've included what I think are the relevant log messages > > > below. I've tried > > > installing everything (minus iser and tvflash) or just the > > > modules needed for > > > SRP. I've installed 1.1 RC7 successfully on other RedHat > > > servers without any > > > problems. I am installing as root. Any help would be appreciated. > > > > > > Thanks.
> > > > > > -Chris > > > > > > ============================================== > > > + make kernel > > > Building kernel modules > > > Kernel version: 2.6.16.21-0.8-smp > > > Modules directory: //lib/modules/2.6.16.21-0.8-smp > > > Kernel sources: /lib/modules/2.6.16.21-0.8-smp/build > > > env EXTRA_CFLAGS=" -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ > > > > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib \ > > > > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug" \ > > > make -C /lib/modules/2.6.16.21-0.8-smp/build > > > SUBDIRS="/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband" > > > KERNELRELEASE=2.6.16.21-0.8-smp \ > > > EXTRAVERSION=.21-0.8-smp V=1 \ > > > CONFIG_INFINIBAND=m \ > > > CONFIG_INFINIBAND_IPOIB=m \ > > > CONFIG_INFINIBAND_SDP= \ > > > CONFIG_INFINIBAND_SRP=m \ > > > CONFIG_INFINIBAND_USER_MAD=m \ > > > CONFIG_INFINIBAND_USER_ACCESS=m \ > > > CONFIG_INFINIBAND_ADDR_TRANS=y \ > > > CONFIG_INFINIBAND_MTHCA=m \ > > > CONFIG_INFINIBAND_IPOIB_DEBUG=y \ > > > CONFIG_INFINIBAND_ISER= \ > > > CONFIG_INFINIBAND_EHCA= \ > > > CONFIG_INFINIBAND_RDS= \ > > > CONFIG_INFINIBAND_RDS_DEBUG= \ > > > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ > > > CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ > > > CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ > > > CONFIG_INFINIBAND_SDP_DEBUG= \ > > > CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ > > > CONFIG_INFINIBAND_IPATH= \ > > > CONFIG_INFINIBAND_MTHCA_DEBUG=y \ > > > CONFIG_INFINIBAND_MADEYE= \ > > > LINUXINCLUDE='-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include \ > > > > > > -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include \ > > > -Iinclude \ > > > $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ > > > -include include/linux/autoconf.h \ > > > -include > > > /var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h \ > > > ' \ > > > modules > > > make[1]: Entering directory > > > `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' > > > 
make[1]: *** No rule to make target `modules'. Stop. > > > make[1]: Leaving directory `/usr/src/linux-2.6.16.21-0.8-obj/i386/smp' > > > make: *** [kernel] Error 2 > > > error: Bad exit status from /var/tmp/rpm-tmp.92052 (%install) > > > > > > > > > RPM build errors: > > > user vlad does not exist - using root > > > group mtl does not exist - using root > > > user vlad does not exist - using root > > > group mtl does not exist - using root > > > Bad exit status from /var/tmp/rpm-tmp.92052 (%install) > > > ERROR: Failed executing "rpmbuild --rebuild --define '_topdir > > > /var/tmp/OFEDRPM' --define '_prefix /usr/local/ofed' --define > > > 'build_root > > > /var/tmp/OFED' --define 'configure_options --with-libibcommon > > > --with-libibmad > > > --with-libibumad --with-libibverbs --with-libmthca --with-opensm > > > --with-librdmacm --with-openib-diags --with-srptools --with-mstflint > > > --with-perftest --with-ipoib-mod --with-mthca-mod --with-srp-mod > > > --with-core-mod --with-user_mad-mod --with-user_access-mod > > > --with-addr_trans-mod' --define 'configure_options32 %{nil}' --define > > > 'KVERSION 2.6.16.21-0.8-smp' --define 'KSRC > > > /lib/modules/2.6.16.21-0.8-smp/build' --define > > > 'build_kernel_ib 1' --define > > > 'build_kernel_ib_devel 0' --define 'NETWORK_CONF_DIR > > > /etc/sysconfig/network' > > > --define 'modprobe_update 1' --define 'include_ipoib_conf 0' --define > > > 'build_32bit 0' /root/OFED-1.1-rc7/SRPMS/openib-1.1-0.src.rpm" > > > > > > =================================================== > > > > > > smx32:~ # uname -a > > > Linux linux-yeez 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 > > > UTC 2006 i686 > > > i686 i386 GNU/Linux > > > smx32:~ # ls /usr/src/linux-2.6.16.21-0.8-obj/i386/smp > > > Module.symvers > > > > > > > > > > > > -- > > > Chris Dennett > > > Design Engineer > > > Texas Memory Systems, Inc. 
> > > 713-266-3200 x430 > > > Chris.Dennett at texmemsys.com > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > _______________________________________________ > > openfabrics-ewg mailing list > > openfabrics-ewg at openib.org > > http://openib.org/mailman/listinfo/openfabrics-ewg > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From dledford at redhat.com Wed Oct 18 06:10:58 2006 From: dledford at redhat.com (Doug Ledford) Date: Wed, 18 Oct 2006 09:10:58 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061018072904.GA26507@mellanox.co.il> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> Message-ID: <1161177058.2917.513.camel@fc6.xsintricity.com> On Wed, 2006-10-18 at 09:29 +0200, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > > >From our dicussion, it seems we should be able to just push the > > > small number of missing bits into RHEL5 directly. That would be > > > nicer of course. > > > > It depends. If there's lots of individual changes, it might be easier > > to push the OFED 1.1 change. But, that depends on when the final OFED > > 1.1 comes out and how much it varies from the existing RPMs. > > OFED is in deep freeze, so you can already look at it to estimate the amount of > changes against 2.6.18. > Could you look at the diff please so that I know whether it's worth it > to invest in building the minimal patch set for pushing into RHEL5, > or whether you'll push OFED 1.1 into RHEL kernel as is? Yeah, I'll look over the diff today. 
-- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From jsquyres at cisco.com Wed Oct 18 06:26:29 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 18 Oct 2006 09:26:29 -0400 Subject: [openib-general] Tools for development In-Reply-To: <20061017164544.GB23922@mellanox.co.il> References: <1161099425.24554.48.camel@localhost> <20061017164544.GB23922@mellanox.co.il> Message-ID: <230C1EA7-4768-4E9C-AA74-8CD7CAE516C6@cisco.com> On Oct 17, 2006, at 12:45 PM, Michael S. Tsirkin wrote: >> Developers had requested git 1.4, but Ubuntu had an older version. We >> went ahead and installed git from source. I'd prefer to stick to Ubuntu >> packages if possible. > > We have much to gain from newer versions - just look at the gitweb > change log. > But my assumption here was that someone will keep the built-from-source > tools updated. I don't have a problem alerting the list when new > versions come out. > > If, as Roland suggested, we'll be stuck at this version, it's better > to stick with distro-supplied ones, assuming that *that* is updated > in a timely fashion. > > So, I guess the question is how is the system supported/updated? This is probably quite the operative question. I volunteered to set up and maintain trac if the group decides to use it. I don't know what the plan is for supporting the other software packages. I, too, would side with Michael that relatively recent versions of svn (although this may become moot) and trac tend to be beneficial to developers (I assume the same is true for git, but I have no direct experience). Does anyone have any sysadmin cycles to do this kind of stuff? 
I would expect it to be a flurry of activity here in the beginning followed by short bursts of activity separated by long periods of nothing. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Wed Oct 18 06:35:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 18 Oct 2006 15:35:52 +0200 Subject: [openib-general] Tools for development In-Reply-To: <230C1EA7-4768-4E9C-AA74-8CD7CAE516C6@cisco.com> References: <1161099425.24554.48.camel@localhost> <20061017164544.GB23922@mellanox.co.il> <230C1EA7-4768-4E9C-AA74-8CD7CAE516C6@cisco.com> Message-ID: <20061018133552.GD28308@mellanox.co.il> Quoting r. Jeff Squyres : > Does anyone have any sysadmin cycles to do this kind of stuff? I > would expect it to be a flurry of activity here in the beginning > followed by short bursts of activity separated by long periods of > nothing. FWIW, I can help out keeping the git tool updated - I'm doing it at Mellanox now and it's quite trivial. In particular, this can be done without central admin privileges - git does not need to be suid root, and it's easy to set git up to run from some "git-admin" user's home directory. Playing with softlinks makes it quite easy for this user to update git for everyone. -- MST From sashak at voltaire.com Wed Oct 18 06:44:00 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 18 Oct 2006 15:44:00 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: fix function name in the comment Message-ID: <20061018134400.GA13485@sashak.voltaire.com> This trivially fixes function name (osm_switch_set_min_lid_size) in the comment. 
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_switch.h | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h index 8c4799f..0cf7542 100644 --- a/osm/include/opensm/osm_switch.h +++ b/osm/include/opensm/osm_switch.h @@ -440,9 +440,9 @@ osm_switch_set_hops( * SEE ALSO *********/ -/****f* OpenSM: Switch/osm_switch_set_hops +/****f* OpenSM: Switch/osm_switch_set_min_lid_size * NAME -* osm_switch_set_hops +* osm_switch_set_min_lid_size * * DESCRIPTION * Sets the size of the switch's routing table to at least accomodate the -- 1.4.2.3.g128e From halr at voltaire.com Wed Oct 18 06:47:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Oct 2006 09:47:35 -0400 Subject: [openib-general] [PATCH TRIVIAL] opensm: fix function name in the comment In-Reply-To: <20061018134400.GA13485@sashak.voltaire.com> References: <20061018134400.GA13485@sashak.voltaire.com> Message-ID: <1161179252.25985.6345.camel@hal.voltaire.com> On Wed, 2006-10-18 at 09:44, Sasha Khapyorsky wrote: > This trivially fixes function name (osm_switch_set_min_lid_size) in the > comment. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From ogerlitz at voltaire.com Wed Oct 18 06:53:50 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 18 Oct 2006 15:53:50 +0200 Subject: [openib-general] OFED-1.1-pre1 is ready In-Reply-To: <20061018073244.GA26658@mellanox.co.il> References: <4535D75A.8040001@voltaire.com> <20061018073244.GA26658@mellanox.co.il> Message-ID: <453631EE.6010702@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >>> Try something like >>> git log v2.6.18-rc6..936b9fc0bd1411b52826213a5d89e2ceb4f52a78 >>> to get the list of OFED changes against v2.6.18-rc6. 
>> Thanks for all the info; however, I think the OFED docs must state on >> what upstream version the OFED kernel IB drivers are based (i.e. in this >> case the 2.6.18-rc6 tag of Linus's tree) so one is able to determine that from >> reading the docs only (i.e. without using GIT). > > Makes sense. Care to formulate the appropriate wording? > Which document should this go into? OK, something in the spirit of the below (remove the XXX); I will be able to produce something better tomorrow morning. Or. # Kernel based on XXX=2.6.18-rc6 mainline kernel. The patches to this mainline kernel are included within the OFED sources; please see the YYY doc for their location and how to apply them to the kernel sources. From ogerlitz at voltaire.com Wed Oct 18 07:01:04 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 18 Oct 2006 16:01:04 +0200 (IST) Subject: [openib-general] IPoIB multicast neighbour?! Message-ID: While debugging something, I enabled the ipoib debug messages and saw: ib0: neigh_destructor for ffffff ff12:601b:ffff:0000:0000:0000:0000:0002 Do you have any idea what the source of this neighbour is, why it is created, and whether there is a way to eliminate it (my guess is that removing IPv6 support from the kernel will do that)? It's an RH4 U3 system with OFED 1.1 rc7; more info below. Thanks. Or.
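
One way to read the GID in that debug message: it has the shape that RFC 4391 gives to IPoIB multicast GIDs for IPv6 groups (prefix ff1x, signature 0x601b, the P_Key, then the low 80 bits of the IPv6 group address). The sketch below assumes that mapping, a P_Key of 0xffff and link-local scope; the function name is illustrative and is not the kernel's.

```python
import ipaddress


def ipv6_group_to_ipoib_mgid(group, pkey=0xFFFF, scope=0x2):
    """Map an IPv6 multicast address to an IPoIB MGID string (RFC 4391 layout)."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    # 0xff, flags/scope, IPv6 signature 0x601b, then the 16-bit P_Key.
    prefix = bytes([0xFF, 0x10 | (scope & 0x0F), 0x60, 0x1B,
                    (pkey >> 8) & 0xFF, pkey & 0xFF])
    # The remaining 80 bits are the low 80 bits of the IPv6 group address.
    mgid = prefix + addr.packed[6:]
    return ":".join(mgid[i:i + 2].hex() for i in range(0, 16, 2))


print(ipv6_group_to_ipoib_mgid("ff02::2"))
```

With these assumptions, the IPv6 all-routers group ff02::2 maps to exactly the GID in the neigh_destructor message, which would be consistent with the guess that the neighbour comes from IPv6 traffic on ib0.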
# ip a s ib0 9: ib0: mtu 1500 qdisc pfifo_fast qlen 128 link/[32] 00:02:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:97:08:c5 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff inet 192.169.3.235/24 brd 192.169.3.255 scope global ib0 inet6 fe80::208:f104:397:8c5/64 scope link valid_lft forever preferred_lft forever # ip m s ib0 9: ib0 link 00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01 link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:97:08:c5 link 00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01 inet 224.0.0.1 inet6 ff02::1:ff97:8c5 inet6 ff02::1 From swise at opengridcomputing.com Wed Oct 18 08:13:43 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 18 Oct 2006 10:13:43 -0500 Subject: [openib-general] Tools for development In-Reply-To: <20061018123933.GA28308@mellanox.co.il> References: <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com> <20061018123933.GA28308@mellanox.co.il> Message-ID: <1161184423.4066.23.camel@stevo-desktop> On Wed, 2006-10-18 at 14:39 +0200, Michael S. Tsirkin wrote: > Quoting r. Jeff Squyres : > > However, this is a larger audience than was on the call. Is there a > > significant movement here from the developers to move to 100% git? > > Life would be somewhat easier for me with 100% git. > Probably for everyone. From changquing.tang at hp.com Wed Oct 18 08:17:35 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Wed, 18 Oct 2006 10:17:35 -0500 Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17 In-Reply-To: <218A56C1-2789-4572-BA43-495749B00715@cisco.com> Message-ID: Has the registration site been set up ? 
--CQ -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Jeff Squyres Sent: Tuesday, October 17, 2006 6:57 AM To: Bill Boas Cc: Open Fabrics; openib-general at openib.org; openib-promoters at openib.org Subject: Re: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17 I have copied this information to the wiki -- please make all updates there so that there is a single reference point to find all the information about the meeting. Thanks! https://openib.org/tiki/tiki-index.php?page=Meeting+Minutes On Oct 15, 2006, at 5:02 PM, Bill Boas wrote: > To all in the OpenFabrics Community > > > > We will be holding our first Developer Summit in the Tampa Convention > Center courtesy of SC06 starting at 1.30PM in Room 17 on Thursday > November 16, 2006. On Friday November 17, we will start in Room 13 at > 8.00 AM and continue till 5.00PM. We have had to schedule into these > time slots because no other usable space is available at any other > times during the week of SC06! > > > > OpenFabrics will cater food and beverages for afternoon break and > supper on Thursday, breakfast, lunch and two breaks on Friday. We will > set up a registration site at Acteva to collect $$ to cover our out of > pocket expenses - I'll email out the URL for that site in the next day > or two. > > > > Please review attached Strawman purposes, suggested attendees and > agenda. Any changes or comments, please email them to the community > for all to comment on please. 
> > > > The Summit has several dimensions and themes throughout our work > there: > > 1) - consistency and robustness of the Linux and Windows software > stacks for Release 2.0 of OpenFabrics; > > 2) - feature selection, development resources and timelines for > Release 2.0; > > 3) - activities, features and processes of the Enterprise Working > Group on OFED 1.x until Release 2.0 is ready for hand-off to the EWG; > > 4) - enhancing the resources of the EWG to be ready for 2.0 so > that it may subsequently be distributed as OFED 2.0 and adopted > by the OpenFabrics vendor and customer communities for production use. > > > > This is far too much work for just a day and a half! PLEASE START > NOW exchanging ideas for additional features, contact peer > engineers from companies and customers to discuss work sizing, > development resources, identify volunteer developers for items so > that when we meet on the 16th we're not starting from a blank sheet! > > > > Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal > Rosenstock, Tom Tucker and Bob Woodruff are leading the > pre-meeting, STRAWMAN collation of requirements, feature > prioritization, developer assignments, sizing and processes so that > we have the list largely complete prior to the meeting and people > know who has already volunteered for items from the list.
> > > > Bill Boas > > VP, Business Development | System Fabric Works > > bboas at systemfabricworks.com | 510-375-8840 > > > > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg -- Jeff Squyres Server Virtualization Business Unit Cisco Systems _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From kliteyn at dev.mellanox.co.il Wed Oct 18 09:15:27 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 18 Oct 2006 18:15:27 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - moving win-specific to config.h Message-ID: <4536531F.8070901@dev.mellanox.co.il> Hi Hal As we discussed previously, I've added config.h in windows, and removed windows-specific defines from the common OSM files: opensm/osm_log.c opensm/osm_prtn.c opensm/osm_subnet.c -- Yevgeny Signed-off-by: Yevgeny Kliteynik Index: opensm/osm_log.c =================================================================== --- opensm/osm_log.c (revision 9869) +++ opensm/osm_log.c (working copy) @@ -96,10 +96,6 @@ static void truncate_log_file(osm_log_t* #else /* Windows */ -#define fstat _fstat -#define stat _stat -#define fileno _fileno - static void truncate_log_file(osm_log_t* const p_log) { fprintf(stderr, "truncate_log_file: cannot truncate on windows system (yet)\n"); Index: opensm/osm_prtn.c =================================================================== --- opensm/osm_prtn.c (revision 9869) +++ opensm/osm_prtn.c (working copy) @@ -61,11 +61,6 @@ #include #include -#ifdef WIN32 -#define snprintf _snprintf -#define stat _stat -#endif - extern int osm_prtn_config_parse_file(osm_log_t * const p_log, osm_subn_t * const p_subn, const char *file_name); Index: opensm/osm_subnet.c 
=================================================================== --- opensm/osm_subnet.c (revision 9869) +++ opensm/osm_subnet.c (working copy) @@ -658,10 +658,6 @@ __osm_subn_opts_unpack_charp( } } -#ifdef WIN32 -#define snprintf _snprintf -#endif - /********************************************************************** **********************************************************************/ static void From halr at voltaire.com Wed Oct 18 09:27:26 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Oct 2006 12:27:26 -0400 Subject: [openib-general] [PATCH] osm: port to WinIB stack - moving win-specific to config.h In-Reply-To: <4536531F.8070901@dev.mellanox.co.il> References: <4536531F.8070901@dev.mellanox.co.il> Message-ID: <1161188842.25985.13261.camel@hal.voltaire.com> On Wed, 2006-10-18 at 12:15, Yevgeny Kliteynik wrote: > Hi Hal > > As we discussed previously, I've added config.h in > windows, and removed windows-specific defines from > the common OSM files: > opensm/osm_log.c > opensm/osm_prtn.c > opensm/osm_subnet.c > > -- > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Excellent. Thanks. Applied. -- Hal From eitan at mellanox.co.il Wed Oct 18 09:36:10 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 18 Oct 2006 18:36:10 +0200 Subject: [openib-general] Tools for development In-Reply-To: <1161184423.4066.23.camel@stevo-desktop> References: <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com> <20061018123933.GA28308@mellanox.co.il> <1161184423.4066.23.camel@stevo-desktop> Message-ID: <453657FA.9060906@mellanox.co.il> Steve Wise wrote: > On Wed, 2006-10-18 at 14:39 +0200, Michael S. Tsirkin wrote: > >> Quoting r. Jeff Squyres : >> >>> However, this is a larger audience than was on the call. Is there a >>> significant movement here from the developers to move to 100% git? >>> >> Life would be somewhat easier for me with 100% git. >> >> > > Probably for everyone. > Not for me. I hate to move from SVN. 
> > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sean.hefty at intel.com Wed Oct 18 09:43:29 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 18 Oct 2006 09:43:29 -0700 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: Message-ID: <000001c6f2d4$8a73a0e0$ff0da8c0@amr.corp.intel.com> >Actually that will not work, since the undo operation is for when the >next operation (cma_get_port()) fails after we did an acquire_dev, >and in that case the refcount needs to be dropped. So I am not >able to avoid using an extra flag to indicate that a ref was got some >time in the past, and drop it in the error path. I will send that out now. Let's try something like this then (untested): diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 18a4366..0d06431 100755 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1859,16 +1859,20 @@ int rdma_bind_addr(struct rdma_cm_id *id mutex_unlock(&lock); } if (ret) - goto err; + goto err1; } memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); ret = cma_get_port(id_priv); if (ret) - goto err; + goto err2; return 0; -err: +err2: + mutex_lock(&lock); + cma_detach_from_dev(id_priv); + mutex_unlock(&lock); +err1: cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); return ret; } - Sean From mshefty at ichips.intel.com Wed Oct 18 09:48:55 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 18 Oct 2006 09:48:55 -0700 Subject: [openib-general] [PATCH] RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: <20061018051600.6523.29836.sendpatchset@localhost.localdomain> References: <20061018051600.6523.29836.sendpatchset@localhost.localdomain> Message-ID: <45365AF7.8050205@ichips.intel.com> Krishna Kumar wrote: 
>          struct rdma_id_private *id_priv;
> +        int did_acquire_dev = 0;

See my other mail that gets rid of this flag.

> @@ -1776,6 +1778,8 @@ int rdma_bind_addr(struct rdma_cm_id *id
>
>          return 0;
>  err:
> +        if (did_acquire_dev)
> +                cma_detach_from_dev(id_priv);

We need to lock around cma_detach_from_dev().

- Sean

From bugzilla-daemon at openib.org Wed Oct 18 09:56:23 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Wed, 18 Oct 2006 09:56:23 -0700 (PDT)
Subject: [openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
Message-ID: <20061018165623.8AD7D2283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=263

------- Comment #11 from sweitzen at cisco.com 2006-10-18 09:56 -------
Roland, I enabled debug_level=1 with OFED 1.1 rc7 RHEL4 U3 x86_64, and got same
crash (netserver machine).

I could only see the debug_level=1 info by running dmesg in a loop, and the
info did not get saved into any /var/log files. Is there some extra
configuration needed for syslog? Shouldn't IPoIB debug_level=1 info go into a
syslog file by default?

Here's what I saw from dmesg loop right before crash.

ib1: Port state change event
ib0: Port state change event
ib1: Port state change event
ib0: flushing
ib0: downing ib_dev
ib1: flushing
ib1: downing ib_dev
ib0: Created ah 00000101beffa800
ib1: Created ah 00000101be636800
ib0: Created ah 00000101be5724c0
ib1: Created ah 00000101be9c8a80
ib0: Created ah 00000101bfc57100
ib1: Created ah 00000101be49f700
ib0: Created ah 00000101beffa3c0
ib1: Created ah 00000101beffae80
ib0: Created ah 00000101be636b40
ib1: Created ah 000001019dfecd40
ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0861 MTU > 1024
ib0: PathRec LID 0x0006 for GID fe80:0000:0000:0000:0005:ad00:0020:0861
ib0: Created ah 000001019dfec600
ib0: created address handle 000001019dfecac0 for LID 0x0006, SL 0
ib0: Port state change event
ib1: Port state change event
ib0: flushing
ib0: downing ib_dev
ib1: flushing
ib1: downing ib_dev
ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0861 MTU > 1024
ib0: PathRec LID 0x0006 for GID fe80:0000:0000:0000:0005:ad00:0020:0861
ib0: Created ah 00000101beffa300
ib0: created address handle 000001019dfec1c0 for LID 0x0006, SL 0
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: Created ah 00000101bfc55e80
ib0: Created ah 00000101bfc4cc80
ib0: Created ah 000001019dfec480
ib0: Created ah 000001019dfec3c0
ib0: Created ah 000001019dfec100
Tue Oct 17 01:05:42 PDT 2006

Message from syslogd at svbu-qa-pcie-1 at Tue Oct 17 01:05:43 2006 ...
svbu-qa-pcie-1 kernel: general protection fault: 0000 [1] SMP

Here's serial console output from netserver machine.

ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_ipoib(U) ib_mthca<7>Losing some ticks... checking if CPU frequency changed.
(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5
ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc ds
yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod button battery ac uhci_hcd
ehci_hcd hw_random shpchp e1000 floppy sg ext3 jbd aic79xx sd_mod scsi_mod
Pid: 7838, comm: ib_mad1 Not tainted 2.6.9-34.ELsmp
RIP: 0010:[] {:ib_ipoib:path_rec_completion+178}
RSP: 0018:00000101a756bc70 EFLAGS: 00010202
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip mwait_idle+0x56/0x7c
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000101bbeffc80 RSI: 0000000000000000 RDI: 00000000fffffffc
RBP: 00000101bbeffc80 R08: 0000000000000003 R09: 00000101bbeffca0
R10: ffffffff8011dfe0 R11: ffffffff8011dfe0 R12: 0000ffff1b60167f
R13: 00000000fffffffc R14: 0000000000000000 R15: 0000ffff1b6012ff
FS: 0000000000000000(0000) GS:ffffffff804d7b00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006cf5e8 CR3: 0000000000101000 CR4: 00000000000006e0
Process ib_mad1 (pid: 7838, threadinfo 00000101a756a000, task 00000101bdc3b030)
Stack: ffffffffa00e547d 00000101afda5000 0000000000000002 00000101afda5380
       0000000000000246 0000000000000246 ffffffff802ab017 00000101bc16a500
       00000101bbeffca0 00000101bbeffc80
Call Trace:{:ib_sa:ib_sa_path_rec_callback+0}
       {dev_queue_xmit+525}
       {:ib_ipoib:path_rec_completion+885}
       {:ib_sa:ib_sa_path_rec_callback+64}
       {:ib_sa:send_handler+74}
       {:ib_mad:ib_mad_complete_send_wr+418}
       {:ib_mad:ib_mad_completion_handler+979}
       {:ib_mad:ib_mad_completion_handler+0}
       {worker_thread+419}
       {default_wake_function+0}
       {default_wake_function+0}
       {keventd_create_kthread+0}
       {worker_thread+0}
       {keventd_create_kthread+0}
       {kthread+200}
       {child_rip+8}
       {keventd_create_kthread+0}
       {kthread+0}
       {child_rip+0}

Code: 49 8b 74 24 08 50 0f b6 42 16 50 0f b6 42 15 50 0f b6 42 14
RIP {:ib_ipoib:path_rec_completion+178} RSP
<00000101a756bc70>
<0>Kernel panic - not syncing: Oops

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From troy at scl.ameslab.gov Wed Oct 18 09:55:28 2006
From: troy at scl.ameslab.gov (Troy Benjegerdes)
Date: Wed, 18 Oct 2006 11:55:28 -0500
Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca?
In-Reply-To: <453655E6.3070309@scl.ameslab.gov>
References: <453655E6.3070309@scl.ameslab.gov>
Message-ID: <829BE3D9-3A7F-4964-BB22-30B94692B7C8@scl.ameslab.gov>

(I am taking this back to the openib list because I think the list
needs to hear about real applications that are hitting memory
registration limits)

What are the limits on the ehca memory registrations? Is there a limit
to the number of regions that can be registered?

Is there any way (with kernel hacks) that we can register the entire
address space of the application? We would like to be able to do RDMA
sends and receives from anywhere in the application address space
eventually, and only register it once.

What is the point of RDMA for memory-intensive applications if you have
to copy the data to a registered buffer before sending it anyway?

On Oct 18, 2006, at 11:27 AM, Kyle Schochenmaier wrote:

> Hoang-Nam Nguyen wrote:
>> Hi Troy!
>>
>>> I am running PVFS2 on OpenIB, with IBM's ehca.
>>> When we start writing/reading large files, either with the NetPIPE
>>> PVFS module we have or a modified GAMESS executable that uses
>>> libpvfs2 directly, the 'ibv_reg_mr' function fails, and we get an
>>> error.
>>> This is also correlated with kernel log messages like this:
>>> Oct 16 11:14:45 p5l8 kernel: PU0003 000e0091:ehca_hcall_7arg_7ret
>>> HCAD_ERROR opcode=160 ret=fffffffffffffff7 arg1=1000000003000004 arg2=5
>>> arg3=14f0ebc8 arg4=10000
>>> arg5=e0000000000000 arg6=e3e9f200 arg7=0 out1=0 out2=0 out3=0 out4=0
>>> out5=0 out6=0
>>> out7=0
>>>
>> Return code f7 from firmware/hvcall means H_NO_MEM.
>> I'm wondering
>> if you could provide me with some pre-history of this problem.
>> Is this a permanent problem? If yes, could you give me more infos
>> on your testcase resp. scenario eg large file size, NetPIPE options?
>> Which version of ehca are you using? And which kernel version?
>> Thanks!
>> Hoang-Nam Nguyen
>>
>>
> I think Troy could better explain what is happening here, so I'm
> taking this off-list for now -- we're trying to get this working
> for SC'06, so time is limited :) -- if Troy wants to forward this
> on to the list after looking at it, that's fine too.
> Our app writes out a file once, then reads it in many times through
> the pvfs2 system. In the pvfs2 layers, there is memory caching
> done at the network level, so memory is registered by the app, and
> attempts are made to re-register and/or re-use these memory regions
> to save on memory reg overhead. The problem occurs only while
> writing files, so while memory is being initially registered with
> the nic/app and cached? Also, our tests show that the app runs
> normally to completion on identical machines using mellanox hca's
> instead of the eHCA. The file sizes are generally >16GByte,
> however our failures usually appear by the time ~220-250MBytes have
> been written (possibly also all registered)?
>
> I'm not sure the standard OpenIB NetPIPE runs can reproduce this
> type of workload. However, we have developed a working PVFS2-NetPIPE
> module which can reproduce this problem on occasion, if
> there is interest in further testing this on your end, I can make
> it available.
>
> Our ehca's have the following revision info:
> vendor_id: 0x5076
> vendor_part_id: 0
> hw_ver: 0x1000003
> Kernel version is debian 2.6.17
>
> I hope this is enough info to get some more insight from everyone.
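
The "f7" reading in the quoted exchange can be checked by hand: the logged
value ret=fffffffffffffff7 is a 64-bit two's-complement number, which is -9.
The following is only a minimal sketch of that arithmetic in C, not code from
the ehca driver. The helper names hcall_ret_to_signed and hcall_is_no_mem and
the constant H_NO_MEM_ASSUMED are made up for illustration, and the value -9
for H_NO_MEM is an assumption that agrees with the thread but was not checked
against any header here.

```c
#include <stdint.h>

/* Assumption for illustration only: H_NO_MEM is -9, which agrees with
 * the "f7" low byte discussed in the thread. */
#define H_NO_MEM_ASSUMED ((int64_t)-9)

/* Interpret a raw 64-bit hcall return value, as printed in hex in the
 * kernel log, as a signed two's-complement number. The arithmetic avoids
 * relying on implementation-defined unsigned-to-signed conversion. */
static int64_t hcall_ret_to_signed(uint64_t raw)
{
    if (raw <= (uint64_t)INT64_MAX)
        return (int64_t)raw;
    /* Top bit set: the value is -(~raw) - 1. */
    return -(int64_t)(~raw) - 1;
}

/* Hypothetical helper: does this raw return code mean "no memory"? */
static int hcall_is_no_mem(uint64_t raw)
{
    return hcall_ret_to_signed(raw) == H_NO_MEM_ASSUMED;
}
```

With the value from the log, hcall_ret_to_signed(0xfffffffffffffff7ULL)
evaluates to -9, so hcall_is_no_mem() returns nonzero under the stated
assumption.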

From hnguyen at de.ibm.com Wed Oct 18 09:55:36 2006
From: hnguyen at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 18 Oct 2006 18:55:36 +0200
Subject: [openib-general] [PATCH OFED-1.1-rc7] libehca configure.in and config.h.in: fix missing check of libsysfs.h
Message-ID: <200610181855.36678.hnguyen@de.ibm.com>

Hello,
below is a patch of configure.in and config.h.in in libehca. It checks the
presence of libsysfs.h properly. Unfortunately I noticed this bug only
recently, after I had fixed the "openib.spec" issues and tested ofed on a
clean system.
Thanks!
Nam

Signed-off-by: Hoang-Nam Nguyen
---
 config.h.in  |    3 +++
 configure.in |    5 +++++
 2 files changed, 8 insertions(+)

diff -Nurp openib-1.1/src/userspace/libehca/config.h.in openib-1.1_patch/src/userspace/libehca/config.h.in
--- openib-1.1/src/userspace/libehca/config.h.in 2006-10-05 15:07:36.000000000 +0200
+++ openib-1.1_patch/src/userspace/libehca/config.h.in 2006-10-18 17:31:37.000000000 +0200
@@ -27,6 +27,9 @@
 /* Define to 1 if you have the header file. */
 #undef HAVE_STRING_H
 
+/* Define to 1 if you have the header file. */
+#undef HAVE_SYSFS_LIBSYSFS_H
+
 /* Define to 1 if you have the header file. */
 #undef HAVE_SYS_STAT_H
 
diff -Nurp openib-1.1/src/userspace/libehca/configure.in openib-1.1_patch/src/userspace/libehca/configure.in
--- openib-1.1/src/userspace/libehca/configure.in 2006-10-05 15:07:03.000000000 +0200
+++ openib-1.1_patch/src/userspace/libehca/configure.in 2006-10-18 17:31:37.000000000 +0200
@@ -25,9 +25,14 @@ AC_CHECK_LIB(ibverbs,
     [],
     AC_MSG_ERROR([libibverbs not installed]))
 
+dnl Checks for header files.
+AC_CHECK_HEADER(infiniband/driver.h, [],
+    AC_MSG_ERROR([ not found. libehca requires libibverbs.]))
+
 dnl Checks for library functions
 AC_CHECK_FUNCS(ibv_read_sysfs_file)
 fi
+AC_CHECK_HEADERS(sysfs/libsysfs.h)
 
 dnl Checks for programs.
 AC_PROG_CC

From hnguyen at de.ibm.com Wed Oct 18 10:01:55 2006
From: hnguyen at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 18 Oct 2006 19:01:55 +0200
Subject: [openib-general] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
Message-ID: <200610181901.55412.hnguyen@de.ibm.com>

Hello,
here is the patch of configure in libehca as a result of the patch
"libehca configure.in and config.h.in". It is generated by autogen.sh and
pretty lengthy. Hence, I'm attaching it here for completeness.
Vlad, do you want me to check it in svn or send you the whole file?
Thanks!
Nam -------------- next part -------------- diff -Nurp openib-1.1/src/userspace/libehca/configure openib-1.1_patch/src/userspace/libehca/configure --- openib-1.1/src/userspace/libehca/configure 2006-10-05 15:07:39.000000000 +0200 +++ openib-1.1_patch/src/userspace/libehca/configure 2006-10-18 17:31:37.000000000 +0200 @@ -287,8 +287,8 @@ if test "X${echo_test_string+set}" != Xs # find a string as large as possible, as long as the shell can cope with it for cmd in 'sed 50q "$0"' 'sed 20q "$0"' 'sed 10q "$0"' 'sed 2q "$0"' 'echo test'; do # expected sizes: less than 2Kb, 1Kb, 512 bytes, 16 bytes, ... - if (echo_test_string="`eval $cmd`") 2>/dev/null && - echo_test_string="`eval $cmd`" && + if (echo_test_string=`eval $cmd`) 2>/dev/null && + echo_test_string=`eval $cmd` && (test "X$echo_test_string" = "X$echo_test_string") 2>/dev/null then break @@ -3308,7 +3308,7 @@ else if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then lt_cv_path_LD="$ac_dir/$ac_prog" # Check to see if the program is GNU ld. I'd rather use --version, - # but apparently some GNU ld's only accept -v. + # but apparently some variants of GNU ld only accept -v. # Break only if it was the GNU/non-GNU ld that we prefer. case `"$lt_cv_path_LD" -v 2>&1 &6 else - # I'd rather use --version here, but apparently some GNU ld's only accept -v. + # I'd rather use --version here, but apparently some GNU lds only accept -v. case `$LD -v 2>&1 &1 | sed '1q'` in - */dev/null* | *'Invalid file or object type'*) - lt_cv_path_NM="$tmp_nm -B" - break - ;; - *) - case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in - */dev/null*) - lt_cv_path_NM="$tmp_nm -p" + lt_nm_to_check="${ac_tool_prefix}nm" + if test -n "$ac_tool_prefix" && test "$build" = "$host"; then + lt_nm_to_check="$lt_nm_to_check nm" + fi + for lt_tmp_nm in $lt_nm_to_check; do + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR + for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do + IFS="$lt_save_ifs" + test -z "$ac_dir" && ac_dir=. 
+ tmp_nm="$ac_dir/$lt_tmp_nm" + if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext" ; then + # Check to see if the nm accepts a BSD-compat flag. + # Adding the `sed 1q' prevents false positives on HP-UX, which says: + # nm: unknown option "B" ignored + # Tru64's nm complains that /dev/null is an invalid object file + case `"$tmp_nm" -B /dev/null 2>&1 | sed '1q'` in + */dev/null* | *'Invalid file or object type'*) + lt_cv_path_NM="$tmp_nm -B" break ;; *) - lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but - continue # so that we can try to find one that supports BSD flags + case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in + */dev/null*) + lt_cv_path_NM="$tmp_nm -p" + break + ;; + *) + lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but + continue # so that we can try to find one that supports BSD flags + ;; + esac ;; esac - esac - fi + fi + done + IFS="$lt_save_ifs" done - IFS="$lt_save_ifs" test -z "$lt_cv_path_NM" && lt_cv_path_NM=nm fi fi @@ -3512,7 +3519,7 @@ gnu*) hpux10.20* | hpux11*) lt_cv_file_magic_cmd=/usr/bin/file - case "$host_cpu" in + case $host_cpu in ia64*) lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - IA64' lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so @@ -3528,6 +3535,11 @@ hpux10.20* | hpux11*) esac ;; +interix3*) + # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a here + lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|\.a)$' + ;; + irix5* | irix6* | nonstopux*) case $LD in *-32|*"-32 ") libmagic=32-bit;; @@ -3573,15 +3585,11 @@ osf3* | osf4* | osf5*) lt_cv_deplibs_check_method=pass_all ;; -sco3.2v5*) - lt_cv_deplibs_check_method=pass_all - ;; - solaris*) lt_cv_deplibs_check_method=pass_all ;; -sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) +sysv4 | sysv4.3*) case $host_vendor in motorola) lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib) M[0-9][0-9]* Version [0-9]' @@ -3602,10 +3610,13 @@ sysv4 | 
sysv4.2uw2* | sysv4.3* | sysv5*) siemens) lt_cv_deplibs_check_method=pass_all ;; + pc) + lt_cv_deplibs_check_method=pass_all + ;; esac ;; -sysv5OpenUNIX8* | sysv5UnixWare7* | sysv5uw[78]* | unixware7* | sysv4*uw2*) +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) lt_cv_deplibs_check_method=pass_all ;; esac @@ -3623,6 +3634,9 @@ test -z "$deplibs_check_method" && depli # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} + # Allow CC to be a program name with arguments. compiler=$CC @@ -3658,7 +3672,7 @@ ia64-*-hpux*) ;; *-*-irix6*) # Find out which ABI we are using. - echo '#line 3661 "configure"' > conftest.$ac_ext + echo '#line 3675 "configure"' > conftest.$ac_ext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>&5 ac_status=$? @@ -3701,7 +3715,7 @@ x86_64-*linux*|ppc*-*linux*|powerpc*-*li ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then - case "`/usr/bin/file conftest.o`" in + case `/usr/bin/file conftest.o` in *32-bit*) case $host in x86_64-*linux*) @@ -3814,6 +3828,26 @@ echo "${ECHO_T}$lt_cv_cc_needs_belf" >&6 CFLAGS="$SAVE_CFLAGS" fi ;; +sparc*-*solaris*) + # Find out which ABI we are using. + echo 'int i;' > conftest.$ac_ext + if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; then + case `/usr/bin/file conftest.o` in + *64-bit*) + case $lt_cv_prog_gnu_ld in + yes*) LD="${LD-ld} -m elf64_sparc" ;; + *) LD="${LD-ld} -64" ;; + esac + ;; + esac + fi + rm -rf conftest* + ;; + esac @@ -5237,7 +5271,7 @@ fi # Provide some information about the compiler. 
-echo "$as_me:5240:" \ +echo "$as_me:5274:" \ "checking for Fortran 77 compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (eval echo "$as_me:$LINENO: \"$ac_compiler --version &5\"") >&5 @@ -5434,12 +5468,18 @@ else elif test -x /usr/sbin/sysctl; then lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax` else - lt_cv_sys_max_cmd_len=65536 # usable default for *BSD + lt_cv_sys_max_cmd_len=65536 # usable default for all BSDs fi # And add a safety zone lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` ;; + + interix*) + # We know the value 262144 and hardcode it with a safety zone (like BSD) + lt_cv_sys_max_cmd_len=196608 + ;; + osf*) # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running configure # due to this test when exec_disable_arg_limit is 1 on Tru64. It is not @@ -5453,6 +5493,17 @@ else esac fi ;; + sco3.2v5*) + lt_cv_sys_max_cmd_len=102400 + ;; + sysv5* | sco5v6* | sysv4.2uw2*) + kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null` + if test -n "$kargmax"; then + lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[ ]//'` + else + lt_cv_sys_max_cmd_len=32768 + fi + ;; *) # If test is not a shell built-in, we'll probably end up computing a # maximum length that is only half of the actual maximum length, but @@ -5538,9 +5589,18 @@ irix* | nonstopux*) osf*) symcode='[BCDEGQRST]' ;; -solaris* | sysv5*) +solaris*) symcode='[BDRT]' ;; +sco3.2v5*) + symcode='[DT]' + ;; +sysv4.2uw2*) + symcode='[DT]' + ;; +sysv5* | sco5v6* | unixware* | OpenUNIX*) + symcode='[ABDT]' + ;; sysv4) symcode='[DFNSTU]' ;; @@ -5749,7 +5809,7 @@ rm="rm -f" default_ofile=libtool can_build_shared=yes -# All known linkers require a `.a' archive for static linking (except M$VC, +# All known linkers require a `.a' archive for static linking (except MSVC, # which needs '.lib'). 
libext=a ltmain="$ac_aux_dir/ltmain.sh" @@ -6006,6 +6066,7 @@ test -z "$AR_FLAGS" && AR_FLAGS=cru test -z "$AS" && AS=as test -z "$CC" && CC=cc test -z "$LTCC" && LTCC=$CC +test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS test -z "$DLLTOOL" && DLLTOOL=dlltool test -z "$LD" && LD=ld test -z "$LN_S" && LN_S="ln -s" @@ -6025,10 +6086,10 @@ old_postuninstall_cmds= if test -n "$RANLIB"; then case $host_os in openbsd*) - old_postinstall_cmds="\$RANLIB -t \$oldlib~$old_postinstall_cmds" + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$oldlib" ;; *) - old_postinstall_cmds="\$RANLIB \$oldlib~$old_postinstall_cmds" + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$oldlib" ;; esac old_archive_cmds="$old_archive_cmds~\$RANLIB \$oldlib" @@ -6070,7 +6131,7 @@ else if test -n "$file_magic_test_file"; then case $deplibs_check_method in "file_magic "*) - file_magic_regex="`expr \"$deplibs_check_method\" : \"file_magic \(.*\)\"`" + file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` MAGIC_CMD="$lt_cv_path_MAGIC_CMD" if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | $EGREP "$file_magic_regex" > /dev/null; then @@ -6132,7 +6193,7 @@ else if test -n "$file_magic_test_file"; then case $deplibs_check_method in "file_magic "*) - file_magic_regex="`expr \"$deplibs_check_method\" : \"file_magic \(.*\)\"`" + file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` MAGIC_CMD="$lt_cv_path_MAGIC_CMD" if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | $EGREP "$file_magic_regex" > /dev/null; then @@ -6227,6 +6288,9 @@ lt_simple_link_test_code='int main(){ret # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} + # Allow CC to be a program name with arguments. 
compiler=$CC @@ -6234,82 +6298,17 @@ compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* -# -# Check for any special shared library compilation flags. -# -lt_prog_cc_shlib= -if test "$GCC" = no; then - case $host_os in - sco3.2v5*) - lt_prog_cc_shlib='-belf' - ;; - esac -fi -if test -n "$lt_prog_cc_shlib"; then - { echo "$as_me:$LINENO: WARNING: \`$CC' requires \`$lt_prog_cc_shlib' to build shared libraries" >&5 -echo "$as_me: WARNING: \`$CC' requires \`$lt_prog_cc_shlib' to build shared libraries" >&2;} - if echo "$old_CC $old_CFLAGS " | grep "[ ]$lt_prog_cc_shlib[ ]" >/dev/null; then : - else - { echo "$as_me:$LINENO: WARNING: add \`$lt_prog_cc_shlib' to the CC or CFLAGS env variable and reconfigure" >&5 -echo "$as_me: WARNING: add \`$lt_prog_cc_shlib' to the CC or CFLAGS env variable and reconfigure" >&2;} - lt_cv_prog_cc_can_build_shared=no - fi -fi - - -# -# Check to make sure the static flag actually works. -# -echo "$as_me:$LINENO: checking if $compiler static flag $lt_prog_compiler_static works" >&5 -echo $ECHO_N "checking if $compiler static flag $lt_prog_compiler_static works... 
$ECHO_C" >&6 -if test "${lt_prog_compiler_static_works+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - lt_prog_compiler_static_works=no - save_LDFLAGS="$LDFLAGS" - LDFLAGS="$LDFLAGS $lt_prog_compiler_static" - printf "$lt_simple_link_test_code" > conftest.$ac_ext - if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings - if test -s conftest.err; then - # Append any errors to the config.log. - cat conftest.err 1>&5 - $echo "X$_lt_linker_boilerplate" | $Xsed > conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if diff conftest.exp conftest.er2 >/dev/null; then - lt_prog_compiler_static_works=yes - fi - else - lt_prog_compiler_static_works=yes - fi - fi - $rm conftest* - LDFLAGS="$save_LDFLAGS" - -fi -echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works" >&5 -echo "${ECHO_T}$lt_prog_compiler_static_works" >&6 - -if test x"$lt_prog_compiler_static_works" = xyes; then - : -else - lt_prog_compiler_static= -fi - - - lt_prog_compiler_no_builtin_flag= @@ -6332,20 +6331,20 @@ else # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:6338: $lt_compile\"" >&5) + (eval echo "\"\$as_me:6337: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 - echo "$as_me:6342: \$? = $ac_status" >&5 + echo "$as_me:6341: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. 
- $echo "X$_lt_compiler_boilerplate" | $Xsed >conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if test ! -s conftest.err || diff conftest.exp conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_cv_prog_compiler_rtti_exceptions=yes fi fi @@ -6406,6 +6405,11 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_pic='-fno-common' ;; + interix3*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; + msdosdjgpp*) # Just because we use GCC doesn't mean we suddenly get shared libraries # on systems that don't support them. @@ -6422,7 +6426,7 @@ echo $ECHO_N "checking for $compiler opt hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. - case "$host_cpu" in + case $host_cpu in hppa*64*|ia64*) # +Z the default ;; @@ -6469,7 +6473,7 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_wl='-Wl,' # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. 
- case "$host_cpu" in + case $host_cpu in hppa*64*|ia64*) # +Z the default ;; @@ -6499,12 +6503,12 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-static' ;; - pgcc* | pgf77* | pgf90*) + pgcc* | pgf77* | pgf90* | pgf95*) # Portland Group compilers (*not* the Pentium gcc compiler, # which looks to be a dead project) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_pic='-fpic' - lt_prog_compiler_static='-static' + lt_prog_compiler_static='-Bstatic' ;; ccc*) lt_prog_compiler_wl='-Wl,' @@ -6520,11 +6524,6 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_static='-non_shared' ;; - sco3.2v5*) - lt_prog_compiler_pic='-Kpic' - lt_prog_compiler_static='-dn' - ;; - solaris*) lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-Bstatic' @@ -6542,7 +6541,7 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_static='-Bstatic' ;; - sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) + sysv4 | sysv4.2uw2* | sysv4.3*) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_pic='-KPIC' lt_prog_compiler_static='-Bstatic' @@ -6555,6 +6554,12 @@ echo $ECHO_N "checking for $compiler opt fi ;; + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + ;; + unicos*) lt_prog_compiler_wl='-Wl,' lt_prog_compiler_can_build_shared=no @@ -6594,20 +6599,20 @@ else # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:6600: $lt_compile\"" >&5) + (eval echo "\"\$as_me:6605: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 - echo "$as_me:6604: \$? = $ac_status" >&5 + echo "$as_me:6609: \$? 
= $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. - $echo "X$_lt_compiler_boilerplate" | $Xsed >conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if test ! -s conftest.err || diff conftest.exp conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works=yes fi fi @@ -6628,7 +6633,7 @@ else fi fi -case "$host_os" in +case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic= @@ -6638,6 +6643,48 @@ case "$host_os" in ;; esac +# +# Check to make sure the static flag actually works. +# +wl=$lt_prog_compiler_wl eval lt_tmp_static_flag=\"$lt_prog_compiler_static\" +echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 +echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... $ECHO_C" >&6 +if test "${lt_prog_compiler_static_works+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + lt_prog_compiler_static_works=no + save_LDFLAGS="$LDFLAGS" + LDFLAGS="$LDFLAGS $lt_tmp_static_flag" + printf "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. 
+ cat conftest.err 1>&5 + $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + lt_prog_compiler_static_works=yes + fi + else + lt_prog_compiler_static_works=yes + fi + fi + $rm conftest* + LDFLAGS="$save_LDFLAGS" + +fi +echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works" >&5 +echo "${ECHO_T}$lt_prog_compiler_static_works" >&6 + +if test x"$lt_prog_compiler_static_works" = xyes; then + : +else + lt_prog_compiler_static= +fi + + echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o+set}" = set; then @@ -6656,25 +6703,25 @@ else # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:6662: $lt_compile\"" >&5) + (eval echo "\"\$as_me:6709: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 - echo "$as_me:6666: \$? = $ac_status" >&5 + echo "$as_me:6713: \$? = $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings - $echo "X$_lt_compiler_boilerplate" | $Xsed > out/conftest.exp - $SED '/^$/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.err || diff out/conftest.exp out/conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! 
-s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o=yes fi fi - chmod u+w . + chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation @@ -6770,6 +6817,10 @@ cc_basename=`$echo "X$cc_temp" | $Xsed - with_gnu_ld=no fi ;; + interix*) + # we just hope/assume this is gcc and not c89 (= MSVC++) + with_gnu_ld=yes + ;; openbsd*) with_gnu_ld=no ;; @@ -6854,7 +6905,7 @@ EOF export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... archive_expsym_cmds='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then @@ -6863,22 +6914,38 @@ EOF echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ - $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' + $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs=no fi ;; + interix3*) + hardcode_direct=no + hardcode_shlibpath_var=no + hardcode_libdir_flag_spec='${wl}-rpath,$libdir' + export_dynamic_flag_spec='${wl}-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. 
+ # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. + archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + archive_expsym_cmds='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + ;; + linux*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then tmp_addflag= case $cc_basename,$host_cpu in pgcc*) # Portland Group C compiler - whole_archive_flag_spec= + whole_archive_flag_spec='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag' ;; - pgf77* | pgf90* ) # Portland Group f77 and f90 compilers - whole_archive_flag_spec= - tmp_addflag=' -fpic -Mnomain' ;; + pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers + whole_archive_flag_spec='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag -Mnomain' ;; ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 tmp_addflag=' -i_dynamic' ;; efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 @@ -6909,7 +6976,7 @@ EOF fi ;; - solaris* | sysv5*) + solaris*) if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then ld_shlibs=no cat <<EOF 1>&2 @@ -6930,6 +6997,33 @@ EOF fi ;; +
sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) + case `$LD -v 2>&1` in + *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) + ld_shlibs=no + cat <<_LT_EOF 1>&2 + +*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not +*** reliably create shared libraries on SCO systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.16.91.0.3 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. + +_LT_EOF + ;; + *) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + hardcode_libdir_flag_spec='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' + archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' + else + ld_shlibs=no + fi + ;; + esac + ;; + sunos4*) archive_cmds='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' wlarc= @@ -6963,7 +7057,7 @@ EOF # Note: this linker hardcodes the directories in LIBPATH if there # are no directories specified by -L. hardcode_minus_L=yes - if test "$GCC" = yes && test -z "$link_static_flag"; then + if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then # Neither direct hardcoding nor static linking is supported with a # broken collect2. hardcode_direct=unsupported @@ -6997,6 +7091,7 @@ EOF break fi done + ;; esac exp_sym_flag='-bexport' @@ -7034,6 +7129,7 @@ EOF hardcode_libdir_flag_spec='-L$libdir' hardcode_libdir_separator= fi + ;; esac shared_flag='-shared' if test "$aix_use_runtimelinking" = yes; then @@ -7046,11 +7142,11 @@ EOF # chokes on -Wl,-G. 
The following line is correct: shared_flag='-G' else - if test "$aix_use_runtimelinking" = yes; then + if test "$aix_use_runtimelinking" = yes; then shared_flag='${wl}-G' else shared_flag='${wl}-bM:SRE' - fi + fi fi fi @@ -7115,12 +7211,12 @@ rm -f conftest.err conftest.$ac_objext \ if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec='${wl}-blibpath:$libdir:'"$aix_libpath" - archive_expsym_cmds="\$CC"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$no_entry_flag \${wl}$exp_sym_flag:\$export_symbols $shared_flag" + archive_expsym_cmds="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" else if test "$host_cpu" = ia64; then hardcode_libdir_flag_spec='${wl}-R $libdir:/usr/lib:/lib' allow_undefined_flag="-z nodefs" - archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$no_entry_flag \${wl}$exp_sym_flag:\$export_symbols" + archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" else # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF @@ -7180,13 +7276,11 @@ if test -z "$aix_libpath"; then aix_libp # -berok will link without error, but may produce a broken library. 
no_undefined_flag=' ${wl}-bernotok' allow_undefined_flag=' ${wl}-berok' - # -bexpall does not export symbols beginning with underscore (_) - always_export_symbols=yes # Exported symbols can be pulled into shared objects from archives - whole_archive_flag_spec=' ' + whole_archive_flag_spec='$convenience' archive_cmds_need_lc=yes - # This is similar to how AIX traditionally builds it's shared libraries. - archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}-bE:$export_symbols ${wl}-bnoentry${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' + # This is similar to how AIX traditionally builds its shared libraries. + archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; @@ -7225,7 +7319,7 @@ if test -z "$aix_libpath"; then aix_libp ;; darwin* | rhapsody*) - case "$host_os" in + case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag='${wl}-undefined ${wl}suppress' ;; @@ -7254,7 +7348,7 @@ if test -z "$aix_libpath"; then aix_libp output_verbose_link_cmd='echo' archive_cmds='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' module_cmds='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym 
${lib}' module_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else @@ -7263,7 +7357,7 @@ if test -z "$aix_libpath"; then aix_libp output_verbose_link_cmd='echo' archive_cmds='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' module_cmds='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' ;; @@ -7327,47 +7421,62 @@ if test -z "$aix_libpath"; then aix_libp export_dynamic_flag_spec='${wl}-E' ;; - hpux10* | hpux11*) + hpux10*) if test "$GCC" = yes -a "$with_gnu_ld" = no; then - case "$host_cpu" in - hppa*64*|ia64*) + archive_cmds='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + fi + if test "$with_gnu_ld" = no; then + hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir' + hardcode_libdir_separator=: + + 
hardcode_direct=yes + export_dynamic_flag_spec='${wl}-E' + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + hardcode_minus_L=yes + fi + ;; + + hpux11*) + if test "$GCC" = yes -a "$with_gnu_ld" = no; then + case $host_cpu in + hppa*64*) archive_cmds='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; + ia64*) + archive_cmds='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; *) archive_cmds='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac else - case "$host_cpu" in - hppa*64*|ia64*) - archive_cmds='$LD -b +h $soname -o $lib $libobjs $deplibs $linker_flags' + case $host_cpu in + hppa*64*) + archive_cmds='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + archive_cmds='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) - archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + archive_cmds='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac fi if test "$with_gnu_ld" = no; then - case "$host_cpu" in - hppa*64*) - hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir' + hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir' + hardcode_libdir_separator=: + + case $host_cpu in + hppa*64*|ia64*) hardcode_libdir_flag_spec_ld='+b $libdir' - hardcode_libdir_separator=: - hardcode_direct=no - hardcode_shlibpath_var=no - ;; - ia64*) - hardcode_libdir_flag_spec='-L$libdir' hardcode_direct=no hardcode_shlibpath_var=no - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. 
- hardcode_minus_L=yes ;; *) - hardcode_libdir_flag_spec='${wl}+b ${wl}$libdir' - hardcode_libdir_separator=: hardcode_direct=yes export_dynamic_flag_spec='${wl}-E' @@ -7469,14 +7578,6 @@ if test -z "$aix_libpath"; then aix_libp hardcode_libdir_separator=: ;; - sco3.2v5*) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_shlibpath_var=no - export_dynamic_flag_spec='${wl}-Bexport' - runpath_var=LD_RUN_PATH - hardcode_runpath_var=yes - ;; - solaris*) no_undefined_flag=' -z text' if test "$GCC" = yes; then @@ -7562,36 +7663,45 @@ if test -z "$aix_libpath"; then aix_libp fi ;; - sysv4.2uw2*) - archive_cmds='$LD -G -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct=yes - hardcode_minus_L=no + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7*) + no_undefined_flag='${wl}-z,text' + archive_cmds_need_lc=no hardcode_shlibpath_var=no - hardcode_runpath_var=yes - runpath_var=LD_RUN_PATH - ;; + runpath_var='LD_RUN_PATH' - sysv5OpenUNIX8* | sysv5UnixWare7* | sysv5uw[78]* | unixware7*) - no_undefined_flag='${wl}-z ${wl}text' if test "$GCC" = yes; then - archive_cmds='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_cmds='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' else - archive_cmds='$CC -G ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_cmds='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' fi - runpath_var='LD_RUN_PATH' - hardcode_shlibpath_var=no ;; - sysv5*) - no_undefined_flag=' -z text' - # $CC -shared without GNU ld will not create a library from C++ - # object files and a static libstdc++, better avoid it by now - archive_cmds='$LD -G${allow_undefined_flag} -h 
$soname -o $lib $libobjs $deplibs $linker_flags' - archive_expsym_cmds='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ - $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' - hardcode_libdir_flag_spec= + sysv5* | sco3.2v5* | sco5v6*) + # Note: We can NOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. + no_undefined_flag='${wl}-z,text' + allow_undefined_flag='${wl}-z,nodefs' + archive_cmds_need_lc=no hardcode_shlibpath_var=no + hardcode_libdir_flag_spec='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' + hardcode_libdir_separator=':' + link_all_deplibs=yes + export_dynamic_flag_spec='${wl}-Bexport' runpath_var='LD_RUN_PATH' + + if test "$GCC" = yes; then + archive_cmds='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + fi ;; uts4*) @@ -7610,11 +7720,6 @@ echo "$as_me:$LINENO: result: $ld_shlibs echo "${ECHO_T}$ld_shlibs" >&6 test "$ld_shlibs" = no && can_build_shared=no -variables_saved_for_relink="PATH $shlibpath_var $runpath_var" -if test "$GCC" = yes; then - variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" -fi - # # 
Do we need to explicitly link libc? # @@ -7647,6 +7752,7 @@ echo $ECHO_N "checking whether -lc shoul libobjs=conftest.$ac_objext deplibs= wl=$lt_prog_compiler_wl + pic_flag=$lt_prog_compiler_pic compiler_flags=-v linker_flags=-v verstring= @@ -7807,7 +7913,8 @@ cygwin* | mingw* | pw32*) dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname' + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' @@ -7860,7 +7967,7 @@ darwin* | rhapsody*) soname_spec='${libname}${release}${major}$shared_ext' shlibpath_overrides_runpath=yes shlibpath_var=DYLD_LIBRARY_PATH - shrext_cmds='$(test .$module = .yes && echo .so || echo .dylib)' + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` @@ -7898,7 +8005,14 @@ kfreebsd*-gnu) freebsd* | dragonfly*) # DragonFly does not have aout. When/if they implement a new # versioning mechanism, adjust this. 
- objformat=`test -x /usr/bin/objformat && /usr/bin/objformat || echo aout` + if test -x /usr/bin/objformat; then + objformat=`/usr/bin/objformat` + else + case $host_os in + freebsd[123]*) objformat=aout ;; + *) objformat=elf ;; + esac + fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) @@ -7920,10 +8034,15 @@ freebsd* | dragonfly*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; - *) # from 3.2 on + freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ + freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; + freebsd*) # from 4.6 on + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; esac ;; @@ -7943,7 +8062,7 @@ hpux9* | hpux10* | hpux11*) version_type=sunos need_lib_prefix=no need_version=no - case "$host_cpu" in + case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes @@ -7983,6 +8102,18 @@ hpux9* | hpux10* | hpux11*) postinstall_cmds='chmod 555 $lib' ;; +interix3*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; + irix5* | irix6* | nonstopux*) case $host_os in nonstopux*) version_type=nonstopux ;; @@ -8040,31 +8171,10 @@ linux*) # before this can be enabled. hardcode_into_libs=yes - # find out which ABI we are using - libsuff= - case "$host_cpu" in - x86_64*|s390x*|powerpc64*) - echo '#line 8047 "configure"' > conftest.$ac_ext - if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; then - case `/usr/bin/file conftest.$ac_objext` in - *64-bit*) - libsuff=64 - sys_lib_search_path_spec="/lib${libsuff} /usr/lib${libsuff} /usr/local/lib${libsuff}" - ;; - esac - fi - rm -rf conftest* - ;; - esac - # Append ld.so.conf contents to the search path if test -f /etc/ld.so.conf; then - lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:,\t]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` - sys_lib_dlsearch_path_spec="/lib${libsuff} /usr/lib${libsuff} $lt_ld_extra" + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` + sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" fi # We used to test for /lib/ld.so.1 and disable shared libraries on @@ -8125,8 +8235,13 @@ nto-qnx*) openbsd*) version_type=sunos + sys_lib_dlsearch_path_spec="/usr/lib" need_lib_prefix=no - need_version=no + # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
+ case $host_os in + openbsd3.3 | openbsd3.3.*) need_version=yes ;; + *) need_version=no ;; + esac library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' shlibpath_var=LD_LIBRARY_PATH @@ -8164,13 +8279,6 @@ osf3* | osf4* | osf5*) sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" ;; -sco3.2v5*) - version_type=osf - soname_spec='${libname}${release}${shared_ext}$major' - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - ;; - solaris*) version_type=linux need_lib_prefix=no @@ -8196,7 +8304,7 @@ sunos4*) need_version=yes ;; -sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) +sysv4 | sysv4.3*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' @@ -8229,6 +8337,29 @@ sysv4*MP*) fi ;; +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + version_type=freebsd-elf + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + if test "$with_gnu_ld" = yes; then + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' + shlibpath_overrides_runpath=no + else + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' + shlibpath_overrides_runpath=yes + case $host_os in + sco3.2v5*) + sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" + ;; + esac + fi + sys_lib_dlsearch_path_spec='/usr/lib' + ;; + uts4*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' @@ -8244,6 +8375,11 @@ echo 
"$as_me:$LINENO: result: $dynamic_l echo "${ECHO_T}$dynamic_linker" >&6 test "$dynamic_linker" = no && can_build_shared=no +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" +if test "$GCC" = yes; then + variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" +fi + echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 hardcode_action= @@ -8899,7 +9035,7 @@ fi test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" save_LDFLAGS="$LDFLAGS" - eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" save_LIBS="$LIBS" LIBS="$lt_cv_dlopen_libs $LIBS" @@ -8915,7 +9051,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <&5 (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then - (./conftest; exit; ) 2>/dev/null + (./conftest; exit; ) >&5 2>/dev/null lt_status=$? case x$lt_status in x$lt_dlno_uscore) lt_cv_dlopen_self=yes ;; x$lt_dlneed_uscore) lt_cv_dlopen_self=yes ;; - x$lt_unknown|x*) lt_cv_dlopen_self=no ;; + x$lt_dlunknown|x*) lt_cv_dlopen_self=no ;; esac else : # compilation failed @@ -9001,7 +9139,7 @@ echo "$as_me:$LINENO: result: $lt_cv_dlo echo "${ECHO_T}$lt_cv_dlopen_self" >&6 if test "x$lt_cv_dlopen_self" = xyes; then - LDFLAGS="$LDFLAGS $link_static_flag" + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $lt_prog_compiler_static\" echo "$as_me:$LINENO: checking whether a statically linked program can dlopen itself" >&5 echo $ECHO_N "checking whether a statically linked program can dlopen itself... 
$ECHO_C" >&6 if test "${lt_cv_dlopen_self_static+set}" = set; then @@ -9013,7 +9151,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <&5 (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then - (./conftest; exit; ) 2>/dev/null + (./conftest; exit; ) >&5 2>/dev/null lt_status=$? case x$lt_status in x$lt_dlno_uscore) lt_cv_dlopen_self_static=yes ;; x$lt_dlneed_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_unknown|x*) lt_cv_dlopen_self_static=no ;; + x$lt_dlunknown|x*) lt_cv_dlopen_self_static=no ;; esac else : # compilation failed @@ -9117,7 +9257,7 @@ echo "${ECHO_T}$lt_cv_dlopen_self_static fi -# Report which librarie types wil actually be built +# Report which library types will actually be built echo "$as_me:$LINENO: checking if libtool supports shared libraries" >&5 echo $ECHO_N "checking if libtool supports shared libraries... $ECHO_C" >&6 echo "$as_me:$LINENO: result: $can_build_shared" >&5 @@ -9129,7 +9269,7 @@ test "$can_build_shared" = "no" && enabl # On AIX, shared libraries and static libraries use the same namespace, and # are all built from PIC. -case "$host_os" in +case $host_os in aix3*) test "$enable_shared" = yes && enable_static=no if test -n "$RANLIB"; then @@ -9167,7 +9307,7 @@ if test -f "$ltmain"; then # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. - for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC NM \ + for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ @@ -9333,6 +9473,9 @@ AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC +# LTCC compiler flags. +LTCFLAGS=$lt_LTCFLAGS + # A language-specific compiler. 
CC=$lt_compiler @@ -9663,6 +9806,9 @@ echo "$as_me: WARNING: output file \`$of echo "$as_me: WARNING: using \`LTCC=$LTCC', extracted from \`$ofile'" >&2;} fi fi + if test -z "$LTCFLAGS"; then + eval "`$SHELL ${ofile} --config | grep '^LTCFLAGS='`" + fi # Extract list of available tagged configurations in $ofile. # Note that this assumes the entire list is on one line. @@ -9715,6 +9861,7 @@ hardcode_libdir_flag_spec_CXX= hardcode_libdir_flag_spec_ld_CXX= hardcode_libdir_separator_CXX= hardcode_minus_L_CXX=no +hardcode_shlibpath_var_CXX=unsupported hardcode_automatic_CXX=no module_cmds_CXX= module_expsym_cmds_CXX= @@ -9732,7 +9879,7 @@ postdeps_CXX= compiler_lib_search_path_CXX= # Source file extension for C++ test sources. -ac_ext=cc +ac_ext=cpp # Object file extension for compiled C++ test sources. objext=o @@ -9742,13 +9889,16 @@ objext_CXX=$objext lt_simple_compile_test_code="int some_variable = 0;\n" # Code to be used in simple link tests -lt_simple_link_test_code='int main(int, char *) { return(0); }\n' +lt_simple_link_test_code='int main(int, char *[]) { return(0); }\n' # ltmain only uses $CC for tagged configurations so make sure $CC is set. # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} + # Allow CC to be a program name with arguments. 
compiler=$CC @@ -9756,13 +9906,13 @@ compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* @@ -9777,12 +9927,12 @@ lt_save_path_LD=$lt_cv_path_LD if test -n "${lt_cv_prog_gnu_ldcxx+set}"; then lt_cv_prog_gnu_ld=$lt_cv_prog_gnu_ldcxx else - unset lt_cv_prog_gnu_ld + $as_unset lt_cv_prog_gnu_ld fi if test -n "${lt_cv_path_LDCXX+set}"; then lt_cv_path_LD=$lt_cv_path_LDCXX else - unset lt_cv_path_LD + $as_unset lt_cv_path_LD fi test -z "${LDCXX+set}" || LD=$LDCXX CC=${CXX-"c++"} @@ -9868,7 +10018,7 @@ else if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then lt_cv_path_LD="$ac_dir/$ac_prog" # Check to see if the program is GNU ld. I'd rather use --version, - # but apparently some GNU ld's only accept -v. + # but apparently some variants of GNU ld only accept -v. # Break only if it was the GNU/non-GNU ld that we prefer. case `"$lt_cv_path_LD" -v 2>&1 &6 else - # I'd rather use --version here, but apparently some GNU ld's only accept -v. + # I'd rather use --version here, but apparently some GNU lds only accept -v. case `$LD -v 2>&1 conftest.$ac_ext <<_ACEOF @@ -10177,16 +10329,26 @@ if test -z "$aix_libpath"; then aix_libp # -berok will link without error, but may produce a broken library. 
no_undefined_flag_CXX=' ${wl}-bernotok' allow_undefined_flag_CXX=' ${wl}-berok' - # -bexpall does not export symbols beginning with underscore (_) - always_export_symbols_CXX=yes # Exported symbols can be pulled into shared objects from archives - whole_archive_flag_spec_CXX=' ' + whole_archive_flag_spec_CXX='$convenience' archive_cmds_need_lc_CXX=yes - # This is similar to how AIX traditionally builds it's shared libraries. - archive_expsym_cmds_CXX="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}-bE:$export_symbols ${wl}-bnoentry${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' + # This is similar to how AIX traditionally builds its shared libraries. + archive_expsym_cmds_CXX="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; + + beos*) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + allow_undefined_flag_CXX=unsupported + # Joseph Beckenbach says some releases of gcc + # support --undefined. This deserves some investigation. FIXME + archive_cmds_CXX='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + else + ld_shlibs_CXX=no + fi + ;; + chorus*) case $cc_basename in *) @@ -10196,7 +10358,6 @@ if test -z "$aix_libpath"; then aix_libp esac ;; - cygwin* | mingw* | pw32*) # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, CXX) is actually meaningless, # as there is no search path for DLLs. 
@@ -10206,7 +10367,7 @@ if test -z "$aix_libpath"; then aix_libp enable_shared_with_static_runtimes_CXX=yes if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then - archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' + archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... archive_expsym_cmds_CXX='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then @@ -10215,13 +10376,13 @@ if test -z "$aix_libpath"; then aix_libp echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ - $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' + $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs_CXX=no fi ;; darwin* | rhapsody*) - case "$host_os" in + case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag_CXX='${wl}-undefined ${wl}suppress' ;; @@ -10259,7 +10420,7 @@ if test -z "$aix_libpath"; then aix_libp archive_cmds_CXX='$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags -install_name $rpath/$soname $verstring' fi module_cmds_CXX='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's + # Don't fix this by 
using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds if test "X$lt_int_apple_cc_single_mod" = Xyes ; then archive_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else @@ -10272,7 +10433,7 @@ if test -z "$aix_libpath"; then aix_libp output_verbose_link_cmd='echo' archive_cmds_CXX='$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' module_cmds_CXX='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_CXX='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' ;; @@ -10352,33 +10513,22 @@ if test -z "$aix_libpath"; then aix_libp ;; hpux10*|hpux11*) if test $with_gnu_ld = no; then - case "$host_cpu" in - hppa*64*) - hardcode_libdir_flag_spec_CXX='${wl}+b ${wl}$libdir' + hardcode_libdir_flag_spec_CXX='${wl}+b ${wl}$libdir' + hardcode_libdir_separator_CXX=: + + case $host_cpu in + hppa*64*|ia64*) 
hardcode_libdir_flag_spec_ld_CXX='+b $libdir' - hardcode_libdir_separator_CXX=: - ;; - ia64*) - hardcode_libdir_flag_spec_CXX='-L$libdir' ;; *) - hardcode_libdir_flag_spec_CXX='${wl}+b ${wl}$libdir' - hardcode_libdir_separator_CXX=: export_dynamic_flag_spec_CXX='${wl}-E' ;; esac fi - case "$host_cpu" in - hppa*64*) - hardcode_direct_CXX=no - hardcode_shlibpath_var_CXX=no - ;; - ia64*) + case $host_cpu in + hppa*64*|ia64*) hardcode_direct_CXX=no hardcode_shlibpath_var_CXX=no - hardcode_minus_L_CXX=yes # Not in the search PATH, - # but as the default - # location of the library. ;; *) hardcode_direct_CXX=yes @@ -10394,9 +10544,12 @@ if test -z "$aix_libpath"; then aix_libp ld_shlibs_CXX=no ;; aCC*) - case "$host_cpu" in - hppa*64*|ia64*) - archive_cmds_CXX='$LD -b +h $soname -o $lib $linker_flags $libobjs $deplibs' + case $host_cpu in + hppa*64*) + archive_cmds_CXX='$CC -b ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + ia64*) + archive_cmds_CXX='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; *) archive_cmds_CXX='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' @@ -10415,9 +10568,12 @@ if test -z "$aix_libpath"; then aix_libp *) if test "$GXX" = yes; then if test $with_gnu_ld = no; then - case "$host_cpu" in - ia64*|hppa*64*) - archive_cmds_CXX='$LD -b +h $soname -o $lib $linker_flags $libobjs $deplibs' + case $host_cpu in + hppa*64*) + archive_cmds_CXX='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + ia64*) + archive_cmds_CXX='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' ;; *) archive_cmds_CXX='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+b 
${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' @@ -10431,6 +10587,20 @@ if test -z "$aix_libpath"; then aix_libp ;; esac ;; + interix3*) + hardcode_direct_CXX=no + hardcode_shlibpath_var_CXX=no + hardcode_libdir_flag_spec_CXX='${wl}-rpath,$libdir' + export_dynamic_flag_spec_CXX='${wl}-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. + # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. + archive_cmds_CXX='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + archive_expsym_cmds_CXX='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + ;; irix5* | irix6*) case $cc_basename in CC*) @@ -10511,12 +10681,12 @@ if test -z "$aix_libpath"; then aix_libp ;; pgCC*) # Portland Group C++ compiler - archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib' - archive_expsym_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib' + archive_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib' + archive_expsym_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname 
${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib' hardcode_libdir_flag_spec_CXX='${wl}--rpath ${wl}$libdir' export_dynamic_flag_spec_CXX='${wl}--export-dynamic' - whole_archive_flag_spec_CXX='' + whole_archive_flag_spec_CXX='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' ;; cxx*) # Compaq C++ @@ -10713,19 +10883,6 @@ if test -z "$aix_libpath"; then aix_libp # FIXME: insert proper C++ library support ld_shlibs_CXX=no ;; - sco*) - archive_cmds_need_lc_CXX=no - case $cc_basename in - CC*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - esac - ;; sunos4*) case $cc_basename in CC*) @@ -10748,10 +10905,11 @@ if test -z "$aix_libpath"; then aix_libp case $cc_basename in CC*) # Sun C++ 4.2, 5.x and Centerline C++ + archive_cmds_need_lc_CXX=yes no_undefined_flag_CXX=' -zdefs' - archive_cmds_CXX='$CC -G${allow_undefined_flag} -nolib -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + archive_cmds_CXX='$CC -G${allow_undefined_flag} -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' archive_expsym_cmds_CXX='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ - $CC -G${allow_undefined_flag} -nolib ${wl}-M ${wl}$lib.exp -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' + $CC -G${allow_undefined_flag} ${wl}-M ${wl}$lib.exp -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' hardcode_libdir_flag_spec_CXX='-R$libdir' hardcode_shlibpath_var_CXX=no @@ -10771,15 +10929,7 @@ if test -z "$aix_libpath"; then aix_libp esac link_all_deplibs_CXX=yes - # Commands to make compiler produce verbose output that lists - # 
what "hidden" libraries, object files and flags are used when - # linking a shared library. - # - # There doesn't appear to be a way to prevent this compiler from - # explicitly linking system object files so we need to strip them - # from the output so that they don't get included in the library - # dependencies. - output_verbose_link_cmd='templist=`$CC -G $CFLAGS -v conftest.$objext 2>&1 | grep "\-[LR]"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + output_verbose_link_cmd='echo' # Archives containing C++ object files must be created using # "CC -xar", where "CC" is the Sun C++ compiler. This is @@ -10825,8 +10975,59 @@ if test -z "$aix_libpath"; then aix_libp ;; esac ;; - sysv5OpenUNIX8* | sysv5UnixWare7* | sysv5uw[78]* | unixware7*) + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7* | sco3.2v5.0.[024]*) + no_undefined_flag_CXX='${wl}-z,text' + archive_cmds_need_lc_CXX=no + hardcode_shlibpath_var_CXX=no + runpath_var='LD_RUN_PATH' + + case $cc_basename in + CC*) + archive_cmds_CXX='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_CXX='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + archive_cmds_CXX='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_CXX='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + ;; + sysv5* | sco3.2v5* | sco5v6*) + # Note: We can NOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. 
+ # For security reasons, it is highly recommended that you always + # use absolute paths for naming shared libraries, and exclude the + # DT_RUNPATH tag from executables and libraries. But doing so + # requires that you compile everything twice, which is a pain. + # So that behaviour is only enabled if SCOABSPATH is set to a + # non-empty value in the environment. Most likely only useful for + # creating official distributions of packages. + # This is a hack until libtool officially supports absolute path + # names for shared libraries. + no_undefined_flag_CXX='${wl}-z,text' + allow_undefined_flag_CXX='${wl}-z,nodefs' archive_cmds_need_lc_CXX=no + hardcode_shlibpath_var_CXX=no + hardcode_libdir_flag_spec_CXX='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' + hardcode_libdir_separator_CXX=':' + link_all_deplibs_CXX=yes + export_dynamic_flag_spec_CXX='${wl}-Bexport' + runpath_var='LD_RUN_PATH' + + case $cc_basename in + CC*) + archive_cmds_CXX='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_CXX='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + archive_cmds_CXX='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_CXX='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac ;; tandem*) case $cc_basename in @@ -10883,7 +11084,7 @@ if { (eval echo "$as_me:$LINENO: \"$ac_c # The `*' in the case matches for architectures that use `case' in # $output_verbose_cmd can trigger glob expansion during the loop # eval without this substitution. 
- output_verbose_link_cmd="`$echo \"X$output_verbose_link_cmd\" | $Xsed -e \"$no_glob_subst\"`" + output_verbose_link_cmd=`$echo "X$output_verbose_link_cmd" | $Xsed -e "$no_glob_subst"` for p in `eval $output_verbose_link_cmd`; do case $p in @@ -10959,6 +11160,29 @@ fi $rm -f confest.$objext +# PORTME: override above test on systems where it is broken +case $host_os in +interix3*) + # Interix 3.5 installs completely hosed .la files for C++, so rather than + # hack all around it, let's just trust "g++" to DTRT. + predep_objects_CXX= + postdep_objects_CXX= + postdeps_CXX= + ;; + +solaris*) + case $cc_basename in + CC*) + # Adding this requires a known-good setup of shared libraries for + # Sun compiler versions before 5.6, else PIC objects from an old + # archive will be linked into the output, leading to subtle bugs. + postdeps_CXX='-lCstd -lCrun' + ;; + esac + ;; +esac + + case " $postdeps_CXX " in *" -lc "*) archive_cmds_need_lc_CXX=no ;; esac @@ -11006,6 +11230,10 @@ echo $ECHO_N "checking for $compiler opt # DJGPP does not support shared libraries at all lt_prog_compiler_pic_CXX= ;; + interix3*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; sysv4*MP*) if test -d /usr/nec; then lt_prog_compiler_pic_CXX=-Kconform_pic @@ -11014,7 +11242,7 @@ echo $ECHO_N "checking for $compiler opt hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. 
- case "$host_cpu" in + case $host_cpu in hppa*64*|ia64*) ;; *) @@ -11075,15 +11303,15 @@ echo $ECHO_N "checking for $compiler opt case $cc_basename in CC*) lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_static_CXX="${ac_cv_prog_cc_wl}-a ${ac_cv_prog_cc_wl}archive" + lt_prog_compiler_static_CXX='${wl}-a ${wl}archive' if test "$host_cpu" != ia64; then lt_prog_compiler_pic_CXX='+Z' fi ;; aCC*) lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_static_CXX="${ac_cv_prog_cc_wl}-a ${ac_cv_prog_cc_wl}archive" - case "$host_cpu" in + lt_prog_compiler_static_CXX='${wl}-a ${wl}archive' + case $host_cpu in hppa*64*|ia64*) # +Z the default ;; @@ -11096,6 +11324,10 @@ echo $ECHO_N "checking for $compiler opt ;; esac ;; + interix*) + # This is c89, which is MS Visual C++ (no shared libs) + # Anyone wants to do a port? + ;; irix5* | irix6* | nonstopux*) case $cc_basename in CC*) @@ -11124,7 +11356,7 @@ echo $ECHO_N "checking for $compiler opt # Portland Group C++ compiler. lt_prog_compiler_wl_CXX='-Wl,' lt_prog_compiler_pic_CXX='-fpic' - lt_prog_compiler_static_CXX='-static' + lt_prog_compiler_static_CXX='-Bstatic' ;; cxx*) # Compaq C++ @@ -11175,15 +11407,6 @@ echo $ECHO_N "checking for $compiler opt ;; psos*) ;; - sco*) - case $cc_basename in - CC*) - lt_prog_compiler_pic_CXX='-fPIC' - ;; - *) - ;; - esac - ;; solaris*) case $cc_basename in CC*) @@ -11225,7 +11448,14 @@ echo $ECHO_N "checking for $compiler opt ;; esac ;; - unixware*) + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + case $cc_basename in + CC*) + lt_prog_compiler_wl_CXX='-Wl,' + lt_prog_compiler_pic_CXX='-KPIC' + lt_prog_compiler_static_CXX='-Bstatic' + ;; + esac ;; vxworks*) ;; @@ -11258,20 +11488,20 @@ else # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? 
:&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:11264: $lt_compile\"" >&5) + (eval echo "\"\$as_me:11494: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 - echo "$as_me:11268: \$? = $ac_status" >&5 + echo "$as_me:11498: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. - $echo "X$_lt_compiler_boilerplate" | $Xsed >conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if test ! -s conftest.err || diff conftest.exp conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works_CXX=yes fi fi @@ -11292,7 +11522,7 @@ else fi fi -case "$host_os" in +case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic_CXX= @@ -11302,6 +11532,48 @@ case "$host_os" in ;; esac +# +# Check to make sure the static flag actually works. +# +wl=$lt_prog_compiler_wl_CXX eval lt_tmp_static_flag=\"$lt_prog_compiler_static_CXX\" +echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 +echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... 
$ECHO_C" >&6 +if test "${lt_prog_compiler_static_works_CXX+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + lt_prog_compiler_static_works_CXX=no + save_LDFLAGS="$LDFLAGS" + LDFLAGS="$LDFLAGS $lt_tmp_static_flag" + printf "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. + cat conftest.err 1>&5 + $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + lt_prog_compiler_static_works_CXX=yes + fi + else + lt_prog_compiler_static_works_CXX=yes + fi + fi + $rm conftest* + LDFLAGS="$save_LDFLAGS" + +fi +echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works_CXX" >&5 +echo "${ECHO_T}$lt_prog_compiler_static_works_CXX" >&6 + +if test x"$lt_prog_compiler_static_works_CXX" = xyes; then + : +else + lt_prog_compiler_static_CXX= +fi + + echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o_CXX+set}" = set; then @@ -11320,25 +11592,25 @@ else # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:11326: $lt_compile\"" >&5) + (eval echo "\"\$as_me:11598: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 - echo "$as_me:11330: \$? = $ac_status" >&5 + echo "$as_me:11602: \$? 
= $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings - $echo "X$_lt_compiler_boilerplate" | $Xsed > out/conftest.exp - $SED '/^$/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.err || diff out/conftest.exp out/conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o_CXX=yes fi fi - chmod u+w . + chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation @@ -11404,11 +11676,6 @@ echo "$as_me:$LINENO: result: $ld_shlibs echo "${ECHO_T}$ld_shlibs_CXX" >&6 test "$ld_shlibs_CXX" = no && can_build_shared=no -variables_saved_for_relink="PATH $shlibpath_var $runpath_var" -if test "$GCC" = yes; then - variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" -fi - # # Do we need to explicitly link libc? # @@ -11441,6 +11708,7 @@ echo $ECHO_N "checking whether -lc shoul libobjs=conftest.$ac_objext deplibs= wl=$lt_prog_compiler_wl_CXX + pic_flag=$lt_prog_compiler_pic_CXX compiler_flags=-v linker_flags=-v verstring= @@ -11601,7 +11869,8 @@ cygwin* | mingw* | pw32*) dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname' + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' @@ -11654,7 +11923,7 @@ darwin* | rhapsody*) soname_spec='${libname}${release}${major}$shared_ext' shlibpath_overrides_runpath=yes shlibpath_var=DYLD_LIBRARY_PATH - shrext_cmds='$(test .$module = .yes && echo .so || echo .dylib)' + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` @@ -11692,7 +11961,14 @@ kfreebsd*-gnu) freebsd* | dragonfly*) # DragonFly does not have aout. When/if they implement a new # versioning mechanism, adjust this. - objformat=`test -x /usr/bin/objformat && /usr/bin/objformat || echo aout` + if test -x /usr/bin/objformat; then + objformat=`/usr/bin/objformat` + else + case $host_os in + freebsd[123]*) objformat=aout ;; + *) objformat=elf ;; + esac + fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) @@ -11714,10 +11990,15 @@ freebsd* | dragonfly*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; - *) # from 3.2 on + freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ + freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; + freebsd*) # from 4.6 on + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; esac ;; @@ -11737,7 +12018,7 @@ hpux9* | hpux10* | hpux11*) version_type=sunos need_lib_prefix=no need_version=no - case "$host_cpu" in + case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes @@ -11777,6 +12058,18 @@ hpux9* | hpux10* | hpux11*) postinstall_cmds='chmod 555 $lib' ;; +interix3*) + version_type=linux + need_lib_prefix=no + need_version=no + 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; + irix5* | irix6* | nonstopux*) case $host_os in nonstopux*) version_type=nonstopux ;; @@ -11834,31 +12127,10 @@ linux*) # before this can be enabled. hardcode_into_libs=yes - # find out which ABI we are using - libsuff= - case "$host_cpu" in - x86_64*|s390x*|powerpc64*) - echo '#line 11841 "configure"' > conftest.$ac_ext - if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; then - case `/usr/bin/file conftest.$ac_objext` in - *64-bit*) - libsuff=64 - sys_lib_search_path_spec="/lib${libsuff} /usr/lib${libsuff} /usr/local/lib${libsuff}" - ;; - esac - fi - rm -rf conftest* - ;; - esac - # Append ld.so.conf contents to the search path if test -f /etc/ld.so.conf; then - lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:,\t]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` - sys_lib_dlsearch_path_spec="/lib${libsuff} /usr/lib${libsuff} $lt_ld_extra" + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` + sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" fi # We used to test for /lib/ld.so.1 and disable shared libraries on @@ -11919,8 +12191,13 @@ nto-qnx*) openbsd*) version_type=sunos + sys_lib_dlsearch_path_spec="/usr/lib" need_lib_prefix=no - need_version=no + # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
+ case $host_os in + openbsd3.3 | openbsd3.3.*) need_version=yes ;; + *) need_version=no ;; + esac library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' shlibpath_var=LD_LIBRARY_PATH @@ -11958,13 +12235,6 @@ osf3* | osf4* | osf5*) sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" ;; -sco3.2v5*) - version_type=osf - soname_spec='${libname}${release}${shared_ext}$major' - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - ;; - solaris*) version_type=linux need_lib_prefix=no @@ -11990,7 +12260,7 @@ sunos4*) need_version=yes ;; -sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) +sysv4 | sysv4.3*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' @@ -12023,6 +12293,29 @@ sysv4*MP*) fi ;; +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + version_type=freebsd-elf + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + if test "$with_gnu_ld" = yes; then + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' + shlibpath_overrides_runpath=no + else + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' + shlibpath_overrides_runpath=yes + case $host_os in + sco3.2v5*) + sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" + ;; + esac + fi + sys_lib_dlsearch_path_spec='/usr/lib' + ;; + uts4*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' @@ -12038,6 +12331,11 @@ 
echo "$as_me:$LINENO: result: $dynamic_l echo "${ECHO_T}$dynamic_linker" >&6 test "$dynamic_linker" = no && can_build_shared=no +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" +if test "$GCC" = yes; then + variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" +fi + echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 hardcode_action_CXX= @@ -12075,1339 +12373,995 @@ elif test "$shlibpath_overrides_runpath" enable_fast_install=needless fi -striplib= -old_striplib= -echo "$as_me:$LINENO: checking whether stripping libraries is possible" >&5 -echo $ECHO_N "checking whether stripping libraries is possible... $ECHO_C" >&6 -if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; then - test -z "$old_striplib" && old_striplib="$STRIP --strip-debug" - test -z "$striplib" && striplib="$STRIP --strip-unneeded" - echo "$as_me:$LINENO: result: yes" >&5 -echo "${ECHO_T}yes" >&6 -else -# FIXME - insert some real tests, host_os isn't really good enough - case $host_os in - darwin*) - if test -n "$STRIP" ; then - striplib="$STRIP -x" - echo "$as_me:$LINENO: result: yes" >&5 -echo "${ECHO_T}yes" >&6 - else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - ;; - *) - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 + +# The else clause should only fire when bootstrapping the +# libtool distribution, otherwise you forgot to ship ltmain.sh +# with your package, and you will get complaints that there are +# no rules to generate ltmain.sh. +if test -f "$ltmain"; then + # See if we are running on zsh, and set the options which allow our commands through + # without removal of \ escapes. 
+ if test -n "${ZSH_VERSION+set}" ; then + setopt NO_GLOB_SUBST + fi + # Now quote all the things that may contain metacharacters while being + # careful not to overquote the AC_SUBSTed values. We take copies of the + # variables and quote the copies for generation of the libtool script. + for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ + SED SHELL STRIP \ + libname_spec library_names_spec soname_spec extract_expsyms_cmds \ + old_striplib striplib file_magic_cmd finish_cmds finish_eval \ + deplibs_check_method reload_flag reload_cmds need_locks \ + lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \ + lt_cv_sys_global_symbol_to_c_name_address \ + sys_lib_search_path_spec sys_lib_dlsearch_path_spec \ + old_postinstall_cmds old_postuninstall_cmds \ + compiler_CXX \ + CC_CXX \ + LD_CXX \ + lt_prog_compiler_wl_CXX \ + lt_prog_compiler_pic_CXX \ + lt_prog_compiler_static_CXX \ + lt_prog_compiler_no_builtin_flag_CXX \ + export_dynamic_flag_spec_CXX \ + thread_safe_flag_spec_CXX \ + whole_archive_flag_spec_CXX \ + enable_shared_with_static_runtimes_CXX \ + old_archive_cmds_CXX \ + old_archive_from_new_cmds_CXX \ + predep_objects_CXX \ + postdep_objects_CXX \ + predeps_CXX \ + postdeps_CXX \ + compiler_lib_search_path_CXX \ + archive_cmds_CXX \ + archive_expsym_cmds_CXX \ + postinstall_cmds_CXX \ + postuninstall_cmds_CXX \ + old_archive_from_expsyms_cmds_CXX \ + allow_undefined_flag_CXX \ + no_undefined_flag_CXX \ + export_symbols_cmds_CXX \ + hardcode_libdir_flag_spec_CXX \ + hardcode_libdir_flag_spec_ld_CXX \ + hardcode_libdir_separator_CXX \ + hardcode_automatic_CXX \ + module_cmds_CXX \ + module_expsym_cmds_CXX \ + lt_cv_prog_compiler_c_o_CXX \ + exclude_expsyms_CXX \ + include_expsyms_CXX; do + + case $var in + old_archive_cmds_CXX | \ + old_archive_from_new_cmds_CXX | \ + archive_cmds_CXX | \ + archive_expsym_cmds_CXX | \ + module_cmds_CXX | \ + module_expsym_cmds_CXX | \ + old_archive_from_expsyms_cmds_CXX | \ + 
export_symbols_cmds_CXX | \ + extract_expsyms_cmds | reload_cmds | finish_cmds | \ + postinstall_cmds | postuninstall_cmds | \ + old_postinstall_cmds | old_postuninstall_cmds | \ + sys_lib_search_path_spec | sys_lib_dlsearch_path_spec) + # Double-quote double-evaled strings. + eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\"" + ;; + *) + eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\"" + ;; + esac + done + + case $lt_echo in + *'\$0 --fallback-echo"') + lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'` ;; esac -fi -if test "x$enable_dlopen" != xyes; then - enable_dlopen=unknown - enable_dlopen_self=unknown - enable_dlopen_self_static=unknown -else - lt_cv_dlopen=no - lt_cv_dlopen_libs= +cfgfile="$ofile" - case $host_os in - beos*) - lt_cv_dlopen="load_add_on" - lt_cv_dlopen_libs= - lt_cv_dlopen_self=yes - ;; + cat <<__EOF__ >> "$cfgfile" +# ### BEGIN LIBTOOL TAG CONFIG: $tagname - mingw* | pw32*) - lt_cv_dlopen="LoadLibrary" - lt_cv_dlopen_libs= - ;; +# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: - cygwin*) - lt_cv_dlopen="dlopen" - lt_cv_dlopen_libs= - ;; +# Shell to use when invoking shell scripts. +SHELL=$lt_SHELL - darwin*) - # if libdl is installed we need to link against it - echo "$as_me:$LINENO: checking for dlopen in -ldl" >&5 -echo $ECHO_N "checking for dlopen in -ldl... $ECHO_C" >&6 -if test "${ac_cv_lib_dl_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldl $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ +# Whether or not to build shared libraries. +build_libtool_libs=$enable_shared -/* Override any gcc2 internal prototype to avoid an error. 
*/ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char dlopen (); -int -main () -{ -dlopen (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dl_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# Whether or not to build static libraries. +build_old_libs=$enable_static -ac_cv_lib_dl_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dl_dlopen" >&5 -echo "${ECHO_T}$ac_cv_lib_dl_dlopen" >&6 -if test $ac_cv_lib_dl_dlopen = yes; then - lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl" -else +# Whether or not to add -lc for building shared libraries. +build_libtool_need_lc=$archive_cmds_need_lc_CXX - lt_cv_dlopen="dyld" - lt_cv_dlopen_libs= - lt_cv_dlopen_self=yes +# Whether or not to disallow shared libs when runtime libs are static +allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_CXX -fi +# Whether or not to optimize for fast installation. +fast_install=$enable_fast_install - ;; +# The host system. 
+host_alias=$host_alias +host=$host +host_os=$host_os - *) - echo "$as_me:$LINENO: checking for shl_load" >&5 -echo $ECHO_N "checking for shl_load... $ECHO_C" >&6 -if test "${ac_cv_func_shl_load+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ -/* Define shl_load to an innocuous variant, in case declares shl_load. - For example, HP-UX 11i declares gettimeofday. */ -#define shl_load innocuous_shl_load +# The build system. +build_alias=$build_alias +build=$build +build_os=$build_os -/* System header to define __stub macros and hopefully few prototypes, - which can conflict with char shl_load (); below. - Prefer to if __STDC__ is defined, since - exists even on freestanding compilers. */ +# An echo program that does not interpret backslashes. +echo=$lt_echo -#ifdef __STDC__ -# include -#else -# include -#endif +# The archiver. +AR=$lt_AR +AR_FLAGS=$lt_AR_FLAGS -#undef shl_load +# A C compiler. +LTCC=$lt_LTCC -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -{ -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char shl_load (); -/* The GNU C library defines this for functions which it implements - to always fail with ENOSYS. Some functions are actually named - something starting with __ and the normal name is an alias. */ -#if defined (__stub_shl_load) || defined (__stub___shl_load) -choke me -#else -char (*f) () = shl_load; -#endif -#ifdef __cplusplus -} -#endif +# LTCC compiler flags. +LTCFLAGS=$lt_LTCFLAGS -int -main () -{ -return f != shl_load; - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? 
- grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_func_shl_load=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# A language-specific compiler. +CC=$lt_compiler_CXX -ac_cv_func_shl_load=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -fi -echo "$as_me:$LINENO: result: $ac_cv_func_shl_load" >&5 -echo "${ECHO_T}$ac_cv_func_shl_load" >&6 -if test $ac_cv_func_shl_load = yes; then - lt_cv_dlopen="shl_load" -else - echo "$as_me:$LINENO: checking for shl_load in -ldld" >&5 -echo $ECHO_N "checking for shl_load in -ldld... $ECHO_C" >&6 -if test "${ac_cv_lib_dld_shl_load+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldld $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ +# Is the compiler the GNU C compiler? +with_gcc=$GCC_CXX -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char shl_load (); -int -main () -{ -shl_load (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? 
- grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dld_shl_load=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# An ERE matcher. +EGREP=$lt_EGREP -ac_cv_lib_dld_shl_load=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dld_shl_load" >&5 -echo "${ECHO_T}$ac_cv_lib_dld_shl_load" >&6 -if test $ac_cv_lib_dld_shl_load = yes; then - lt_cv_dlopen="shl_load" lt_cv_dlopen_libs="-dld" -else - echo "$as_me:$LINENO: checking for dlopen" >&5 -echo $ECHO_N "checking for dlopen... $ECHO_C" >&6 -if test "${ac_cv_func_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ -/* Define dlopen to an innocuous variant, in case declares dlopen. - For example, HP-UX 11i declares gettimeofday. */ -#define dlopen innocuous_dlopen +# The linker used to build libraries. +LD=$lt_LD_CXX -/* System header to define __stub macros and hopefully few prototypes, - which can conflict with char dlopen (); below. - Prefer to if __STDC__ is defined, since - exists even on freestanding compilers. */ +# Whether we need hard or soft links. +LN_S=$lt_LN_S -#ifdef __STDC__ -# include -#else -# include -#endif +# A BSD-compatible nm program. 
+NM=$lt_NM -#undef dlopen +# A symbol stripping program +STRIP=$lt_STRIP -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -{ -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char dlopen (); -/* The GNU C library defines this for functions which it implements - to always fail with ENOSYS. Some functions are actually named - something starting with __ and the normal name is an alias. */ -#if defined (__stub_dlopen) || defined (__stub___dlopen) -choke me -#else -char (*f) () = dlopen; -#endif -#ifdef __cplusplus -} -#endif +# Used to examine libraries when file_magic_cmd begins "file" +MAGIC_CMD=$MAGIC_CMD -int -main () -{ -return f != dlopen; - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_func_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# Used on cygwin: DLL creation program. 
+DLLTOOL="$DLLTOOL" -ac_cv_func_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -fi -echo "$as_me:$LINENO: result: $ac_cv_func_dlopen" >&5 -echo "${ECHO_T}$ac_cv_func_dlopen" >&6 -if test $ac_cv_func_dlopen = yes; then - lt_cv_dlopen="dlopen" -else - echo "$as_me:$LINENO: checking for dlopen in -ldl" >&5 -echo $ECHO_N "checking for dlopen in -ldl... $ECHO_C" >&6 -if test "${ac_cv_lib_dl_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldl $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ +# Used on cygwin: object dumper. +OBJDUMP="$OBJDUMP" -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char dlopen (); -int -main () -{ -dlopen (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dl_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# Used on cygwin: assembler. 
+AS="$AS" -ac_cv_lib_dl_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dl_dlopen" >&5 -echo "${ECHO_T}$ac_cv_lib_dl_dlopen" >&6 -if test $ac_cv_lib_dl_dlopen = yes; then - lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl" -else - echo "$as_me:$LINENO: checking for dlopen in -lsvld" >&5 -echo $ECHO_N "checking for dlopen in -lsvld... $ECHO_C" >&6 -if test "${ac_cv_lib_svld_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsvld $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ +# The name of the directory that contains temporary libtool files. +objdir=$objdir -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char dlopen (); -int -main () -{ -dlopen (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_svld_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# How to create reloadable object files. +reload_flag=$lt_reload_flag +reload_cmds=$lt_reload_cmds -ac_cv_lib_svld_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_svld_dlopen" >&5 -echo "${ECHO_T}$ac_cv_lib_svld_dlopen" >&6 -if test $ac_cv_lib_svld_dlopen = yes; then - lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-lsvld" -else - echo "$as_me:$LINENO: checking for dld_link in -ldld" >&5 -echo $ECHO_N "checking for dld_link in -ldld... $ECHO_C" >&6 -if test "${ac_cv_lib_dld_dld_link+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldld $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ +# How to pass a linker flag through the compiler. +wl=$lt_lt_prog_compiler_wl_CXX -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char dld_link (); -int -main () -{ -dld_link (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_cxx_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dld_dld_link=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 +# Object file suffix (normally "o"). +objext="$ac_objext" -ac_cv_lib_dld_dld_link=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dld_dld_link" >&5 -echo "${ECHO_T}$ac_cv_lib_dld_dld_link" >&6 -if test $ac_cv_lib_dld_dld_link = yes; then - lt_cv_dlopen="dld_link" lt_cv_dlopen_libs="-dld" -fi +# Old archive suffix (normally "a"). +libext="$libext" +# Shared library suffix (normally ".so"). +shrext_cmds='$shrext_cmds' -fi +# Executable file suffix (normally ""). +exeext="$exeext" +# Additional compiler flags for building library objects. +pic_flag=$lt_lt_prog_compiler_pic_CXX +pic_mode=$pic_mode -fi +# What is the maximum length of a command? +max_cmd_len=$lt_cv_sys_max_cmd_len +# Does compiler simultaneously support -c and -o options? +compiler_c_o=$lt_lt_cv_prog_compiler_c_o_CXX -fi +# Must we lock files when doing compilation? +need_locks=$lt_need_locks +# Do we need the lib prefix for modules? +need_lib_prefix=$need_lib_prefix -fi +# Do we need a version for libraries? +need_version=$need_version +# Whether dlopen is supported. +dlopen_support=$enable_dlopen -fi +# Whether dlopen of programs is supported. +dlopen_self=$enable_dlopen_self - ;; - esac +# Whether dlopen of statically linked programs is supported. +dlopen_self_static=$enable_dlopen_self_static - if test "x$lt_cv_dlopen" != xno; then - enable_dlopen=yes - else - enable_dlopen=no - fi +# Compiler flag to prevent dynamic linking. 
+link_static_flag=$lt_lt_prog_compiler_static_CXX - case $lt_cv_dlopen in - dlopen) - save_CPPFLAGS="$CPPFLAGS" - test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" +# Compiler flag to turn off builtin functions. +no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_CXX - save_LDFLAGS="$LDFLAGS" - eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" +# Compiler flag to allow reflexive dlopens. +export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_CXX - save_LIBS="$LIBS" - LIBS="$lt_cv_dlopen_libs $LIBS" +# Compiler flag to generate shared objects directly from archives. +whole_archive_flag_spec=$lt_whole_archive_flag_spec_CXX - echo "$as_me:$LINENO: checking whether a program can dlopen itself" >&5 -echo $ECHO_N "checking whether a program can dlopen itself... $ECHO_C" >&6 -if test "${lt_cv_dlopen_self+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test "$cross_compiling" = yes; then : - lt_cv_dlopen_self=cross -else - lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 - lt_status=$lt_dlunknown - cat > conftest.$ac_ext < -#endif +# Library versioning type. +version_type=$version_type -#include +# Format of library name prefix. +libname_spec=$lt_libname_spec -#ifdef RTLD_GLOBAL -# define LT_DLGLOBAL RTLD_GLOBAL -#else -# ifdef DL_GLOBAL -# define LT_DLGLOBAL DL_GLOBAL -# else -# define LT_DLGLOBAL 0 -# endif -#endif +# List of archive names. First name is the real one, the rest are links. +# The last name is the one that the linker finds with -lNAME. +library_names_spec=$lt_library_names_spec -/* We may have to define LT_DLLAZY_OR_NOW in the command line if we - find out it does not work in some platform. 
*/ -#ifndef LT_DLLAZY_OR_NOW -# ifdef RTLD_LAZY -# define LT_DLLAZY_OR_NOW RTLD_LAZY -# else -# ifdef DL_LAZY -# define LT_DLLAZY_OR_NOW DL_LAZY -# else -# ifdef RTLD_NOW -# define LT_DLLAZY_OR_NOW RTLD_NOW -# else -# ifdef DL_NOW -# define LT_DLLAZY_OR_NOW DL_NOW -# else -# define LT_DLLAZY_OR_NOW 0 -# endif -# endif -# endif -# endif -#endif +# The coded name of the library, if different from the real name. +soname_spec=$lt_soname_spec -#ifdef __cplusplus -extern "C" void exit (int); -#endif +# Commands used to build and install an old-style archive. +RANLIB=$lt_RANLIB +old_archive_cmds=$lt_old_archive_cmds_CXX +old_postinstall_cmds=$lt_old_postinstall_cmds +old_postuninstall_cmds=$lt_old_postuninstall_cmds -void fnord() { int i=42;} -int main () -{ - void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); - int status = $lt_dlunknown; +# Create an old-style archive from a shared archive. +old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_CXX - if (self) - { - if (dlsym (self,"fnord")) status = $lt_dlno_uscore; - else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; - /* dlclose (self); */ - } +# Create a temporary old-style archive to link instead of a shared archive. +old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_CXX - exit (status); -} -EOF - if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then - (./conftest; exit; ) 2>/dev/null - lt_status=$? - case x$lt_status in - x$lt_dlno_uscore) lt_cv_dlopen_self=yes ;; - x$lt_dlneed_uscore) lt_cv_dlopen_self=yes ;; - x$lt_unknown|x*) lt_cv_dlopen_self=no ;; - esac - else : - # compilation failed - lt_cv_dlopen_self=no - fi -fi -rm -fr conftest* +# Commands used to build and install a shared archive. 
+archive_cmds=$lt_archive_cmds_CXX +archive_expsym_cmds=$lt_archive_expsym_cmds_CXX +postinstall_cmds=$lt_postinstall_cmds +postuninstall_cmds=$lt_postuninstall_cmds +# Commands used to build a loadable module (assumed same as above if empty) +module_cmds=$lt_module_cmds_CXX +module_expsym_cmds=$lt_module_expsym_cmds_CXX -fi -echo "$as_me:$LINENO: result: $lt_cv_dlopen_self" >&5 -echo "${ECHO_T}$lt_cv_dlopen_self" >&6 +# Commands to strip libraries. +old_striplib=$lt_old_striplib +striplib=$lt_striplib - if test "x$lt_cv_dlopen_self" = xyes; then - LDFLAGS="$LDFLAGS $link_static_flag" - echo "$as_me:$LINENO: checking whether a statically linked program can dlopen itself" >&5 -echo $ECHO_N "checking whether a statically linked program can dlopen itself... $ECHO_C" >&6 -if test "${lt_cv_dlopen_self_static+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test "$cross_compiling" = yes; then : - lt_cv_dlopen_self_static=cross -else - lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 - lt_status=$lt_dlunknown - cat > conftest.$ac_ext < -#endif +# Dependencies to place after the objects being linked to create a +# shared library. +postdep_objects=$lt_postdep_objects_CXX -#include +# Dependencies to place before the objects being linked to create a +# shared library. +predeps=$lt_predeps_CXX -#ifdef RTLD_GLOBAL -# define LT_DLGLOBAL RTLD_GLOBAL -#else -# ifdef DL_GLOBAL -# define LT_DLGLOBAL DL_GLOBAL -# else -# define LT_DLGLOBAL 0 -# endif -#endif +# Dependencies to place after the objects being linked to create a +# shared library. +postdeps=$lt_postdeps_CXX -/* We may have to define LT_DLLAZY_OR_NOW in the command line if we - find out it does not work in some platform. 
*/ -#ifndef LT_DLLAZY_OR_NOW -# ifdef RTLD_LAZY -# define LT_DLLAZY_OR_NOW RTLD_LAZY -# else -# ifdef DL_LAZY -# define LT_DLLAZY_OR_NOW DL_LAZY -# else -# ifdef RTLD_NOW -# define LT_DLLAZY_OR_NOW RTLD_NOW -# else -# ifdef DL_NOW -# define LT_DLLAZY_OR_NOW DL_NOW -# else -# define LT_DLLAZY_OR_NOW 0 -# endif -# endif -# endif -# endif -#endif +# The library search path used internally by the compiler when linking +# a shared library. +compiler_lib_search_path=$lt_compiler_lib_search_path_CXX -#ifdef __cplusplus -extern "C" void exit (int); -#endif +# Method to check whether dependent libraries are shared objects. +deplibs_check_method=$lt_deplibs_check_method -void fnord() { int i=42;} -int main () -{ - void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); - int status = $lt_dlunknown; +# Command to use when deplibs_check_method == file_magic. +file_magic_cmd=$lt_file_magic_cmd - if (self) - { - if (dlsym (self,"fnord")) status = $lt_dlno_uscore; - else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; - /* dlclose (self); */ - } +# Flag that allows shared libraries with undefined symbols to be built. +allow_undefined_flag=$lt_allow_undefined_flag_CXX - exit (status); -} -EOF - if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then - (./conftest; exit; ) 2>/dev/null - lt_status=$? - case x$lt_status in - x$lt_dlno_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_dlneed_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_unknown|x*) lt_cv_dlopen_self_static=no ;; - esac - else : - # compilation failed - lt_cv_dlopen_self_static=no - fi -fi -rm -fr conftest* +# Flag that forces no undefined symbols. +no_undefined_flag=$lt_no_undefined_flag_CXX +# Commands used to finish a libtool library installation in a directory. 
+finish_cmds=$lt_finish_cmds -fi -echo "$as_me:$LINENO: result: $lt_cv_dlopen_self_static" >&5 -echo "${ECHO_T}$lt_cv_dlopen_self_static" >&6 - fi +# Same as above, but a single script fragment to be evaled but not shown. +finish_eval=$lt_finish_eval - CPPFLAGS="$save_CPPFLAGS" - LDFLAGS="$save_LDFLAGS" - LIBS="$save_LIBS" - ;; - esac +# Take the output of nm and produce a listing of raw symbols and C names. +global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe - case $lt_cv_dlopen_self in - yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; - *) enable_dlopen_self=unknown ;; - esac +# Transform the output of nm in a proper C declaration +global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl - case $lt_cv_dlopen_self_static in - yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; - *) enable_dlopen_self_static=unknown ;; - esac -fi +# Transform the output of nm in a C name address pair +global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address +# This is the shared library runtime path variable. +runpath_var=$runpath_var -# The else clause should only fire when bootstrapping the -# libtool distribution, otherwise you forgot to ship ltmain.sh -# with your package, and you will get complaints that there are -# no rules to generate ltmain.sh. -if test -f "$ltmain"; then - # See if we are running on zsh, and set the options which allow our commands through - # without removal of \ escapes. - if test -n "${ZSH_VERSION+set}" ; then - setopt NO_GLOB_SUBST - fi - # Now quote all the things that may contain metacharacters while being - # careful not to overquote the AC_SUBSTed values. We take copies of the - # variables and quote the copies for generation of the libtool script. 
- for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC NM \ - SED SHELL STRIP \ - libname_spec library_names_spec soname_spec extract_expsyms_cmds \ - old_striplib striplib file_magic_cmd finish_cmds finish_eval \ - deplibs_check_method reload_flag reload_cmds need_locks \ - lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \ - lt_cv_sys_global_symbol_to_c_name_address \ - sys_lib_search_path_spec sys_lib_dlsearch_path_spec \ - old_postinstall_cmds old_postuninstall_cmds \ - compiler_CXX \ - CC_CXX \ - LD_CXX \ - lt_prog_compiler_wl_CXX \ - lt_prog_compiler_pic_CXX \ - lt_prog_compiler_static_CXX \ - lt_prog_compiler_no_builtin_flag_CXX \ - export_dynamic_flag_spec_CXX \ - thread_safe_flag_spec_CXX \ - whole_archive_flag_spec_CXX \ - enable_shared_with_static_runtimes_CXX \ - old_archive_cmds_CXX \ - old_archive_from_new_cmds_CXX \ - predep_objects_CXX \ - postdep_objects_CXX \ - predeps_CXX \ - postdeps_CXX \ - compiler_lib_search_path_CXX \ - archive_cmds_CXX \ - archive_expsym_cmds_CXX \ - postinstall_cmds_CXX \ - postuninstall_cmds_CXX \ - old_archive_from_expsyms_cmds_CXX \ - allow_undefined_flag_CXX \ - no_undefined_flag_CXX \ - export_symbols_cmds_CXX \ - hardcode_libdir_flag_spec_CXX \ - hardcode_libdir_flag_spec_ld_CXX \ - hardcode_libdir_separator_CXX \ - hardcode_automatic_CXX \ - module_cmds_CXX \ - module_expsym_cmds_CXX \ - lt_cv_prog_compiler_c_o_CXX \ - exclude_expsyms_CXX \ - include_expsyms_CXX; do - - case $var in - old_archive_cmds_CXX | \ - old_archive_from_new_cmds_CXX | \ - archive_cmds_CXX | \ - archive_expsym_cmds_CXX | \ - module_cmds_CXX | \ - module_expsym_cmds_CXX | \ - old_archive_from_expsyms_cmds_CXX | \ - export_symbols_cmds_CXX | \ - extract_expsyms_cmds | reload_cmds | finish_cmds | \ - postinstall_cmds | postuninstall_cmds | \ - old_postinstall_cmds | old_postuninstall_cmds | \ - sys_lib_search_path_spec | sys_lib_dlsearch_path_spec) - # Double-quote double-evaled strings. 
- eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e \"\$delay_variable_subst\"\`\\\"" - ;; - *) - eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e \"\$sed_quote_subst\"\`\\\"" - ;; - esac - done - - case $lt_echo in - *'\$0 --fallback-echo"') - lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\$0 --fallback-echo"$/$0 --fallback-echo"/'` - ;; - esac +# This is the shared library path variable. +shlibpath_var=$shlibpath_var -cfgfile="$ofile" +# Is shlibpath searched before the hard-coded library search path? +shlibpath_overrides_runpath=$shlibpath_overrides_runpath - cat <<__EOF__ >> "$cfgfile" -# ### BEGIN LIBTOOL TAG CONFIG: $tagname +# How to hardcode a shared library path into an executable. +hardcode_action=$hardcode_action_CXX -# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: +# Whether we should hardcode library paths into libraries. +hardcode_into_libs=$hardcode_into_libs -# Shell to use when invoking shell scripts. -SHELL=$lt_SHELL +# Flag to hardcode \$libdir into a binary during linking. +# This must work even if \$libdir does not exist. +hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_CXX -# Whether or not to build shared libraries. -build_libtool_libs=$enable_shared +# If ld is used when linking, flag to hardcode \$libdir into +# a binary during linking. This must work even if \$libdir does +# not exist. +hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_CXX -# Whether or not to build static libraries. -build_old_libs=$enable_static +# Whether we need a single -rpath flag with a separated argument. +hardcode_libdir_separator=$lt_hardcode_libdir_separator_CXX -# Whether or not to add -lc for building shared libraries. -build_libtool_need_lc=$archive_cmds_need_lc_CXX +# Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the +# resulting binary. 
+hardcode_direct=$hardcode_direct_CXX -# Whether or not to disallow shared libs when runtime libs are static -allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_CXX +# Set to yes if using the -LDIR flag during linking hardcodes DIR into the +# resulting binary. +hardcode_minus_L=$hardcode_minus_L_CXX -# Whether or not to optimize for fast installation. -fast_install=$enable_fast_install +# Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into +# the resulting binary. +hardcode_shlibpath_var=$hardcode_shlibpath_var_CXX -# The host system. -host_alias=$host_alias -host=$host -host_os=$host_os +# Set to yes if building a shared library automatically hardcodes DIR into the library +# and all subsequent libraries and executables linked against it. +hardcode_automatic=$hardcode_automatic_CXX -# The build system. -build_alias=$build_alias -build=$build -build_os=$build_os +# Variables whose values should be saved in libtool wrapper scripts and +# restored at relink time. +variables_saved_for_relink="$variables_saved_for_relink" -# An echo program that does not interpret backslashes. -echo=$lt_echo +# Whether libtool must link a program against all its dependency libraries. +link_all_deplibs=$link_all_deplibs_CXX -# The archiver. -AR=$lt_AR -AR_FLAGS=$lt_AR_FLAGS +# Compile-time system search path for libraries +sys_lib_search_path_spec=$lt_sys_lib_search_path_spec -# A C compiler. -LTCC=$lt_LTCC +# Run-time system search path for libraries +sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec -# A language-specific compiler. -CC=$lt_compiler_CXX +# Fix the shell variable \$srcfile for the compiler. +fix_srcfile_path="$fix_srcfile_path_CXX" -# Is the compiler the GNU C compiler? -with_gcc=$GCC_CXX +# Set to yes if exported symbols are required. +always_export_symbols=$always_export_symbols_CXX -# An ERE matcher. -EGREP=$lt_EGREP +# The commands to list exported symbols. 
+export_symbols_cmds=$lt_export_symbols_cmds_CXX -# The linker used to build libraries. -LD=$lt_LD_CXX +# The commands to extract the exported symbol list from a shared archive. +extract_expsyms_cmds=$lt_extract_expsyms_cmds -# Whether we need hard or soft links. -LN_S=$lt_LN_S +# Symbols that should not be listed in the preloaded symbols. +exclude_expsyms=$lt_exclude_expsyms_CXX -# A BSD-compatible nm program. -NM=$lt_NM +# Symbols that must always be exported. +include_expsyms=$lt_include_expsyms_CXX -# A symbol stripping program -STRIP=$lt_STRIP +# ### END LIBTOOL TAG CONFIG: $tagname -# Used to examine libraries when file_magic_cmd begins "file" -MAGIC_CMD=$MAGIC_CMD +__EOF__ -# Used on cygwin: DLL creation program. -DLLTOOL="$DLLTOOL" -# Used on cygwin: object dumper. -OBJDUMP="$OBJDUMP" +else + # If there is no Makefile yet, we rely on a make rule to execute + # `config.status --recheck' to rerun these tests and create the + # libtool script then. + ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` + if test -f "$ltmain_in"; then + test -f Makefile && make "$ltmain" + fi +fi -# Used on cygwin: assembler. -AS="$AS" -# The name of the directory that contains temporary libtool files. -objdir=$objdir +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu -# How to create reloadable object files. -reload_flag=$lt_reload_flag -reload_cmds=$lt_reload_cmds +CC=$lt_save_CC +LDCXX=$LD +LD=$lt_save_LD +GCC=$lt_save_GCC +with_gnu_ldcxx=$with_gnu_ld +with_gnu_ld=$lt_save_with_gnu_ld +lt_cv_path_LDCXX=$lt_cv_path_LD +lt_cv_path_LD=$lt_save_path_LD +lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld +lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld -# How to pass a linker flag through the compiler. -wl=$lt_lt_prog_compiler_wl_CXX + else + tagname="" + fi + ;; -# Object file suffix (normally "o"). 
-objext="$ac_objext" + F77) + if test -n "$F77" && test "X$F77" != "Xno"; then -# Old archive suffix (normally "a"). -libext="$libext" +ac_ext=f +ac_compile='$F77 -c $FFLAGS conftest.$ac_ext >&5' +ac_link='$F77 -o conftest$ac_exeext $FFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_f77_compiler_gnu -# Shared library suffix (normally ".so"). -shrext_cmds='$shrext_cmds' -# Executable file suffix (normally ""). -exeext="$exeext" +archive_cmds_need_lc_F77=no +allow_undefined_flag_F77= +always_export_symbols_F77=no +archive_expsym_cmds_F77= +export_dynamic_flag_spec_F77= +hardcode_direct_F77=no +hardcode_libdir_flag_spec_F77= +hardcode_libdir_flag_spec_ld_F77= +hardcode_libdir_separator_F77= +hardcode_minus_L_F77=no +hardcode_automatic_F77=no +module_cmds_F77= +module_expsym_cmds_F77= +link_all_deplibs_F77=unknown +old_archive_cmds_F77=$old_archive_cmds +no_undefined_flag_F77= +whole_archive_flag_spec_F77= +enable_shared_with_static_runtimes_F77=no -# Additional compiler flags for building library objects. -pic_flag=$lt_lt_prog_compiler_pic_CXX -pic_mode=$pic_mode +# Source file extension for f77 test sources. +ac_ext=f -# What is the maximum length of a command? -max_cmd_len=$lt_cv_sys_max_cmd_len +# Object file extension for compiled f77 test sources. +objext=o +objext_F77=$objext -# Does compiler simultaneously support -c and -o options? -compiler_c_o=$lt_lt_cv_prog_compiler_c_o_CXX +# Code to be used in simple compile tests +lt_simple_compile_test_code=" subroutine t\n return\n end\n" -# Must we lock files when doing compilation? -need_locks=$lt_need_locks +# Code to be used in simple link tests +lt_simple_link_test_code=" program t\n end\n" -# Do we need the lib prefix for modules? -need_lib_prefix=$need_lib_prefix +# ltmain only uses $CC for tagged configurations so make sure $CC is set. -# Do we need a version for libraries? -need_version=$need_version +# If no C compiler was specified, use CC. 
+LTCC=${LTCC-"$CC"} -# Whether dlopen is supported. -dlopen_support=$enable_dlopen +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} -# Whether dlopen of programs is supported. -dlopen_self=$enable_dlopen_self +# Allow CC to be a program name with arguments. +compiler=$CC -# Whether dlopen of statically linked programs is supported. -dlopen_self_static=$enable_dlopen_self_static -# Compiler flag to prevent dynamic linking. -link_static_flag=$lt_lt_prog_compiler_static_CXX +# save warnings/boilerplate of simple test code +ac_outfile=conftest.$ac_objext +printf "$lt_simple_compile_test_code" >conftest.$ac_ext +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err +_lt_compiler_boilerplate=`cat conftest.err` +$rm conftest* -# Compiler flag to turn off builtin functions. -no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_CXX +ac_outfile=conftest.$ac_objext +printf "$lt_simple_link_test_code" >conftest.$ac_ext +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err +_lt_linker_boilerplate=`cat conftest.err` +$rm conftest* -# Compiler flag to allow reflexive dlopens. -export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_CXX -# Compiler flag to generate shared objects directly from archives. -whole_archive_flag_spec=$lt_whole_archive_flag_spec_CXX - -# Compiler flag to generate thread-safe objects. -thread_safe_flag_spec=$lt_thread_safe_flag_spec_CXX - -# Library versioning type. -version_type=$version_type +# Allow CC to be a program name with arguments. +lt_save_CC="$CC" +CC=${F77-"f77"} +compiler=$CC +compiler_F77=$CC +for cc_temp in $compiler""; do + case $cc_temp in + compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; + distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; + \-*) ;; + *) break;; + esac +done +cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` -# Format of library name prefix. -libname_spec=$lt_libname_spec -# List of archive names. 
First name is the real one, the rest are links. -# The last name is the one that the linker finds with -lNAME. -library_names_spec=$lt_library_names_spec +echo "$as_me:$LINENO: checking if libtool supports shared libraries" >&5 +echo $ECHO_N "checking if libtool supports shared libraries... $ECHO_C" >&6 +echo "$as_me:$LINENO: result: $can_build_shared" >&5 +echo "${ECHO_T}$can_build_shared" >&6 -# The coded name of the library, if different from the real name. -soname_spec=$lt_soname_spec +echo "$as_me:$LINENO: checking whether to build shared libraries" >&5 +echo $ECHO_N "checking whether to build shared libraries... $ECHO_C" >&6 +test "$can_build_shared" = "no" && enable_shared=no -# Commands used to build and install an old-style archive. -RANLIB=$lt_RANLIB -old_archive_cmds=$lt_old_archive_cmds_CXX -old_postinstall_cmds=$lt_old_postinstall_cmds -old_postuninstall_cmds=$lt_old_postuninstall_cmds +# On AIX, shared libraries and static libraries use the same namespace, and +# are all built from PIC. +case $host_os in +aix3*) + test "$enable_shared" = yes && enable_static=no + if test -n "$RANLIB"; then + archive_cmds="$archive_cmds~\$RANLIB \$lib" + postinstall_cmds='$RANLIB $lib' + fi + ;; +aix4* | aix5*) + if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then + test "$enable_shared" = yes && enable_static=no + fi + ;; +esac +echo "$as_me:$LINENO: result: $enable_shared" >&5 +echo "${ECHO_T}$enable_shared" >&6 -# Create an old-style archive from a shared archive. -old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_CXX +echo "$as_me:$LINENO: checking whether to build static libraries" >&5 +echo $ECHO_N "checking whether to build static libraries... $ECHO_C" >&6 +# Make sure either enable_shared or enable_static is yes. +test "$enable_shared" = yes || enable_static=yes +echo "$as_me:$LINENO: result: $enable_static" >&5 +echo "${ECHO_T}$enable_static" >&6 -# Create a temporary old-style archive to link instead of a shared archive. 
-old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_CXX +GCC_F77="$G77" +LD_F77="$LD" -# Commands used to build and install a shared archive. -archive_cmds=$lt_archive_cmds_CXX -archive_expsym_cmds=$lt_archive_expsym_cmds_CXX -postinstall_cmds=$lt_postinstall_cmds -postuninstall_cmds=$lt_postuninstall_cmds +lt_prog_compiler_wl_F77= +lt_prog_compiler_pic_F77= +lt_prog_compiler_static_F77= -# Commands used to build a loadable module (assumed same as above if empty) -module_cmds=$lt_module_cmds_CXX -module_expsym_cmds=$lt_module_expsym_cmds_CXX +echo "$as_me:$LINENO: checking for $compiler option to produce PIC" >&5 +echo $ECHO_N "checking for $compiler option to produce PIC... $ECHO_C" >&6 -# Commands to strip libraries. -old_striplib=$lt_old_striplib -striplib=$lt_striplib + if test "$GCC" = yes; then + lt_prog_compiler_wl_F77='-Wl,' + lt_prog_compiler_static_F77='-static' -# Dependencies to place before the objects being linked to create a -# shared library. -predep_objects=$lt_predep_objects_CXX + case $host_os in + aix*) + # All AIX code is PIC. + if test "$host_cpu" = ia64; then + # AIX 5 now supports IA64 processor + lt_prog_compiler_static_F77='-Bstatic' + fi + ;; -# Dependencies to place after the objects being linked to create a -# shared library. -postdep_objects=$lt_postdep_objects_CXX + amigaos*) + # FIXME: we need at least 68020 code to build shared libraries, but + # adding the `-m68020' flag to GCC prevents building anything better, + # like `-m68040'. + lt_prog_compiler_pic_F77='-m68020 -resident32 -malways-restore-a4' + ;; -# Dependencies to place before the objects being linked to create a -# shared library. -predeps=$lt_predeps_CXX + beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) + # PIC is the default for these OSes. + ;; -# Dependencies to place after the objects being linked to create a -# shared library. 
-postdeps=$lt_postdeps_CXX + mingw* | pw32* | os2*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). + lt_prog_compiler_pic_F77='-DDLL_EXPORT' + ;; -# The library search path used internally by the compiler when linking -# a shared library. -compiler_lib_search_path=$lt_compiler_lib_search_path_CXX + darwin* | rhapsody*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + lt_prog_compiler_pic_F77='-fno-common' + ;; -# Method to check whether dependent libraries are shared objects. -deplibs_check_method=$lt_deplibs_check_method + interix3*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; -# Command to use when deplibs_check_method == file_magic. -file_magic_cmd=$lt_file_magic_cmd + msdosdjgpp*) + # Just because we use GCC doesn't mean we suddenly get shared libraries + # on systems that don't support them. + lt_prog_compiler_can_build_shared_F77=no + enable_shared=no + ;; -# Flag that allows shared libraries with undefined symbols to be built. -allow_undefined_flag=$lt_allow_undefined_flag_CXX + sysv4*MP*) + if test -d /usr/nec; then + lt_prog_compiler_pic_F77=-Kconform_pic + fi + ;; -# Flag that forces no undefined symbols. -no_undefined_flag=$lt_no_undefined_flag_CXX + hpux*) + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but + # not for PA HP-UX. + case $host_cpu in + hppa*64*|ia64*) + # +Z the default + ;; + *) + lt_prog_compiler_pic_F77='-fPIC' + ;; + esac + ;; -# Commands used to finish a libtool library installation in a directory. -finish_cmds=$lt_finish_cmds + *) + lt_prog_compiler_pic_F77='-fPIC' + ;; + esac + else + # PORTME Check for flag to pass linker flags through the system compiler. 
+ case $host_os in + aix*) + lt_prog_compiler_wl_F77='-Wl,' + if test "$host_cpu" = ia64; then + # AIX 5 now supports IA64 processor + lt_prog_compiler_static_F77='-Bstatic' + else + lt_prog_compiler_static_F77='-bnso -bI:/lib/syscalls.exp' + fi + ;; + darwin*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + case $cc_basename in + xlc*) + lt_prog_compiler_pic_F77='-qnocommon' + lt_prog_compiler_wl_F77='-Wl,' + ;; + esac + ;; -# Same as above, but a single script fragment to be evaled but not shown. -finish_eval=$lt_finish_eval + mingw* | pw32* | os2*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). + lt_prog_compiler_pic_F77='-DDLL_EXPORT' + ;; -# Take the output of nm and produce a listing of raw symbols and C names. -global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe + hpux9* | hpux10* | hpux11*) + lt_prog_compiler_wl_F77='-Wl,' + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but + # not for PA HP-UX. + case $host_cpu in + hppa*64*|ia64*) + # +Z the default + ;; + *) + lt_prog_compiler_pic_F77='+Z' + ;; + esac + # Is there a better lt_prog_compiler_static that works with the bundled CC? + lt_prog_compiler_static_F77='${wl}-a ${wl}archive' + ;; -# Transform the output of nm in a proper C declaration -global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl + irix5* | irix6* | nonstopux*) + lt_prog_compiler_wl_F77='-Wl,' + # PIC (with -KPIC) is the default. + lt_prog_compiler_static_F77='-non_shared' + ;; -# Transform the output of nm in a C name address pair -global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address + newsos6) + lt_prog_compiler_pic_F77='-KPIC' + lt_prog_compiler_static_F77='-Bstatic' + ;; -# This is the shared library runtime path variable. 
-runpath_var=$runpath_var + linux*) + case $cc_basename in + icc* | ecc*) + lt_prog_compiler_wl_F77='-Wl,' + lt_prog_compiler_pic_F77='-KPIC' + lt_prog_compiler_static_F77='-static' + ;; + pgcc* | pgf77* | pgf90* | pgf95*) + # Portland Group compilers (*not* the Pentium gcc compiler, + # which looks to be a dead project) + lt_prog_compiler_wl_F77='-Wl,' + lt_prog_compiler_pic_F77='-fpic' + lt_prog_compiler_static_F77='-Bstatic' + ;; + ccc*) + lt_prog_compiler_wl_F77='-Wl,' + # All Alpha code is PIC. + lt_prog_compiler_static_F77='-non_shared' + ;; + esac + ;; -# This is the shared library path variable. -shlibpath_var=$shlibpath_var + osf3* | osf4* | osf5*) + lt_prog_compiler_wl_F77='-Wl,' + # All OSF/1 code is PIC. + lt_prog_compiler_static_F77='-non_shared' + ;; -# Is shlibpath searched before the hard-coded library search path? -shlibpath_overrides_runpath=$shlibpath_overrides_runpath - -# How to hardcode a shared library path into an executable. -hardcode_action=$hardcode_action_CXX + solaris*) + lt_prog_compiler_pic_F77='-KPIC' + lt_prog_compiler_static_F77='-Bstatic' + case $cc_basename in + f77* | f90* | f95*) + lt_prog_compiler_wl_F77='-Qoption ld ';; + *) + lt_prog_compiler_wl_F77='-Wl,';; + esac + ;; -# Whether we should hardcode library paths into libraries. -hardcode_into_libs=$hardcode_into_libs + sunos4*) + lt_prog_compiler_wl_F77='-Qoption ld ' + lt_prog_compiler_pic_F77='-PIC' + lt_prog_compiler_static_F77='-Bstatic' + ;; -# Flag to hardcode \$libdir into a binary during linking. -# This must work even if \$libdir does not exist. -hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_CXX + sysv4 | sysv4.2uw2* | sysv4.3*) + lt_prog_compiler_wl_F77='-Wl,' + lt_prog_compiler_pic_F77='-KPIC' + lt_prog_compiler_static_F77='-Bstatic' + ;; -# If ld is used when linking, flag to hardcode \$libdir into -# a binary during linking. This must work even if \$libdir does -# not exist. 
-hardcode_libdir_flag_spec_ld=$lt_hardcode_libdir_flag_spec_ld_CXX + sysv4*MP*) + if test -d /usr/nec ;then + lt_prog_compiler_pic_F77='-Kconform_pic' + lt_prog_compiler_static_F77='-Bstatic' + fi + ;; -# Whether we need a single -rpath flag with a separated argument. -hardcode_libdir_separator=$lt_hardcode_libdir_separator_CXX + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + lt_prog_compiler_wl_F77='-Wl,' + lt_prog_compiler_pic_F77='-KPIC' + lt_prog_compiler_static_F77='-Bstatic' + ;; -# Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the -# resulting binary. -hardcode_direct=$hardcode_direct_CXX + unicos*) + lt_prog_compiler_wl_F77='-Wl,' + lt_prog_compiler_can_build_shared_F77=no + ;; -# Set to yes if using the -LDIR flag during linking hardcodes DIR into the -# resulting binary. -hardcode_minus_L=$hardcode_minus_L_CXX + uts4*) + lt_prog_compiler_pic_F77='-pic' + lt_prog_compiler_static_F77='-Bstatic' + ;; -# Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into -# the resulting binary. -hardcode_shlibpath_var=$hardcode_shlibpath_var_CXX + *) + lt_prog_compiler_can_build_shared_F77=no + ;; + esac + fi -# Set to yes if building a shared library automatically hardcodes DIR into the library -# and all subsequent libraries and executables linked against it. -hardcode_automatic=$hardcode_automatic_CXX +echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_F77" >&5 +echo "${ECHO_T}$lt_prog_compiler_pic_F77" >&6 -# Variables whose values should be saved in libtool wrapper scripts and -# restored at relink time. -variables_saved_for_relink="$variables_saved_for_relink" +# +# Check to make sure the PIC flag actually works. +# +if test -n "$lt_prog_compiler_pic_F77"; then -# Whether libtool must link a program against all its dependency libraries. 
-link_all_deplibs=$link_all_deplibs_CXX +echo "$as_me:$LINENO: checking if $compiler PIC flag $lt_prog_compiler_pic_F77 works" >&5 +echo $ECHO_N "checking if $compiler PIC flag $lt_prog_compiler_pic_F77 works... $ECHO_C" >&6 +if test "${lt_prog_compiler_pic_works_F77+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + lt_prog_compiler_pic_works_F77=no + ac_outfile=conftest.$ac_objext + printf "$lt_simple_compile_test_code" > conftest.$ac_ext + lt_compiler_flag="$lt_prog_compiler_pic_F77" + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. + # The option is referenced via a variable to avoid confusing sed. + lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:13168: $lt_compile\"" >&5) + (eval "$lt_compile" 2>conftest.err) + ac_status=$? + cat conftest.err >&5 + echo "$as_me:13172: \$? = $ac_status" >&5 + if (exit $ac_status) && test -s "$ac_outfile"; then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings other than the usual output. + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! 
-s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then + lt_prog_compiler_pic_works_F77=yes + fi + fi + $rm conftest* -# Compile-time system search path for libraries -sys_lib_search_path_spec=$lt_sys_lib_search_path_spec +fi +echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_works_F77" >&5 +echo "${ECHO_T}$lt_prog_compiler_pic_works_F77" >&6 -# Run-time system search path for libraries -sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec +if test x"$lt_prog_compiler_pic_works_F77" = xyes; then + case $lt_prog_compiler_pic_F77 in + "" | " "*) ;; + *) lt_prog_compiler_pic_F77=" $lt_prog_compiler_pic_F77" ;; + esac +else + lt_prog_compiler_pic_F77= + lt_prog_compiler_can_build_shared_F77=no +fi -# Fix the shell variable \$srcfile for the compiler. -fix_srcfile_path="$fix_srcfile_path_CXX" +fi +case $host_os in + # For platforms which do not support PIC, -DPIC is meaningless: + *djgpp*) + lt_prog_compiler_pic_F77= + ;; + *) + lt_prog_compiler_pic_F77="$lt_prog_compiler_pic_F77" + ;; +esac -# Set to yes if exported symbols are required. -always_export_symbols=$always_export_symbols_CXX +# +# Check to make sure the static flag actually works. +# +wl=$lt_prog_compiler_wl_F77 eval lt_tmp_static_flag=\"$lt_prog_compiler_static_F77\" +echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 +echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... $ECHO_C" >&6 +if test "${lt_prog_compiler_static_works_F77+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + lt_prog_compiler_static_works_F77=no + save_LDFLAGS="$LDFLAGS" + LDFLAGS="$LDFLAGS $lt_tmp_static_flag" + printf "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. 
+ cat conftest.err 1>&5 + $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + lt_prog_compiler_static_works_F77=yes + fi + else + lt_prog_compiler_static_works_F77=yes + fi + fi + $rm conftest* + LDFLAGS="$save_LDFLAGS" -# The commands to list exported symbols. -export_symbols_cmds=$lt_export_symbols_cmds_CXX +fi +echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works_F77" >&5 +echo "${ECHO_T}$lt_prog_compiler_static_works_F77" >&6 -# The commands to extract the exported symbol list from a shared archive. -extract_expsyms_cmds=$lt_extract_expsyms_cmds +if test x"$lt_prog_compiler_static_works_F77" = xyes; then + : +else + lt_prog_compiler_static_F77= +fi -# Symbols that should not be listed in the preloaded symbols. -exclude_expsyms=$lt_exclude_expsyms_CXX -# Symbols that must always be exported. -include_expsyms=$lt_include_expsyms_CXX +echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 +echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 +if test "${lt_cv_prog_compiler_c_o_F77+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + lt_cv_prog_compiler_c_o_F77=no + $rm -r conftest 2>/dev/null + mkdir conftest + cd conftest + mkdir out + printf "$lt_simple_compile_test_code" > conftest.$ac_ext -# ### END LIBTOOL TAG CONFIG: $tagname + lt_compiler_flag="-o out/conftest2.$ac_objext" + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. 
+ lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:13272: $lt_compile\"" >&5) + (eval "$lt_compile" 2>out/conftest.err) + ac_status=$? + cat out/conftest.err >&5 + echo "$as_me:13276: \$? = $ac_status" >&5 + if (exit $ac_status) && test -s out/conftest2.$ac_objext + then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then + lt_cv_prog_compiler_c_o_F77=yes + fi + fi + chmod u+w . 2>&5 + $rm conftest* + # SGI C++ compiler will create directory out/ii_files/ for + # template instantiation + test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files + $rm out/* && rmdir out + cd .. + rmdir conftest + $rm conftest* -__EOF__ +fi +echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_c_o_F77" >&5 +echo "${ECHO_T}$lt_cv_prog_compiler_c_o_F77" >&6 -else - # If there is no Makefile yet, we rely on a make rule to execute - # `config.status --recheck' to rerun these tests and create the - # libtool script then. - ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` - if test -f "$ltmain_in"; then - test -f Makefile && make "$ltmain" +hard_links="nottested" +if test "$lt_cv_prog_compiler_c_o_F77" = no && test "$need_locks" != no; then + # do not overwrite the value of need_locks provided by the user + echo "$as_me:$LINENO: checking if we can lock with hard links" >&5 +echo $ECHO_N "checking if we can lock with hard links... 
$ECHO_C" >&6 + hard_links=yes + $rm conftest* + ln conftest.a conftest.b 2>/dev/null && hard_links=no + touch conftest.a + ln conftest.a conftest.b 2>&5 || hard_links=no + ln conftest.a conftest.b 2>/dev/null && hard_links=no + echo "$as_me:$LINENO: result: $hard_links" >&5 +echo "${ECHO_T}$hard_links" >&6 + if test "$hard_links" = no; then + { echo "$as_me:$LINENO: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&5 +echo "$as_me: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&2;} + need_locks=warn fi +else + need_locks=no fi +echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 +echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... $ECHO_C" >&6 -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - -CC=$lt_save_CC -LDCXX=$LD -LD=$lt_save_LD -GCC=$lt_save_GCC -with_gnu_ldcxx=$with_gnu_ld -with_gnu_ld=$lt_save_with_gnu_ld -lt_cv_path_LDCXX=$lt_cv_path_LD -lt_cv_path_LD=$lt_save_path_LD -lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld -lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld - - else - tagname="" - fi - ;; - - F77) - if test -n "$F77" && test "X$F77" != "Xno"; then - -ac_ext=f -ac_compile='$F77 -c $FFLAGS conftest.$ac_ext >&5' -ac_link='$F77 -o conftest$ac_exeext $FFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_f77_compiler_gnu - - -archive_cmds_need_lc_F77=no -allow_undefined_flag_F77= -always_export_symbols_F77=no -archive_expsym_cmds_F77= -export_dynamic_flag_spec_F77= -hardcode_direct_F77=no -hardcode_libdir_flag_spec_F77= -hardcode_libdir_flag_spec_ld_F77= -hardcode_libdir_separator_F77= -hardcode_minus_L_F77=no -hardcode_automatic_F77=no -module_cmds_F77= -module_expsym_cmds_F77= -link_all_deplibs_F77=unknown -old_archive_cmds_F77=$old_archive_cmds 
-no_undefined_flag_F77= -whole_archive_flag_spec_F77= -enable_shared_with_static_runtimes_F77=no - -# Source file extension for f77 test sources. -ac_ext=f - -# Object file extension for compiled f77 test sources. -objext=o -objext_F77=$objext - -# Code to be used in simple compile tests -lt_simple_compile_test_code=" subroutine t\n return\n end\n" - -# Code to be used in simple link tests -lt_simple_link_test_code=" program t\n end\n" - -# ltmain only uses $CC for tagged configurations so make sure $CC is set. - -# If no C compiler was specified, use CC. -LTCC=${LTCC-"$CC"} - -# Allow CC to be a program name with arguments. -compiler=$CC - - -# save warnings/boilerplate of simple test code -ac_outfile=conftest.$ac_objext -printf "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err -_lt_compiler_boilerplate=`cat conftest.err` -$rm conftest* - -ac_outfile=conftest.$ac_objext -printf "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err -_lt_linker_boilerplate=`cat conftest.err` -$rm conftest* - - -# Allow CC to be a program name with arguments. 
-lt_save_CC="$CC" -CC=${F77-"f77"} -compiler=$CC -compiler_F77=$CC -for cc_temp in $compiler""; do + runpath_var= + allow_undefined_flag_F77= + enable_shared_with_static_runtimes_F77=no + archive_cmds_F77= + archive_expsym_cmds_F77= + old_archive_From_new_cmds_F77= + old_archive_from_expsyms_cmds_F77= + export_dynamic_flag_spec_F77= + whole_archive_flag_spec_F77= + thread_safe_flag_spec_F77= + hardcode_libdir_flag_spec_F77= + hardcode_libdir_flag_spec_ld_F77= + hardcode_libdir_separator_F77= + hardcode_direct_F77=no + hardcode_minus_L_F77=no + hardcode_shlibpath_var_F77=unsupported + link_all_deplibs_F77=unknown + hardcode_automatic_F77=no + module_cmds_F77= + module_expsym_cmds_F77= + always_export_symbols_F77=no + export_symbols_cmds_F77='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' + # include_expsyms should be a list of space-separated symbols to be *always* + # included in the symbol list + include_expsyms_F77= + # exclude_expsyms can be an extended regexp of symbols to exclude + # it will be wrapped by ` (' and `)$', so one must not match beginning or + # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc', + # as well as any symbol that contains `d'. + exclude_expsyms_F77="_GLOBAL_OFFSET_TABLE_" + # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out + # platforms (ab)use it in PIC code, but their linkers get confused if + # the symbol is explicitly referenced. Since portable code cannot + # rely on this symbol name, it's probably fine to never include it in + # preloaded symbol tables. + extract_expsyms_cmds= + # Just being paranoid about ensuring that cc_basename is set. 
+ for cc_temp in $compiler""; do case $cc_temp in compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; @@ -13417,500 +13371,460 @@ for cc_temp in $compiler""; do done cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` + case $host_os in + cygwin* | mingw* | pw32*) + # FIXME: the MSVC++ port hasn't been tested in a loooong time + # When not using gcc, we currently assume that we are using + # Microsoft Visual C++. + if test "$GCC" != yes; then + with_gnu_ld=no + fi + ;; + interix*) + # we just hope/assume this is gcc and not c89 (= MSVC++) + with_gnu_ld=yes + ;; + openbsd*) + with_gnu_ld=no + ;; + esac -echo "$as_me:$LINENO: checking if libtool supports shared libraries" >&5 -echo $ECHO_N "checking if libtool supports shared libraries... $ECHO_C" >&6 -echo "$as_me:$LINENO: result: $can_build_shared" >&5 -echo "${ECHO_T}$can_build_shared" >&6 - -echo "$as_me:$LINENO: checking whether to build shared libraries" >&5 -echo $ECHO_N "checking whether to build shared libraries... $ECHO_C" >&6 -test "$can_build_shared" = "no" && enable_shared=no - -# On AIX, shared libraries and static libraries use the same namespace, and -# are all built from PIC. -case "$host_os" in -aix3*) - test "$enable_shared" = yes && enable_static=no - if test -n "$RANLIB"; then - archive_cmds="$archive_cmds~\$RANLIB \$lib" - postinstall_cmds='$RANLIB $lib' - fi - ;; -aix4* | aix5*) - if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then - test "$enable_shared" = yes && enable_static=no - fi - ;; -esac -echo "$as_me:$LINENO: result: $enable_shared" >&5 -echo "${ECHO_T}$enable_shared" >&6 - -echo "$as_me:$LINENO: checking whether to build static libraries" >&5 -echo $ECHO_N "checking whether to build static libraries... $ECHO_C" >&6 -# Make sure either enable_shared or enable_static is yes. 
-test "$enable_shared" = yes || enable_static=yes -echo "$as_me:$LINENO: result: $enable_static" >&5 -echo "${ECHO_T}$enable_static" >&6 - -test "$ld_shlibs_F77" = no && can_build_shared=no - -GCC_F77="$G77" -LD_F77="$LD" + ld_shlibs_F77=yes + if test "$with_gnu_ld" = yes; then + # If archive_cmds runs LD, not CC, wlarc should be empty + wlarc='${wl}' -lt_prog_compiler_wl_F77= -lt_prog_compiler_pic_F77= -lt_prog_compiler_static_F77= + # Set some defaults for GNU ld with shared library support. These + # are reset later if shared libraries are not supported. Putting them + # here allows them to be overridden if necessary. + runpath_var=LD_RUN_PATH + hardcode_libdir_flag_spec_F77='${wl}--rpath ${wl}$libdir' + export_dynamic_flag_spec_F77='${wl}--export-dynamic' + # ancient GNU ld didn't support --whole-archive et. al. + if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then + whole_archive_flag_spec_F77="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' + else + whole_archive_flag_spec_F77= + fi + supports_anon_versioning=no + case `$LD -v 2>/dev/null` in + *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 + *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... + *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... + *\ 2.11.*) ;; # other 2.11 versions + *) supports_anon_versioning=yes ;; + esac -echo "$as_me:$LINENO: checking for $compiler option to produce PIC" >&5 -echo $ECHO_N "checking for $compiler option to produce PIC... $ECHO_C" >&6 + # See if GNU ld supports shared libraries. + case $host_os in + aix3* | aix4* | aix5*) + # On AIX/PPC, the GNU linker is very broken + if test "$host_cpu" != ia64; then + ld_shlibs_F77=no + cat <<EOF 1>&2 - if test "$GCC" = yes; then - lt_prog_compiler_wl_F77='-Wl,' - lt_prog_compiler_static_F77='-static' +*** Warning: the GNU linker, at least up to release 2.9.1, is reported +*** to be unable to reliably create shared libraries on AIX.
+*** Therefore, libtool is disabling shared libraries support. If you +*** really care for shared libraries, you may want to modify your PATH +*** so that a non-GNU linker is found, and then restart. - case $host_os in - aix*) - # All AIX code is PIC. - if test "$host_cpu" = ia64; then - # AIX 5 now supports IA64 processor - lt_prog_compiler_static_F77='-Bstatic' +EOF fi ;; amigaos*) - # FIXME: we need at least 68020 code to build shared libraries, but - # adding the `-m68020' flag to GCC prevents building anything better, - # like `-m68040'. - lt_prog_compiler_pic_F77='-m68020 -resident32 -malways-restore-a4' - ;; + archive_cmds_F77='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' + hardcode_libdir_flag_spec_F77='-L$libdir' + hardcode_minus_L_F77=yes - beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) - # PIC is the default for these OSes. + # Samuel A. Falvo II reports + # that the semantics of dynamic libraries on AmigaOS, at least up + # to version 4, is to share data among multiple programs linked + # with the same dynamic library. Since this doesn't match the + # behavior of shared libraries on other platforms, we can't use + # them. + ld_shlibs_F77=no ;; - mingw* | pw32* | os2*) - # This hack is so that the source file can tell whether it is being - # built for inclusion in a dll (and should export symbols for example). - lt_prog_compiler_pic_F77='-DDLL_EXPORT' + beos*) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + allow_undefined_flag_F77=unsupported + # Joseph Beckenbach says some releases of gcc + # support --undefined. This deserves some investigation. 
FIXME + archive_cmds_F77='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + else + ld_shlibs_F77=no + fi ;; - darwin* | rhapsody*) - # PIC is the default on this platform - # Common symbols not allowed in MH_DYLIB files - lt_prog_compiler_pic_F77='-fno-common' - ;; + cygwin* | mingw* | pw32*) + # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, F77) is actually meaningless, + # as there is no search path for DLLs. + hardcode_libdir_flag_spec_F77='-L$libdir' + allow_undefined_flag_F77=unsupported + always_export_symbols_F77=no + enable_shared_with_static_runtimes_F77=yes + export_symbols_cmds_F77='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' - msdosdjgpp*) - # Just because we use GCC doesn't mean we suddenly get shared libraries - # on systems that don't support them. - lt_prog_compiler_can_build_shared_F77=no - enable_shared=no - ;; - - sysv4*MP*) - if test -d /usr/nec; then - lt_prog_compiler_pic_F77=-Kconform_pic + if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then + archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + # If the export-symbols file already is a .def file (1st line + # is EXPORTS), use it as is; otherwise, prepend... + archive_expsym_cmds_F77='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then + cp $export_symbols $output_objdir/$soname.def; + else + echo EXPORTS > $output_objdir/$soname.def; + cat $export_symbols >> $output_objdir/$soname.def; + fi~ + $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + else + ld_shlibs_F77=no fi ;; - hpux*) - # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but - # not for PA HP-UX. 
- case "$host_cpu" in - hppa*64*|ia64*) - # +Z the default - ;; - *) - lt_prog_compiler_pic_F77='-fPIC' - ;; - esac + interix3*) + hardcode_direct_F77=no + hardcode_shlibpath_var_F77=no + hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' + export_dynamic_flag_spec_F77='${wl}-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. + # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. + archive_cmds_F77='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + archive_expsym_cmds_F77='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' ;; - *) - lt_prog_compiler_pic_F77='-fPIC' - ;; - esac - else - # PORTME Check for flag to pass linker flags through the system compiler. 
- case $host_os in - aix*) - lt_prog_compiler_wl_F77='-Wl,' - if test "$host_cpu" = ia64; then - # AIX 5 now supports IA64 processor - lt_prog_compiler_static_F77='-Bstatic' + linux*) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + tmp_addflag= + case $cc_basename,$host_cpu in + pgcc*) # Portland Group C compiler + whole_archive_flag_spec_F77='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag' + ;; + pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers + whole_archive_flag_spec_F77='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag -Mnomain' ;; + ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 + tmp_addflag=' -i_dynamic' ;; + efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 + tmp_addflag=' -i_dynamic -nofor_main' ;; + ifc* | ifort*) # Intel Fortran compiler + tmp_addflag=' -nofor_main' ;; + esac + archive_cmds_F77='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + + if test $supports_anon_versioning = yes; then + archive_expsym_cmds_F77='$echo "{ global:" > $output_objdir/$libname.ver~ + cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ + $echo "local: *; };" >> $output_objdir/$libname.ver~ + $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib' + fi else - lt_prog_compiler_static_F77='-bnso -bI:/lib/syscalls.exp' + ld_shlibs_F77=no fi ;; - darwin*) - # PIC is the default on this platform - # Common symbols not allowed in MH_DYLIB files - case $cc_basename in - xlc*) - lt_prog_compiler_pic_F77='-qnocommon' - lt_prog_compiler_wl_F77='-Wl,' - 
;; - esac - ;; - - mingw* | pw32* | os2*) - # This hack is so that the source file can tell whether it is being - # built for inclusion in a dll (and should export symbols for example). - lt_prog_compiler_pic_F77='-DDLL_EXPORT' - ;; - hpux9* | hpux10* | hpux11*) - lt_prog_compiler_wl_F77='-Wl,' - # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but - # not for PA HP-UX. - case "$host_cpu" in - hppa*64*|ia64*) - # +Z the default - ;; - *) - lt_prog_compiler_pic_F77='+Z' - ;; - esac - # Is there a better lt_prog_compiler_static that works with the bundled CC? - lt_prog_compiler_static_F77='${wl}-a ${wl}archive' + netbsd*) + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + archive_cmds_F77='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' + wlarc= + else + archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + fi ;; - irix5* | irix6* | nonstopux*) - lt_prog_compiler_wl_F77='-Wl,' - # PIC (with -KPIC) is the default. - lt_prog_compiler_static_F77='-non_shared' - ;; + solaris*) + if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then + ld_shlibs_F77=no + cat <<EOF 1>&2 - newsos6) - lt_prog_compiler_pic_F77='-KPIC' - lt_prog_compiler_static_F77='-Bstatic' - ;; +*** Warning: The releases 2.8.* of the GNU linker cannot reliably +*** create shared libraries on Solaris systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.9.1 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart.
- linux*) - case $cc_basename in - icc* | ecc*) - lt_prog_compiler_wl_F77='-Wl,' - lt_prog_compiler_pic_F77='-KPIC' - lt_prog_compiler_static_F77='-static' - ;; - pgcc* | pgf77* | pgf90*) - # Portland Group compilers (*not* the Pentium gcc compiler, - # which looks to be a dead project) - lt_prog_compiler_wl_F77='-Wl,' - lt_prog_compiler_pic_F77='-fpic' - lt_prog_compiler_static_F77='-static' - ;; - ccc*) - lt_prog_compiler_wl_F77='-Wl,' - # All Alpha code is PIC. - lt_prog_compiler_static_F77='-non_shared' - ;; - esac +EOF + elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + else + ld_shlibs_F77=no + fi ;; - osf3* | osf4* | osf5*) - lt_prog_compiler_wl_F77='-Wl,' - # All OSF/1 code is PIC. - lt_prog_compiler_static_F77='-non_shared' - ;; + sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) + case `$LD -v 2>&1` in + *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) + ld_shlibs_F77=no + cat <<_LT_EOF 1>&2 - sco3.2v5*) - lt_prog_compiler_pic_F77='-Kpic' - lt_prog_compiler_static_F77='-dn' - ;; +*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not +*** reliably create shared libraries on SCO systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.16.91.0.3 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. 
- solaris*) - lt_prog_compiler_pic_F77='-KPIC' - lt_prog_compiler_static_F77='-Bstatic' - case $cc_basename in - f77* | f90* | f95*) - lt_prog_compiler_wl_F77='-Qoption ld ';; - *) - lt_prog_compiler_wl_F77='-Wl,';; +_LT_EOF + ;; + *) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + hardcode_libdir_flag_spec_F77='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' + archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' + archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' + else + ld_shlibs_F77=no + fi + ;; esac ;; sunos4*) - lt_prog_compiler_wl_F77='-Qoption ld ' - lt_prog_compiler_pic_F77='-PIC' - lt_prog_compiler_static_F77='-Bstatic' - ;; - - sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) - lt_prog_compiler_wl_F77='-Wl,' - lt_prog_compiler_pic_F77='-KPIC' - lt_prog_compiler_static_F77='-Bstatic' + archive_cmds_F77='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' + wlarc= + hardcode_direct_F77=yes + hardcode_shlibpath_var_F77=no ;; - sysv4*MP*) - if test -d /usr/nec ;then - lt_prog_compiler_pic_F77='-Kconform_pic' - lt_prog_compiler_static_F77='-Bstatic' + *) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + else + ld_shlibs_F77=no fi ;; + esac - unicos*) - lt_prog_compiler_wl_F77='-Wl,' - lt_prog_compiler_can_build_shared_F77=no + if test "$ld_shlibs_F77" = no; then + runpath_var= + hardcode_libdir_flag_spec_F77= + export_dynamic_flag_spec_F77= + whole_archive_flag_spec_F77= + fi + else + # PORTME fill in a description of your system's linker (not 
GNU ld) + case $host_os in + aix3*) + allow_undefined_flag_F77=unsupported + always_export_symbols_F77=yes + archive_expsym_cmds_F77='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' + # Note: this linker hardcodes the directories in LIBPATH if there + # are no directories specified by -L. + hardcode_minus_L_F77=yes + if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then + # Neither direct hardcoding nor static linking is supported with a + # broken collect2. + hardcode_direct_F77=unsupported + fi ;; - uts4*) - lt_prog_compiler_pic_F77='-pic' - lt_prog_compiler_static_F77='-Bstatic' - ;; + aix4* | aix5*) + if test "$host_cpu" = ia64; then + # On IA64, the linker does run time linking by default, so we don't + # have to do anything special. + aix_use_runtimelinking=no + exp_sym_flag='-Bexport' + no_entry_flag="" + else + # If we're using GNU nm, then we don't want the "-C" option. + # -C means demangle to AIX nm, but means don't demangle with GNU nm + if $NM -V 2>&1 | grep 'GNU' > /dev/null; then + export_symbols_cmds_F77='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' + else + export_symbols_cmds_F77='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' + fi + aix_use_runtimelinking=no - *) - lt_prog_compiler_can_build_shared_F77=no - ;; - esac - fi + # Test if we are trying to use run time linking or normal + # AIX style linking. If -brtl is somewhere in LDFLAGS, we + # need to do runtime linking. 
+ case $host_os in aix4.[23]|aix4.[23].*|aix5*) + for ld_flag in $LDFLAGS; do + if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then + aix_use_runtimelinking=yes + break + fi + done + ;; + esac -echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_F77" >&5 -echo "${ECHO_T}$lt_prog_compiler_pic_F77" >&6 + exp_sym_flag='-bexport' + no_entry_flag='-bnoentry' + fi -# -# Check to make sure the PIC flag actually works. -# -if test -n "$lt_prog_compiler_pic_F77"; then + # When large executables or shared objects are built, AIX ld can + # have problems creating the table of contents. If linking a library + # or program results in "error TOC overflow" add -mminimal-toc to + # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not + # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. -echo "$as_me:$LINENO: checking if $compiler PIC flag $lt_prog_compiler_pic_F77 works" >&5 -echo $ECHO_N "checking if $compiler PIC flag $lt_prog_compiler_pic_F77 works... $ECHO_C" >&6 -if test "${lt_prog_compiler_pic_works_F77+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - lt_prog_compiler_pic_works_F77=no - ac_outfile=conftest.$ac_objext - printf "$lt_simple_compile_test_code" > conftest.$ac_ext - lt_compiler_flag="$lt_prog_compiler_pic_F77" - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - # The option is referenced via a variable to avoid confusing sed. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:13695: $lt_compile\"" >&5) - (eval "$lt_compile" 2>conftest.err) - ac_status=$? - cat conftest.err >&5 - echo "$as_me:13699: \$? 
= $ac_status" >&5 - if (exit $ac_status) && test -s "$ac_outfile"; then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings other than the usual output. - $echo "X$_lt_compiler_boilerplate" | $Xsed >conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if test ! -s conftest.err || diff conftest.exp conftest.er2 >/dev/null; then - lt_prog_compiler_pic_works_F77=yes - fi - fi - $rm conftest* + archive_cmds_F77='' + hardcode_direct_F77=yes + hardcode_libdir_separator_F77=':' + link_all_deplibs_F77=yes -fi -echo "$as_me:$LINENO: result: $lt_prog_compiler_pic_works_F77" >&5 -echo "${ECHO_T}$lt_prog_compiler_pic_works_F77" >&6 + if test "$GCC" = yes; then + case $host_os in aix4.[012]|aix4.[012].*) + # We only want to do this on AIX 4.2 and lower, the check + # below for broken collect2 doesn't work under 4.3+ + collect2name=`${CC} -print-prog-name=collect2` + if test -f "$collect2name" && \ + strings "$collect2name" | grep resolve_lib_name >/dev/null + then + # We have reworked collect2 + hardcode_direct_F77=yes + else + # We have old collect2 + hardcode_direct_F77=unsupported + # It fails to find uninstalled libraries when the uninstalled + # path is not listed in the libpath. Setting hardcode_minus_L + # to unsupported forces relinking + hardcode_minus_L_F77=yes + hardcode_libdir_flag_spec_F77='-L$libdir' + hardcode_libdir_separator_F77= + fi + ;; + esac + shared_flag='-shared' + if test "$aix_use_runtimelinking" = yes; then + shared_flag="$shared_flag "'${wl}-G' + fi + else + # not using gcc + if test "$host_cpu" = ia64; then + # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release + # chokes on -Wl,-G. 
The following line is correct: + shared_flag='-G' + else + if test "$aix_use_runtimelinking" = yes; then + shared_flag='${wl}-G' + else + shared_flag='${wl}-bM:SRE' + fi + fi + fi -if test x"$lt_prog_compiler_pic_works_F77" = xyes; then - case $lt_prog_compiler_pic_F77 in - "" | " "*) ;; - *) lt_prog_compiler_pic_F77=" $lt_prog_compiler_pic_F77" ;; - esac -else - lt_prog_compiler_pic_F77= - lt_prog_compiler_can_build_shared_F77=no -fi + # It seems that -bexpall does not export symbols beginning with + # underscore (_), so it is better to generate a list of symbols to export. + always_export_symbols_F77=yes + if test "$aix_use_runtimelinking" = yes; then + # Warning - without using the other runtime loading flags (-brtl), + # -berok will link without error, but may produce a broken library. + allow_undefined_flag_F77='-berok' + # Determine the default libpath from the value encoded in an empty executable. + cat >conftest.$ac_ext <<_ACEOF + program main -fi -case "$host_os" in - # For platforms which do not support PIC, -DPIC is meaningless: - *djgpp*) - lt_prog_compiler_pic_F77= - ;; - *) - lt_prog_compiler_pic_F77="$lt_prog_compiler_pic_F77" - ;; -esac + end +_ACEOF +rm -f conftest.$ac_objext conftest$ac_exeext +if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 + (eval $ac_link) 2>conftest.er1 + ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } && + { ac_try='test -z "$ac_f77_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest$ac_exeext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 + (exit $ac_status); }; }; then -echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 -echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 -if test "${lt_cv_prog_compiler_c_o_F77+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 +aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } +}'` +# Check for a 64-bit object if we didn't find anything. +if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } +}'`; fi else - lt_cv_prog_compiler_c_o_F77=no - $rm -r conftest 2>/dev/null - mkdir conftest - cd conftest - mkdir out - printf "$lt_simple_compile_test_code" > conftest.$ac_ext - - lt_compiler_flag="-o out/conftest2.$ac_objext" - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:13757: $lt_compile\"" >&5) - (eval "$lt_compile" 2>out/conftest.err) - ac_status=$? - cat out/conftest.err >&5 - echo "$as_me:13761: \$? = $ac_status" >&5 - if (exit $ac_status) && test -s out/conftest2.$ac_objext - then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings - $echo "X$_lt_compiler_boilerplate" | $Xsed > out/conftest.exp - $SED '/^$/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.err || diff out/conftest.exp out/conftest.er2 >/dev/null; then - lt_cv_prog_compiler_c_o_F77=yes - fi - fi - chmod u+w . 
- $rm conftest* - # SGI C++ compiler will create directory out/ii_files/ for - # template instantiation - test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files - $rm out/* && rmdir out - cd .. - rmdir conftest - $rm conftest* + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 fi -echo "$as_me:$LINENO: result: $lt_cv_prog_compiler_c_o_F77" >&5 -echo "${ECHO_T}$lt_cv_prog_compiler_c_o_F77" >&6 +rm -f conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext +if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi + + hardcode_libdir_flag_spec_F77='${wl}-blibpath:$libdir:'"$aix_libpath" + archive_expsym_cmds_F77="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" + else + if test "$host_cpu" = ia64; then + hardcode_libdir_flag_spec_F77='${wl}-R $libdir:/usr/lib:/lib' + allow_undefined_flag_F77="-z nodefs" + archive_expsym_cmds_F77="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" + else + # Determine the default libpath from the value encoded in an empty executable. + cat >conftest.$ac_ext <<_ACEOF + program main + end +_ACEOF +rm -f conftest.$ac_objext conftest$ac_exeext +if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 + (eval $ac_link) 2>conftest.er1 + ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } && + { ac_try='test -z "$ac_f77_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest$ac_exeext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; }; then -hard_links="nottested" -if test "$lt_cv_prog_compiler_c_o_F77" = no && test "$need_locks" != no; then - # do not overwrite the value of need_locks provided by the user - echo "$as_me:$LINENO: checking if we can lock with hard links" >&5 -echo $ECHO_N "checking if we can lock with hard links... $ECHO_C" >&6 - hard_links=yes - $rm conftest* - ln conftest.a conftest.b 2>/dev/null && hard_links=no - touch conftest.a - ln conftest.a conftest.b 2>&5 || hard_links=no - ln conftest.a conftest.b 2>/dev/null && hard_links=no - echo "$as_me:$LINENO: result: $hard_links" >&5 -echo "${ECHO_T}$hard_links" >&6 - if test "$hard_links" = no; then - { echo "$as_me:$LINENO: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&5 -echo "$as_me: WARNING: \`$CC' does not support \`-c -o', so \`make -j' may be unsafe" >&2;} - need_locks=warn - fi +aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } +}'` +# Check for a 64-bit object if we didn't find anything. +if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } +}'`; fi else - need_locks=no -fi + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 -echo "$as_me:$LINENO: checking whether the $compiler linker ($LD) supports shared libraries" >&5 -echo $ECHO_N "checking whether the $compiler linker ($LD) supports shared libraries... 
$ECHO_C" >&6 +fi +rm -f conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext +if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi - runpath_var= - allow_undefined_flag_F77= - enable_shared_with_static_runtimes_F77=no - archive_cmds_F77= - archive_expsym_cmds_F77= - old_archive_From_new_cmds_F77= - old_archive_from_expsyms_cmds_F77= - export_dynamic_flag_spec_F77= - whole_archive_flag_spec_F77= - thread_safe_flag_spec_F77= - hardcode_libdir_flag_spec_F77= - hardcode_libdir_flag_spec_ld_F77= - hardcode_libdir_separator_F77= - hardcode_direct_F77=no - hardcode_minus_L_F77=no - hardcode_shlibpath_var_F77=unsupported - link_all_deplibs_F77=unknown - hardcode_automatic_F77=no - module_cmds_F77= - module_expsym_cmds_F77= - always_export_symbols_F77=no - export_symbols_cmds_F77='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' - # include_expsyms should be a list of space-separated symbols to be *always* - # included in the symbol list - include_expsyms_F77= - # exclude_expsyms can be an extended regexp of symbols to exclude - # it will be wrapped by ` (' and `)$', so one must not match beginning or - # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc', - # as well as any symbol that contains `d'. - exclude_expsyms_F77="_GLOBAL_OFFSET_TABLE_" - # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out - # platforms (ab)use it in PIC code, but their linkers get confused if - # the symbol is explicitly referenced. Since portable code cannot - # rely on this symbol name, it's probably fine to never include it in - # preloaded symbol tables. - extract_expsyms_cmds= - # Just being paranoid about ensuring that cc_basename is set. 
- for cc_temp in $compiler""; do - case $cc_temp in - compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; - distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; - \-*) ;; - *) break;; - esac -done -cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` - - case $host_os in - cygwin* | mingw* | pw32*) - # FIXME: the MSVC++ port hasn't been tested in a loooong time - # When not using gcc, we currently assume that we are using - # Microsoft Visual C++. - if test "$GCC" != yes; then - with_gnu_ld=no - fi - ;; - openbsd*) - with_gnu_ld=no - ;; - esac - - ld_shlibs_F77=yes - if test "$with_gnu_ld" = yes; then - # If archive_cmds runs LD, not CC, wlarc should be empty - wlarc='${wl}' - - # Set some defaults for GNU ld with shared library support. These - # are reset later if shared libraries are not supported. Putting them - # here allows them to be overridden if necessary. - runpath_var=LD_RUN_PATH - hardcode_libdir_flag_spec_F77='${wl}--rpath ${wl}$libdir' - export_dynamic_flag_spec_F77='${wl}--export-dynamic' - # ancient GNU ld didn't support --whole-archive et. al. - if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then - whole_archive_flag_spec_F77="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' - else - whole_archive_flag_spec_F77= - fi - supports_anon_versioning=no - case `$LD -v 2>/dev/null` in - *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 - *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... - *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... - *\ 2.11.*) ;; # other 2.11 versions - *) supports_anon_versioning=yes ;; - esac - - # See if GNU ld supports shared libraries. - case $host_os in - aix3* | aix4* | aix5*) - # On AIX/PPC, the GNU linker is very broken - if test "$host_cpu" != ia64; then - ld_shlibs_F77=no - cat <<EOF 1>&2 - -*** Warning: the GNU linker, at least up to release 2.9.1, is reported -*** to be unable to reliably create shared libraries on AIX.
-*** Therefore, libtool is disabling shared libraries support. If you -*** really care for shared libraries, you may want to modify your PATH -*** so that a non-GNU linker is found, and then restart. - -EOF + hardcode_libdir_flag_spec_F77='${wl}-blibpath:$libdir:'"$aix_libpath" + # Warning - without using the other run time loading flags, + # -berok will link without error, but may produce a broken library. + no_undefined_flag_F77=' ${wl}-bernotok' + allow_undefined_flag_F77=' ${wl}-berok' + # Exported symbols can be pulled into shared objects from archives + whole_archive_flag_spec_F77='$convenience' + archive_cmds_need_lc_F77=yes + # This is similar to how AIX traditionally builds its shared libraries. + archive_expsym_cmds_F77="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' + fi fi ;; @@ -13918,713 +13832,368 @@ EOF archive_cmds_F77='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' hardcode_libdir_flag_spec_F77='-L$libdir' hardcode_minus_L_F77=yes - - # Samuel A. Falvo II reports - # that the semantics of dynamic libraries on AmigaOS, at least up - # to version 4, is to share data among multiple programs linked - # with the same dynamic library. Since this doesn't match the - # behavior of shared libraries on other platforms, we can't use - # them. 
+ # see comment about different semantics on the GNU ld section ld_shlibs_F77=no ;; - beos*) - if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then - allow_undefined_flag_F77=unsupported - # Joseph Beckenbach says some releases of gcc - # support --undefined. This deserves some investigation. FIXME - archive_cmds_F77='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' - else - ld_shlibs_F77=no - fi + bsdi[45]*) + export_dynamic_flag_spec_F77=-rdynamic ;; cygwin* | mingw* | pw32*) - # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, F77) is actually meaningless, - # as there is no search path for DLLs. - hardcode_libdir_flag_spec_F77='-L$libdir' + # When not using gcc, we currently assume that we are using + # Microsoft Visual C++. + # hardcode_libdir_flag_spec is actually meaningless, as there is + # no search path for DLLs. + hardcode_libdir_flag_spec_F77=' ' allow_undefined_flag_F77=unsupported - always_export_symbols_F77=no + # Tell ltmain to make .lib files, not .a files. + libext=lib + # Tell ltmain to make .dll files, not .so files. + shrext_cmds=".dll" + # FIXME: Setting linknames here is a bad hack. + archive_cmds_F77='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames=' + # The linker will automatically build a .lib file if we build a DLL. + old_archive_From_new_cmds_F77='true' + # FIXME: Should let the user specify the lib program. 
+ old_archive_cmds_F77='lib /OUT:$oldlib$oldobjs$old_deplibs' + fix_srcfile_path_F77='`cygpath -w "$srcfile"`' enable_shared_with_static_runtimes_F77=yes - export_symbols_cmds_F77='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' + ;; - if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then - archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' - # If the export-symbols file already is a .def file (1st line - # is EXPORTS), use it as is; otherwise, prepend... - archive_expsym_cmds_F77='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then - cp $export_symbols $output_objdir/$soname.def; - else - echo EXPORTS > $output_objdir/$soname.def; - cat $export_symbols >> $output_objdir/$soname.def; - fi~ - $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' - else - ld_shlibs_F77=no - fi + darwin* | rhapsody*) + case $host_os in + rhapsody* | darwin1.[012]) + allow_undefined_flag_F77='${wl}-undefined ${wl}suppress' + ;; + *) # Darwin 1.3 on + if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then + allow_undefined_flag_F77='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' + else + case ${MACOSX_DEPLOYMENT_TARGET} in + 10.[012]) + allow_undefined_flag_F77='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' + ;; + 10.*) + allow_undefined_flag_F77='${wl}-undefined ${wl}dynamic_lookup' + ;; + esac + fi + ;; + esac + archive_cmds_need_lc_F77=no + hardcode_direct_F77=no + hardcode_automatic_F77=yes + hardcode_shlibpath_var_F77=unsupported + whole_archive_flag_spec_F77='' + link_all_deplibs_F77=yes + if test "$GCC" = yes ; then + output_verbose_link_cmd='echo' + archive_cmds_F77='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name 
$rpath/$soname $verstring' + module_cmds_F77='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds + archive_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + module_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + else + case $cc_basename in + xlc*) + output_verbose_link_cmd='echo' + archive_cmds_F77='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' + module_cmds_F77='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds + archive_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + module_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + ;; + *) + ld_shlibs_F77=no + ;; + esac + fi ;; - linux*) - if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then - tmp_addflag= - case 
$cc_basename,$host_cpu in - pgcc*) # Portland Group C compiler - whole_archive_flag_spec_F77= - ;; - pgf77* | pgf90* ) # Portland Group f77 and f90 compilers - whole_archive_flag_spec_F77= - tmp_addflag=' -fpic -Mnomain' ;; - ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 - tmp_addflag=' -i_dynamic' ;; - efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 - tmp_addflag=' -i_dynamic -nofor_main' ;; - ifc* | ifort*) # Intel Fortran compiler - tmp_addflag=' -nofor_main' ;; - esac - archive_cmds_F77='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + dgux*) + archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_libdir_flag_spec_F77='-L$libdir' + hardcode_shlibpath_var_F77=no + ;; - if test $supports_anon_versioning = yes; then - archive_expsym_cmds_F77='$echo "{ global:" > $output_objdir/$libname.ver~ - cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ - $echo "local: *; };" >> $output_objdir/$libname.ver~ - $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib' - fi - else - ld_shlibs_F77=no - fi + freebsd1*) + ld_shlibs_F77=no ;; - netbsd*) - if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then - archive_cmds_F77='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' - wlarc= - else - archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' - archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' - fi + # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor + # support. Future versions do this automatically, but an explicit c++rt0.o + # does not break anything, and helps significantly (at the cost of a little + # extra space). 
+ freebsd2.2*) + archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' + hardcode_libdir_flag_spec_F77='-R$libdir' + hardcode_direct_F77=yes + hardcode_shlibpath_var_F77=no ;; - solaris* | sysv5*) - if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then - ld_shlibs_F77=no - cat <<EOF 1>&2 - -*** Warning: The releases 2.8.* of the GNU linker cannot reliably -*** create shared libraries on Solaris systems. Therefore, libtool -*** is disabling shared libraries support. We urge you to upgrade GNU -*** binutils to release 2.9.1 or newer. Another option is to modify -*** your PATH or compiler configuration so that the native linker is -*** used, and then restart. - -EOF - elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then - archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' - archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' - else - ld_shlibs_F77=no - fi + # Unfortunately, older versions of FreeBSD 2 do not have this feature. + freebsd2*) + archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct_F77=yes + hardcode_minus_L_F77=yes + hardcode_shlibpath_var_F77=no ;; - sunos4*) - archive_cmds_F77='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' - wlarc= + # FreeBSD 3 and greater uses gcc -shared to do shared libraries.
+ freebsd* | kfreebsd*-gnu | dragonfly*) + archive_cmds_F77='$CC -shared -o $lib $libobjs $deplibs $compiler_flags' + hardcode_libdir_flag_spec_F77='-R$libdir' hardcode_direct_F77=yes hardcode_shlibpath_var_F77=no ;; - *) - if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then - archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' - archive_expsym_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + hpux9*) + if test "$GCC" = yes; then + archive_cmds_F77='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' else - ld_shlibs_F77=no + archive_cmds_F77='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' fi - ;; - esac + hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' + hardcode_libdir_separator_F77=: + hardcode_direct_F77=yes - if test "$ld_shlibs_F77" = no; then - runpath_var= - hardcode_libdir_flag_spec_F77= - export_dynamic_flag_spec_F77= - whole_archive_flag_spec_F77= - fi - else - # PORTME fill in a description of your system's linker (not GNU ld) - case $host_os in - aix3*) - allow_undefined_flag_F77=unsupported - always_export_symbols_F77=yes - archive_expsym_cmds_F77='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' - # Note: this linker hardcodes the directories in LIBPATH if there - # are no directories specified by -L. + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. 
hardcode_minus_L_F77=yes - if test "$GCC" = yes && test -z "$link_static_flag"; then - # Neither direct hardcoding nor static linking is supported with a - # broken collect2. - hardcode_direct_F77=unsupported + export_dynamic_flag_spec_F77='${wl}-E' + ;; + + hpux10*) + if test "$GCC" = yes -a "$with_gnu_ld" = no; then + archive_cmds_F77='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds_F77='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + fi + if test "$with_gnu_ld" = no; then + hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' + hardcode_libdir_separator_F77=: + + hardcode_direct_F77=yes + export_dynamic_flag_spec_F77='${wl}-E' + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + hardcode_minus_L_F77=yes fi ;; - aix4* | aix5*) - if test "$host_cpu" = ia64; then - # On IA64, the linker does run time linking by default, so we don't - # have to do anything special. - aix_use_runtimelinking=no - exp_sym_flag='-Bexport' - no_entry_flag="" + hpux11*) + if test "$GCC" = yes -a "$with_gnu_ld" = no; then + case $host_cpu in + hppa*64*) + archive_cmds_F77='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + archive_cmds_F77='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + archive_cmds_F77='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac else - # If we're using GNU nm, then we don't want the "-C" option. 
- # -C means demangle to AIX nm, but means don't demangle with GNU nm - if $NM -V 2>&1 | grep 'GNU' > /dev/null; then - export_symbols_cmds_F77='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' - else - export_symbols_cmds_F77='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\$2 == "T") || (\$2 == "D") || (\$2 == "B")) && (substr(\$3,1,1) != ".")) { print \$3 } }'\'' | sort -u > $export_symbols' - fi - aix_use_runtimelinking=no + case $host_cpu in + hppa*64*) + archive_cmds_F77='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + archive_cmds_F77='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + archive_cmds_F77='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + fi + if test "$with_gnu_ld" = no; then + hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' + hardcode_libdir_separator_F77=: - # Test if we are trying to use run time linking or normal - # AIX style linking. If -brtl is somewhere in LDFLAGS, we - # need to do runtime linking. - case $host_os in aix4.[23]|aix4.[23].*|aix5*) - for ld_flag in $LDFLAGS; do - if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then - aix_use_runtimelinking=yes - break - fi - done + case $host_cpu in + hppa*64*|ia64*) + hardcode_libdir_flag_spec_ld_F77='+b $libdir' + hardcode_direct_F77=no + hardcode_shlibpath_var_F77=no + ;; + *) + hardcode_direct_F77=yes + export_dynamic_flag_spec_F77='${wl}-E' + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. 
+ hardcode_minus_L_F77=yes + ;; esac + fi + ;; - exp_sym_flag='-bexport' - no_entry_flag='-bnoentry' + irix5* | irix6* | nonstopux*) + if test "$GCC" = yes; then + archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + else + archive_cmds_F77='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + hardcode_libdir_flag_spec_ld_F77='-rpath $libdir' fi + hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' + hardcode_libdir_separator_F77=: + link_all_deplibs_F77=yes + ;; - # When large executables or shared objects are built, AIX ld can - # have problems creating the table of contents. If linking a library - # or program results in "error TOC overflow" add -mminimal-toc to - # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not - # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. 
+ netbsd*) + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out + else + archive_cmds_F77='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF + fi + hardcode_libdir_flag_spec_F77='-R$libdir' + hardcode_direct_F77=yes + hardcode_shlibpath_var_F77=no + ;; - archive_cmds_F77='' + newsos6) + archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_direct_F77=yes - hardcode_libdir_separator_F77=':' - link_all_deplibs_F77=yes + hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' + hardcode_libdir_separator_F77=: + hardcode_shlibpath_var_F77=no + ;; - if test "$GCC" = yes; then - case $host_os in aix4.[012]|aix4.[012].*) - # We only want to do this on AIX 4.2 and lower, the check - # below for broken collect2 doesn't work under 4.3+ - collect2name=`${CC} -print-prog-name=collect2` - if test -f "$collect2name" && \ - strings "$collect2name" | grep resolve_lib_name >/dev/null - then - # We have reworked collect2 - hardcode_direct_F77=yes - else - # We have old collect2 - hardcode_direct_F77=unsupported - # It fails to find uninstalled libraries when the uninstalled - # path is not listed in the libpath. 
Setting hardcode_minus_L - # to unsupported forces relinking - hardcode_minus_L_F77=yes - hardcode_libdir_flag_spec_F77='-L$libdir' - hardcode_libdir_separator_F77= - fi - esac - shared_flag='-shared' - if test "$aix_use_runtimelinking" = yes; then - shared_flag="$shared_flag "'${wl}-G' - fi + openbsd*) + hardcode_direct_F77=yes + hardcode_shlibpath_var_F77=no + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then + archive_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols' + hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' + export_dynamic_flag_spec_F77='${wl}-E' else - # not using gcc - if test "$host_cpu" = ia64; then - # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release - # chokes on -Wl,-G. The following line is correct: - shared_flag='-G' - else - if test "$aix_use_runtimelinking" = yes; then - shared_flag='${wl}-G' - else - shared_flag='${wl}-bM:SRE' - fi - fi + case $host_os in + openbsd[01].* | openbsd2.[0-7] | openbsd2.[0-7].*) + archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' + hardcode_libdir_flag_spec_F77='-R$libdir' + ;; + *) + archive_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' + ;; + esac fi + ;; - # It seems that -bexpall does not export symbols beginning with - # underscore (_), so it is better to generate a list of symbols to export. - always_export_symbols_F77=yes - if test "$aix_use_runtimelinking" = yes; then - # Warning - without using the other runtime loading flags (-brtl), - # -berok will link without error, but may produce a broken library. - allow_undefined_flag_F77='-berok' - # Determine the default libpath from the value encoded in an empty executable. 
- cat >conftest.$ac_ext <<_ACEOF - program main + os2*) + hardcode_libdir_flag_spec_F77='-L$libdir' + hardcode_minus_L_F77=yes + allow_undefined_flag_F77=unsupported + archive_cmds_F77='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def' + old_archive_From_new_cmds_F77='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def' + ;; - end -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_f77_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - -aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } -}'` -# Check for a 64-bit object if we didn't find anything. 
-if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } -}'`; fi -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi - - hardcode_libdir_flag_spec_F77='${wl}-blibpath:$libdir:'"$aix_libpath" - archive_expsym_cmds_F77="\$CC"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$no_entry_flag \${wl}$exp_sym_flag:\$export_symbols $shared_flag" - else - if test "$host_cpu" = ia64; then - hardcode_libdir_flag_spec_F77='${wl}-R $libdir:/usr/lib:/lib' - allow_undefined_flag_F77="-z nodefs" - archive_expsym_cmds_F77="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$no_entry_flag \${wl}$exp_sym_flag:\$export_symbols" - else - # Determine the default libpath from the value encoded in an empty executable. - cat >conftest.$ac_ext <<_ACEOF - program main - - end -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_f77_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; }; then - -aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } -}'` -# Check for a 64-bit object if we didn't find anything. -if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } -}'`; fi -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi - - hardcode_libdir_flag_spec_F77='${wl}-blibpath:$libdir:'"$aix_libpath" - # Warning - without using the other run time loading flags, - # -berok will link without error, but may produce a broken library. - no_undefined_flag_F77=' ${wl}-bernotok' - allow_undefined_flag_F77=' ${wl}-berok' - # -bexpall does not export symbols beginning with underscore (_) - always_export_symbols_F77=yes - # Exported symbols can be pulled into shared objects from archives - whole_archive_flag_spec_F77=' ' - archive_cmds_need_lc_F77=yes - # This is similar to how AIX traditionally builds it's shared libraries. 
- archive_expsym_cmds_F77="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}-bE:$export_symbols ${wl}-bnoentry${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' - fi + osf3*) + if test "$GCC" = yes; then + allow_undefined_flag_F77=' ${wl}-expect_unresolved ${wl}\*' + archive_cmds_F77='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + else + allow_undefined_flag_F77=' -expect_unresolved \*' + archive_cmds_F77='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' fi + hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' + hardcode_libdir_separator_F77=: ;; - amigaos*) - archive_cmds_F77='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' - hardcode_libdir_flag_spec_F77='-L$libdir' - hardcode_minus_L_F77=yes - # see comment about different semantics on the GNU ld section - ld_shlibs_F77=no - ;; - - bsdi[45]*) - export_dynamic_flag_spec_F77=-rdynamic - ;; + osf4* | osf5*) # as osf3* with the addition of -msym flag + if test "$GCC" = yes; then + allow_undefined_flag_F77=' ${wl}-expect_unresolved ${wl}\*' + archive_cmds_F77='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o 
$lib' + hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' + else + allow_undefined_flag_F77=' -expect_unresolved \*' + archive_cmds_F77='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + archive_expsym_cmds_F77='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~ + $LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp' - cygwin* | mingw* | pw32*) - # When not using gcc, we currently assume that we are using - # Microsoft Visual C++. - # hardcode_libdir_flag_spec is actually meaningless, as there is - # no search path for DLLs. - hardcode_libdir_flag_spec_F77=' ' - allow_undefined_flag_F77=unsupported - # Tell ltmain to make .lib files, not .a files. - libext=lib - # Tell ltmain to make .dll files, not .so files. - shrext_cmds=".dll" - # FIXME: Setting linknames here is a bad hack. - archive_cmds_F77='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames=' - # The linker will automatically build a .lib file if we build a DLL. - old_archive_From_new_cmds_F77='true' - # FIXME: Should let the user specify the lib program. 
- old_archive_cmds_F77='lib /OUT:$oldlib$oldobjs$old_deplibs' - fix_srcfile_path_F77='`cygpath -w "$srcfile"`' - enable_shared_with_static_runtimes_F77=yes + # Both c and cxx compiler support -rpath directly + hardcode_libdir_flag_spec_F77='-rpath $libdir' + fi + hardcode_libdir_separator_F77=: ;; - darwin* | rhapsody*) - case "$host_os" in - rhapsody* | darwin1.[012]) - allow_undefined_flag_F77='${wl}-undefined ${wl}suppress' - ;; - *) # Darwin 1.3 on - if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then - allow_undefined_flag_F77='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' - else - case ${MACOSX_DEPLOYMENT_TARGET} in - 10.[012]) - allow_undefined_flag_F77='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' - ;; - 10.*) - allow_undefined_flag_F77='${wl}-undefined ${wl}dynamic_lookup' - ;; - esac - fi - ;; + solaris*) + no_undefined_flag_F77=' -z text' + if test "$GCC" = yes; then + wlarc='${wl}' + archive_cmds_F77='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp' + else + wlarc='' + archive_cmds_F77='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' + archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' + fi + hardcode_libdir_flag_spec_F77='-R$libdir' + hardcode_shlibpath_var_F77=no + case $host_os in + solaris2.[0-5] | solaris2.[0-5].*) ;; + *) + # The compiler driver will combine linker options so we + # cannot just pass the convience library names through + # without $wl, iff we do not link with $LD. 
+ # Luckily, gcc supports the same syntax we need for Sun Studio. + # Supported since Solaris 2.6 (maybe 2.5.1?) + case $wlarc in + '') + whole_archive_flag_spec_F77='-z allextract$convenience -z defaultextract' ;; + *) + whole_archive_flag_spec_F77='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; + esac ;; esac - archive_cmds_need_lc_F77=no - hardcode_direct_F77=no - hardcode_automatic_F77=yes - hardcode_shlibpath_var_F77=unsupported - whole_archive_flag_spec_F77='' link_all_deplibs_F77=yes - if test "$GCC" = yes ; then - output_verbose_link_cmd='echo' - archive_cmds_F77='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' - module_cmds_F77='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's - archive_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' - module_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' - else - case $cc_basename in - xlc*) - output_verbose_link_cmd='echo' - archive_cmds_F77='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' - module_cmds_F77='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld 
-exported_symbols_list flag, it doesn't exist in older darwin ld's - archive_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' - module_expsym_cmds_F77='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' - ;; - *) - ld_shlibs_F77=no - ;; - esac - fi ;; - dgux*) - archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + sunos4*) + if test "x$host_vendor" = xsequent; then + # Use $CC to link under sequent, because it throws in some extra .o + # files that make .init and .fini sections work. + archive_cmds_F77='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds_F77='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' + fi hardcode_libdir_flag_spec_F77='-L$libdir' + hardcode_direct_F77=yes + hardcode_minus_L_F77=yes hardcode_shlibpath_var_F77=no ;; - freebsd1*) - ld_shlibs_F77=no - ;; - - # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor - # support. Future versions do this automatically, but an explicit c++rt0.o - # does not break anything, and helps significantly (at the cost of a little - # extra space). - freebsd2.2*) - archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' - hardcode_libdir_flag_spec_F77='-R$libdir' - hardcode_direct_F77=yes + sysv4) + case $host_vendor in + sni) + archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct_F77=yes # is this really true??? 
+ ;; + siemens) + ## LD is ld it makes a PLAMLIB + ## CC just makes a GrossModule. + archive_cmds_F77='$LD -G -o $lib $libobjs $deplibs $linker_flags' + reload_cmds_F77='$CC -r -o $output$reload_objs' + hardcode_direct_F77=no + ;; + motorola) + archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct_F77=no #Motorola manual says yes, but my tests say they lie + ;; + esac + runpath_var='LD_RUN_PATH' hardcode_shlibpath_var_F77=no ;; - # Unfortunately, older versions of FreeBSD 2 do not have this feature. - freebsd2*) - archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct_F77=yes - hardcode_minus_L_F77=yes + sysv4.3*) + archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' hardcode_shlibpath_var_F77=no - ;; - - # FreeBSD 3 and greater uses gcc -shared to do shared libraries. - freebsd* | kfreebsd*-gnu | dragonfly*) - archive_cmds_F77='$CC -shared -o $lib $libobjs $deplibs $compiler_flags' - hardcode_libdir_flag_spec_F77='-R$libdir' - hardcode_direct_F77=yes - hardcode_shlibpath_var_F77=no - ;; - - hpux9*) - if test "$GCC" = yes; then - archive_cmds_F77='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' - else - archive_cmds_F77='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' - fi - hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' - hardcode_libdir_separator_F77=: - hardcode_direct_F77=yes - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. 
- hardcode_minus_L_F77=yes - export_dynamic_flag_spec_F77='${wl}-E' - ;; - - hpux10* | hpux11*) - if test "$GCC" = yes -a "$with_gnu_ld" = no; then - case "$host_cpu" in - hppa*64*|ia64*) - archive_cmds_F77='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' - ;; - *) - archive_cmds_F77='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' - ;; - esac - else - case "$host_cpu" in - hppa*64*|ia64*) - archive_cmds_F77='$LD -b +h $soname -o $lib $libobjs $deplibs $linker_flags' - ;; - *) - archive_cmds_F77='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' - ;; - esac - fi - if test "$with_gnu_ld" = no; then - case "$host_cpu" in - hppa*64*) - hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' - hardcode_libdir_flag_spec_ld_F77='+b $libdir' - hardcode_libdir_separator_F77=: - hardcode_direct_F77=no - hardcode_shlibpath_var_F77=no - ;; - ia64*) - hardcode_libdir_flag_spec_F77='-L$libdir' - hardcode_direct_F77=no - hardcode_shlibpath_var_F77=no - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. - hardcode_minus_L_F77=yes - ;; - *) - hardcode_libdir_flag_spec_F77='${wl}+b ${wl}$libdir' - hardcode_libdir_separator_F77=: - hardcode_direct_F77=yes - export_dynamic_flag_spec_F77='${wl}-E' - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. 
- hardcode_minus_L_F77=yes - ;; - esac - fi - ;; - - irix5* | irix6* | nonstopux*) - if test "$GCC" = yes; then - archive_cmds_F77='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' - else - archive_cmds_F77='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' - hardcode_libdir_flag_spec_ld_F77='-rpath $libdir' - fi - hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' - hardcode_libdir_separator_F77=: - link_all_deplibs_F77=yes - ;; - - netbsd*) - if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then - archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out - else - archive_cmds_F77='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF - fi - hardcode_libdir_flag_spec_F77='-R$libdir' - hardcode_direct_F77=yes - hardcode_shlibpath_var_F77=no - ;; - - newsos6) - archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct_F77=yes - hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' - hardcode_libdir_separator_F77=: - hardcode_shlibpath_var_F77=no - ;; - - openbsd*) - hardcode_direct_F77=yes - hardcode_shlibpath_var_F77=no - if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then - archive_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols' - hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' - export_dynamic_flag_spec_F77='${wl}-E' - else - case $host_os in - openbsd[01].* | openbsd2.[0-7] | openbsd2.[0-7].*) - archive_cmds_F77='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' - 
hardcode_libdir_flag_spec_F77='-R$libdir' - ;; - *) - archive_cmds_F77='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' - hardcode_libdir_flag_spec_F77='${wl}-rpath,$libdir' - ;; - esac - fi - ;; - - os2*) - hardcode_libdir_flag_spec_F77='-L$libdir' - hardcode_minus_L_F77=yes - allow_undefined_flag_F77=unsupported - archive_cmds_F77='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def' - old_archive_From_new_cmds_F77='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def' - ;; - - osf3*) - if test "$GCC" = yes; then - allow_undefined_flag_F77=' ${wl}-expect_unresolved ${wl}\*' - archive_cmds_F77='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' - else - allow_undefined_flag_F77=' -expect_unresolved \*' - archive_cmds_F77='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' - fi - hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' - hardcode_libdir_separator_F77=: - ;; - - osf4* | osf5*) # as osf3* with the addition of -msym flag - if test "$GCC" = yes; then - allow_undefined_flag_F77=' ${wl}-expect_unresolved ${wl}\*' - archive_cmds_F77='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o 
$lib' - hardcode_libdir_flag_spec_F77='${wl}-rpath ${wl}$libdir' - else - allow_undefined_flag_F77=' -expect_unresolved \*' - archive_cmds_F77='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' - archive_expsym_cmds_F77='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~ - $LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp' - - # Both c and cxx compiler support -rpath directly - hardcode_libdir_flag_spec_F77='-rpath $libdir' - fi - hardcode_libdir_separator_F77=: - ;; - - sco3.2v5*) - archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_shlibpath_var_F77=no - export_dynamic_flag_spec_F77='${wl}-Bexport' - runpath_var=LD_RUN_PATH - hardcode_runpath_var=yes - ;; - - solaris*) - no_undefined_flag_F77=' -z text' - if test "$GCC" = yes; then - wlarc='${wl}' - archive_cmds_F77='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ - $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp' - else - wlarc='' - archive_cmds_F77='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' - archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ - $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' - fi - hardcode_libdir_flag_spec_F77='-R$libdir' - 
hardcode_shlibpath_var_F77=no - case $host_os in - solaris2.[0-5] | solaris2.[0-5].*) ;; - *) - # The compiler driver will combine linker options so we - # cannot just pass the convience library names through - # without $wl, iff we do not link with $LD. - # Luckily, gcc supports the same syntax we need for Sun Studio. - # Supported since Solaris 2.6 (maybe 2.5.1?) - case $wlarc in - '') - whole_archive_flag_spec_F77='-z allextract$convenience -z defaultextract' ;; - *) - whole_archive_flag_spec_F77='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; - esac ;; - esac - link_all_deplibs_F77=yes - ;; - - sunos4*) - if test "x$host_vendor" = xsequent; then - # Use $CC to link under sequent, because it throws in some extra .o - # files that make .init and .fini sections work. - archive_cmds_F77='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags' - else - archive_cmds_F77='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' - fi - hardcode_libdir_flag_spec_F77='-L$libdir' - hardcode_direct_F77=yes - hardcode_minus_L_F77=yes - hardcode_shlibpath_var_F77=no - ;; - - sysv4) - case $host_vendor in - sni) - archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct_F77=yes # is this really true??? - ;; - siemens) - ## LD is ld it makes a PLAMLIB - ## CC just makes a GrossModule. 
- archive_cmds_F77='$LD -G -o $lib $libobjs $deplibs $linker_flags' - reload_cmds_F77='$CC -r -o $output$reload_objs' - hardcode_direct_F77=no - ;; - motorola) - archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct_F77=no #Motorola manual says yes, but my tests say they lie - ;; - esac - runpath_var='LD_RUN_PATH' - hardcode_shlibpath_var_F77=no - ;; - - sysv4.3*) - archive_cmds_F77='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_shlibpath_var_F77=no - export_dynamic_flag_spec_F77='-Bexport' + export_dynamic_flag_spec_F77='-Bexport' ;; sysv4*MP*) @@ -14637,36 +14206,45 @@ if test -z "$aix_libpath"; then aix_libp fi ;; - sysv4.2uw2*) - archive_cmds_F77='$LD -G -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct_F77=yes - hardcode_minus_L_F77=no + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7*) + no_undefined_flag_F77='${wl}-z,text' + archive_cmds_need_lc_F77=no hardcode_shlibpath_var_F77=no - hardcode_runpath_var=yes - runpath_var=LD_RUN_PATH - ;; + runpath_var='LD_RUN_PATH' - sysv5OpenUNIX8* | sysv5UnixWare7* | sysv5uw[78]* | unixware7*) - no_undefined_flag_F77='${wl}-z ${wl}text' if test "$GCC" = yes; then - archive_cmds_F77='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_cmds_F77='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_F77='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' else - archive_cmds_F77='$CC -G ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_cmds_F77='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_F77='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' fi - runpath_var='LD_RUN_PATH' - hardcode_shlibpath_var_F77=no ;; - sysv5*) - no_undefined_flag_F77=' -z text' - # $CC -shared without GNU ld will not create a 
library from C++ - # object files and a static libstdc++, better avoid it by now - archive_cmds_F77='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' - archive_expsym_cmds_F77='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ - $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' - hardcode_libdir_flag_spec_F77= + sysv5* | sco3.2v5* | sco5v6*) + # Note: We can NOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. + no_undefined_flag_F77='${wl}-z,text' + allow_undefined_flag_F77='${wl}-z,nodefs' + archive_cmds_need_lc_F77=no hardcode_shlibpath_var_F77=no + hardcode_libdir_flag_spec_F77='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' + hardcode_libdir_separator_F77=':' + link_all_deplibs_F77=yes + export_dynamic_flag_spec_F77='${wl}-Bexport' runpath_var='LD_RUN_PATH' + + if test "$GCC" = yes; then + archive_cmds_F77='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_F77='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds_F77='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_F77='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + fi ;; uts4*) @@ -14685,11 +14263,6 @@ echo "$as_me:$LINENO: result: $ld_shlibs echo "${ECHO_T}$ld_shlibs_F77" >&6 test "$ld_shlibs_F77" = no && can_build_shared=no 
-variables_saved_for_relink="PATH $shlibpath_var $runpath_var" -if test "$GCC" = yes; then - variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" -fi - # # Do we need to explicitly link libc? # @@ -14722,6 +14295,7 @@ echo $ECHO_N "checking whether -lc shoul libobjs=conftest.$ac_objext deplibs= wl=$lt_prog_compiler_wl_F77 + pic_flag=$lt_prog_compiler_pic_F77 compiler_flags=-v linker_flags=-v verstring= @@ -14882,7 +14456,8 @@ cygwin* | mingw* | pw32*) dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname' + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname' postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $rm \$dlpath' @@ -14935,7 +14510,7 @@ darwin* | rhapsody*) soname_spec='${libname}${release}${major}$shared_ext' shlibpath_overrides_runpath=yes shlibpath_var=DYLD_LIBRARY_PATH - shrext_cmds='$(test .$module = .yes && echo .so || echo .dylib)' + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. if test "$GCC" = yes; then sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` @@ -14973,7 +14548,14 @@ kfreebsd*-gnu) freebsd* | dragonfly*) # DragonFly does not have aout. When/if they implement a new # versioning mechanism, adjust this. 
- objformat=`test -x /usr/bin/objformat && /usr/bin/objformat || echo aout` + if test -x /usr/bin/objformat; then + objformat=`/usr/bin/objformat` + else + case $host_os in + freebsd[123]*) objformat=aout ;; + *) objformat=elf ;; + esac + fi version_type=freebsd-$objformat case $version_type in freebsd-elf*) @@ -14995,10 +14577,15 @@ freebsd* | dragonfly*) shlibpath_overrides_runpath=yes hardcode_into_libs=yes ;; - *) # from 3.2 on + freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ + freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) shlibpath_overrides_runpath=no hardcode_into_libs=yes ;; + freebsd*) # from 4.6 on + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; esac ;; @@ -15018,7 +14605,7 @@ hpux9* | hpux10* | hpux11*) version_type=sunos need_lib_prefix=no need_version=no - case "$host_cpu" in + case $host_cpu in ia64*) shrext_cmds='.so' hardcode_into_libs=yes @@ -15058,6 +14645,18 @@ hpux9* | hpux10* | hpux11*) postinstall_cmds='chmod 555 $lib' ;; +interix3*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; + irix5* | irix6* | nonstopux*) case $host_os in nonstopux*) version_type=nonstopux ;; @@ -15115,31 +14714,10 @@ linux*) # before this can be enabled. hardcode_into_libs=yes - # find out which ABI we are using - libsuff= - case "$host_cpu" in - x86_64*|s390x*|powerpc64*) - echo '#line 15122 "configure"' > conftest.$ac_ext - if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; then - case `/usr/bin/file conftest.$ac_objext` in - *64-bit*) - libsuff=64 - sys_lib_search_path_spec="/lib${libsuff} /usr/lib${libsuff} /usr/local/lib${libsuff}" - ;; - esac - fi - rm -rf conftest* - ;; - esac - # Append ld.so.conf contents to the search path if test -f /etc/ld.so.conf; then - lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:,\t]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` - sys_lib_dlsearch_path_spec="/lib${libsuff} /usr/lib${libsuff} $lt_ld_extra" + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` + sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" fi # We used to test for /lib/ld.so.1 and disable shared libraries on @@ -15200,8 +14778,13 @@ nto-qnx*) openbsd*) version_type=sunos + sys_lib_dlsearch_path_spec="/usr/lib" need_lib_prefix=no - need_version=no + # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
+ case $host_os in + openbsd3.3 | openbsd3.3.*) need_version=yes ;; + *) need_version=no ;; + esac library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' shlibpath_var=LD_LIBRARY_PATH @@ -15239,13 +14822,6 @@ osf3* | osf4* | osf5*) sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" ;; -sco3.2v5*) - version_type=osf - soname_spec='${libname}${release}${shared_ext}$major' - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - ;; - solaris*) version_type=linux need_lib_prefix=no @@ -15271,7 +14847,7 @@ sunos4*) need_version=yes ;; -sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) +sysv4 | sysv4.3*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' soname_spec='${libname}${release}${shared_ext}$major' @@ -15304,6 +14880,29 @@ sysv4*MP*) fi ;; +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + version_type=freebsd-elf + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + if test "$with_gnu_ld" = yes; then + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' + shlibpath_overrides_runpath=no + else + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' + shlibpath_overrides_runpath=yes + case $host_os in + sco3.2v5*) + sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" + ;; + esac + fi + sys_lib_dlsearch_path_spec='/usr/lib' + ;; + uts4*) version_type=linux library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' @@ -15319,6 +14918,11 @@ 
echo "$as_me:$LINENO: result: $dynamic_l echo "${ECHO_T}$dynamic_linker" >&6 test "$dynamic_linker" = no && can_build_shared=no +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" +if test "$GCC" = yes; then + variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" +fi + echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 hardcode_action_F77= @@ -15356,36 +14960,6 @@ elif test "$shlibpath_overrides_runpath" enable_fast_install=needless fi -striplib= -old_striplib= -echo "$as_me:$LINENO: checking whether stripping libraries is possible" >&5 -echo $ECHO_N "checking whether stripping libraries is possible... $ECHO_C" >&6 -if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; then - test -z "$old_striplib" && old_striplib="$STRIP --strip-debug" - test -z "$striplib" && striplib="$STRIP --strip-unneeded" - echo "$as_me:$LINENO: result: yes" >&5 -echo "${ECHO_T}yes" >&6 -else -# FIXME - insert some real tests, host_os isn't really good enough - case $host_os in - darwin*) - if test -n "$STRIP" ; then - striplib="$STRIP -x" - echo "$as_me:$LINENO: result: yes" >&5 -echo "${ECHO_T}yes" >&6 - else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - ;; - *) - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 - ;; - esac -fi - - # The else clause should only fire when bootstrapping the # libtool distribution, otherwise you forgot to ship ltmain.sh @@ -15400,7 +14974,7 @@ if test -f "$ltmain"; then # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. 
- for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC NM \ + for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ @@ -15518,6 +15092,9 @@ AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC +# LTCC compiler flags. +LTCFLAGS=$lt_LTCFLAGS + # A language-specific compiler. CC=$lt_compiler_F77 @@ -15828,6 +15405,9 @@ lt_simple_link_test_code='public class c # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} + # Allow CC to be a program name with arguments. compiler=$CC @@ -15835,13 +15415,13 @@ compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* @@ -15889,20 +15469,20 @@ else # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:15895: $lt_compile\"" >&5) + (eval echo "\"\$as_me:15475: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? cat conftest.err >&5 - echo "$as_me:15899: \$? 
= $ac_status" >&5 + echo "$as_me:15479: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. - $echo "X$_lt_compiler_boilerplate" | $Xsed >conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if test ! -s conftest.err || diff conftest.exp conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_cv_prog_compiler_rtti_exceptions=yes fi fi @@ -15963,6 +15543,11 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_pic_GCJ='-fno-common' ;; + interix3*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; + msdosdjgpp*) # Just because we use GCC doesn't mean we suddenly get shared libraries # on systems that don't support them. @@ -15979,7 +15564,7 @@ echo $ECHO_N "checking for $compiler opt hpux*) # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. - case "$host_cpu" in + case $host_cpu in hppa*64*|ia64*) # +Z the default ;; @@ -16026,7 +15611,7 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_wl_GCJ='-Wl,' # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but # not for PA HP-UX. 
- case "$host_cpu" in + case $host_cpu in hppa*64*|ia64*) # +Z the default ;; @@ -16056,12 +15641,12 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-static' ;; - pgcc* | pgf77* | pgf90*) + pgcc* | pgf77* | pgf90* | pgf95*) # Portland Group compilers (*not* the Pentium gcc compiler, # which looks to be a dead project) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_pic_GCJ='-fpic' - lt_prog_compiler_static_GCJ='-static' + lt_prog_compiler_static_GCJ='-Bstatic' ;; ccc*) lt_prog_compiler_wl_GCJ='-Wl,' @@ -16077,11 +15662,6 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_static_GCJ='-non_shared' ;; - sco3.2v5*) - lt_prog_compiler_pic_GCJ='-Kpic' - lt_prog_compiler_static_GCJ='-dn' - ;; - solaris*) lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-Bstatic' @@ -16099,7 +15679,7 @@ echo $ECHO_N "checking for $compiler opt lt_prog_compiler_static_GCJ='-Bstatic' ;; - sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) + sysv4 | sysv4.2uw2* | sysv4.3*) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_pic_GCJ='-KPIC' lt_prog_compiler_static_GCJ='-Bstatic' @@ -16112,6 +15692,12 @@ echo $ECHO_N "checking for $compiler opt fi ;; + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + lt_prog_compiler_wl_GCJ='-Wl,' + lt_prog_compiler_pic_GCJ='-KPIC' + lt_prog_compiler_static_GCJ='-Bstatic' + ;; + unicos*) lt_prog_compiler_wl_GCJ='-Wl,' lt_prog_compiler_can_build_shared_GCJ=no @@ -16151,20 +15737,20 @@ else # with a dollar sign (not a hyphen), so the echo should work correctly. # The option is referenced via a variable to avoid confusing sed. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:16157: $lt_compile\"" >&5) + (eval echo "\"\$as_me:15743: $lt_compile\"" >&5) (eval "$lt_compile" 2>conftest.err) ac_status=$? 
cat conftest.err >&5 - echo "$as_me:16161: \$? = $ac_status" >&5 + echo "$as_me:15747: \$? = $ac_status" >&5 if (exit $ac_status) && test -s "$ac_outfile"; then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings other than the usual output. - $echo "X$_lt_compiler_boilerplate" | $Xsed >conftest.exp - $SED '/^$/d' conftest.err >conftest.er2 - if test ! -s conftest.err || diff conftest.exp conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then lt_prog_compiler_pic_works_GCJ=yes fi fi @@ -16185,7 +15771,7 @@ else fi fi -case "$host_os" in +case $host_os in # For platforms which do not support PIC, -DPIC is meaningless: *djgpp*) lt_prog_compiler_pic_GCJ= @@ -16195,6 +15781,48 @@ case "$host_os" in ;; esac +# +# Check to make sure the static flag actually works. +# +wl=$lt_prog_compiler_wl_GCJ eval lt_tmp_static_flag=\"$lt_prog_compiler_static_GCJ\" +echo "$as_me:$LINENO: checking if $compiler static flag $lt_tmp_static_flag works" >&5 +echo $ECHO_N "checking if $compiler static flag $lt_tmp_static_flag works... $ECHO_C" >&6 +if test "${lt_prog_compiler_static_works_GCJ+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + lt_prog_compiler_static_works_GCJ=no + save_LDFLAGS="$LDFLAGS" + LDFLAGS="$LDFLAGS $lt_tmp_static_flag" + printf "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. 
+ cat conftest.err 1>&5 + $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + lt_prog_compiler_static_works_GCJ=yes + fi + else + lt_prog_compiler_static_works_GCJ=yes + fi + fi + $rm conftest* + LDFLAGS="$save_LDFLAGS" + +fi +echo "$as_me:$LINENO: result: $lt_prog_compiler_static_works_GCJ" >&5 +echo "${ECHO_T}$lt_prog_compiler_static_works_GCJ" >&6 + +if test x"$lt_prog_compiler_static_works_GCJ" = xyes; then + : +else + lt_prog_compiler_static_GCJ= +fi + + echo "$as_me:$LINENO: checking if $compiler supports -c -o file.$ac_objext" >&5 echo $ECHO_N "checking if $compiler supports -c -o file.$ac_objext... $ECHO_C" >&6 if test "${lt_cv_prog_compiler_c_o_GCJ+set}" = set; then @@ -16213,25 +15841,25 @@ else # Note that $ac_compile itself does not contain backslashes and begins # with a dollar sign (not a hyphen), so the echo should work correctly. lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}? :&$lt_compiler_flag :; t' \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:16219: $lt_compile\"" >&5) + (eval echo "\"\$as_me:15847: $lt_compile\"" >&5) (eval "$lt_compile" 2>out/conftest.err) ac_status=$? cat out/conftest.err >&5 - echo "$as_me:16223: \$? = $ac_status" >&5 + echo "$as_me:15851: \$? = $ac_status" >&5 if (exit $ac_status) && test -s out/conftest2.$ac_objext then # The compiler can only warn and ignore the option if not recognized # So say no if there are warnings - $echo "X$_lt_compiler_boilerplate" | $Xsed > out/conftest.exp - $SED '/^$/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.err || diff out/conftest.exp out/conftest.er2 >/dev/null; then + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! 
-s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then lt_cv_prog_compiler_c_o_GCJ=yes fi fi - chmod u+w . + chmod u+w . 2>&5 $rm conftest* # SGI C++ compiler will create directory out/ii_files/ for # template instantiation @@ -16327,6 +15955,10 @@ cc_basename=`$echo "X$cc_temp" | $Xsed - with_gnu_ld=no fi ;; + interix*) + # we just hope/assume this is gcc and not c89 (= MSVC++) + with_gnu_ld=yes + ;; openbsd*) with_gnu_ld=no ;; @@ -16411,7 +16043,7 @@ EOF export_symbols_cmds_GCJ='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS] /s/.* \([^ ]*\)/\1 DATA/'\'' | $SED -e '\''/^[AITW] /s/.* //'\'' | sort | uniq > $export_symbols' if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then - archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' + archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' # If the export-symbols file already is a .def file (1st line # is EXPORTS), use it as is; otherwise, prepend... archive_expsym_cmds_GCJ='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then @@ -16420,22 +16052,38 @@ EOF echo EXPORTS > $output_objdir/$soname.def; cat $export_symbols >> $output_objdir/$soname.def; fi~ - $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--image-base=0x10000000 ${wl}--out-implib,$lib' + $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' else ld_shlibs_GCJ=no fi ;; + interix3*) + hardcode_direct_GCJ=no + hardcode_shlibpath_var_GCJ=no + hardcode_libdir_flag_spec_GCJ='${wl}-rpath,$libdir' + export_dynamic_flag_spec_GCJ='${wl}-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. 
+ # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. + archive_cmds_GCJ='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + archive_expsym_cmds_GCJ='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + ;; + linux*) if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then tmp_addflag= case $cc_basename,$host_cpu in pgcc*) # Portland Group C compiler - whole_archive_flag_spec_GCJ= + whole_archive_flag_spec_GCJ='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag' ;; - pgf77* | pgf90* ) # Portland Group f77 and f90 compilers - whole_archive_flag_spec_GCJ= - tmp_addflag=' -fpic -Mnomain' ;; + pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers + whole_archive_flag_spec_GCJ='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag -Mnomain' ;; ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 tmp_addflag=' -i_dynamic' ;; efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 @@ -16466,7 +16114,7 @@ EOF fi ;; - solaris* | sysv5*) + solaris*) if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then ld_shlibs_GCJ=no cat <<EOF 1>&2 @@
-16487,6 +16135,33 @@ EOF fi ;; + sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) + case `$LD -v 2>&1` in + *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) + ld_shlibs_GCJ=no + cat <<_LT_EOF 1>&2 + +*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not +*** reliably create shared libraries on SCO systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.16.91.0.3 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. + +_LT_EOF + ;; + *) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + hardcode_libdir_flag_spec_GCJ='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' + archive_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' + archive_expsym_cmds_GCJ='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' + else + ld_shlibs_GCJ=no + fi + ;; + esac + ;; + sunos4*) archive_cmds_GCJ='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' wlarc= @@ -16520,7 +16195,7 @@ EOF # Note: this linker hardcodes the directories in LIBPATH if there # are no directories specified by -L. hardcode_minus_L_GCJ=yes - if test "$GCC" = yes && test -z "$link_static_flag"; then + if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then # Neither direct hardcoding nor static linking is supported with a # broken collect2. hardcode_direct_GCJ=unsupported @@ -16554,6 +16229,7 @@ EOF break fi done + ;; esac exp_sym_flag='-bexport' @@ -16591,6 +16267,7 @@ EOF hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_libdir_separator_GCJ= fi + ;; esac shared_flag='-shared' if test "$aix_use_runtimelinking" = yes; then @@ -16603,11 +16280,11 @@ EOF # chokes on -Wl,-G. 
The following line is correct: shared_flag='-G' else - if test "$aix_use_runtimelinking" = yes; then + if test "$aix_use_runtimelinking" = yes; then shared_flag='${wl}-G' else shared_flag='${wl}-bM:SRE' - fi + fi fi fi @@ -16672,12 +16349,12 @@ rm -f conftest.err conftest.$ac_objext \ if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi hardcode_libdir_flag_spec_GCJ='${wl}-blibpath:$libdir:'"$aix_libpath" - archive_expsym_cmds_GCJ="\$CC"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$no_entry_flag \${wl}$exp_sym_flag:\$export_symbols $shared_flag" + archive_expsym_cmds_GCJ="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" else if test "$host_cpu" = ia64; then hardcode_libdir_flag_spec_GCJ='${wl}-R $libdir:/usr/lib:/lib' allow_undefined_flag_GCJ="-z nodefs" - archive_expsym_cmds_GCJ="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$no_entry_flag \${wl}$exp_sym_flag:\$export_symbols" + archive_expsym_cmds_GCJ="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" else # Determine the default libpath from the value encoded in an empty executable. cat >conftest.$ac_ext <<_ACEOF @@ -16737,13 +16414,11 @@ if test -z "$aix_libpath"; then aix_libp # -berok will link without error, but may produce a broken library. 
no_undefined_flag_GCJ=' ${wl}-bernotok' allow_undefined_flag_GCJ=' ${wl}-berok' - # -bexpall does not export symbols beginning with underscore (_) - always_export_symbols_GCJ=yes # Exported symbols can be pulled into shared objects from archives - whole_archive_flag_spec_GCJ=' ' + whole_archive_flag_spec_GCJ='$convenience' archive_cmds_need_lc_GCJ=yes - # This is similar to how AIX traditionally builds it's shared libraries. - archive_expsym_cmds_GCJ="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs $compiler_flags ${wl}-bE:$export_symbols ${wl}-bnoentry${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' + # This is similar to how AIX traditionally builds its shared libraries. + archive_expsym_cmds_GCJ="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' fi fi ;; @@ -16782,7 +16457,7 @@ if test -z "$aix_libpath"; then aix_libp ;; darwin* | rhapsody*) - case "$host_os" in + case $host_os in rhapsody* | darwin1.[012]) allow_undefined_flag_GCJ='${wl}-undefined ${wl}suppress' ;; @@ -16811,7 +16486,7 @@ if test -z "$aix_libpath"; then aix_libp output_verbose_link_cmd='echo' archive_cmds_GCJ='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' module_cmds_GCJ='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname 
$verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' else @@ -16820,7 +16495,7 @@ if test -z "$aix_libpath"; then aix_libp output_verbose_link_cmd='echo' archive_cmds_GCJ='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' module_cmds_GCJ='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' - # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin ld's + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds archive_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' module_expsym_cmds_GCJ='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' ;; @@ -16884,47 +16559,62 @@ if test -z "$aix_libpath"; then aix_libp export_dynamic_flag_spec_GCJ='${wl}-E' ;; - hpux10* | hpux11*) + hpux10*) if test "$GCC" = yes -a "$with_gnu_ld" = no; then - case "$host_cpu" in - hppa*64*|ia64*) + archive_cmds_GCJ='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds_GCJ='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + fi + if test "$with_gnu_ld" = no; 
then + hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' + hardcode_libdir_separator_GCJ=: + + hardcode_direct_GCJ=yes + export_dynamic_flag_spec_GCJ='${wl}-E' + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + hardcode_minus_L_GCJ=yes + fi + ;; + + hpux11*) + if test "$GCC" = yes -a "$with_gnu_ld" = no; then + case $host_cpu in + hppa*64*) archive_cmds_GCJ='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' ;; + ia64*) + archive_cmds_GCJ='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; *) archive_cmds_GCJ='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac else - case "$host_cpu" in - hppa*64*|ia64*) - archive_cmds_GCJ='$LD -b +h $soname -o $lib $libobjs $deplibs $linker_flags' + case $host_cpu in + hppa*64*) + archive_cmds_GCJ='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + archive_cmds_GCJ='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) - archive_cmds_GCJ='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + archive_cmds_GCJ='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac fi if test "$with_gnu_ld" = no; then - case "$host_cpu" in - hppa*64*) - hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' + hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' + hardcode_libdir_separator_GCJ=: + + case $host_cpu in + hppa*64*|ia64*) hardcode_libdir_flag_spec_ld_GCJ='+b $libdir' - hardcode_libdir_separator_GCJ=: - hardcode_direct_GCJ=no - hardcode_shlibpath_var_GCJ=no - ;; - ia64*) - hardcode_libdir_flag_spec_GCJ='-L$libdir' hardcode_direct_GCJ=no hardcode_shlibpath_var_GCJ=no - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. 
- hardcode_minus_L_GCJ=yes ;; *) - hardcode_libdir_flag_spec_GCJ='${wl}+b ${wl}$libdir' - hardcode_libdir_separator_GCJ=: hardcode_direct_GCJ=yes export_dynamic_flag_spec_GCJ='${wl}-E' @@ -17026,14 +16716,6 @@ if test -z "$aix_libpath"; then aix_libp hardcode_libdir_separator_GCJ=: ;; - sco3.2v5*) - archive_cmds_GCJ='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_shlibpath_var_GCJ=no - export_dynamic_flag_spec_GCJ='${wl}-Bexport' - runpath_var=LD_RUN_PATH - hardcode_runpath_var=yes - ;; - solaris*) no_undefined_flag_GCJ=' -z text' if test "$GCC" = yes; then @@ -17119,36 +16801,45 @@ if test -z "$aix_libpath"; then aix_libp fi ;; - sysv4.2uw2*) - archive_cmds_GCJ='$LD -G -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct_GCJ=yes - hardcode_minus_L_GCJ=no + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7*) + no_undefined_flag_GCJ='${wl}-z,text' + archive_cmds_need_lc_GCJ=no hardcode_shlibpath_var_GCJ=no - hardcode_runpath_var=yes - runpath_var=LD_RUN_PATH - ;; + runpath_var='LD_RUN_PATH' - sysv5OpenUNIX8* | sysv5UnixWare7* | sysv5uw[78]* | unixware7*) - no_undefined_flag_GCJ='${wl}-z ${wl}text' if test "$GCC" = yes; then - archive_cmds_GCJ='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_cmds_GCJ='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_GCJ='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' else - archive_cmds_GCJ='$CC -G ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_cmds_GCJ='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_GCJ='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' fi - runpath_var='LD_RUN_PATH' - hardcode_shlibpath_var_GCJ=no ;; - sysv5*) - no_undefined_flag_GCJ=' -z text' - # $CC -shared without GNU ld will not create a library from C++ - # object 
files and a static libstdc++, better avoid it by now - archive_cmds_GCJ='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' - archive_expsym_cmds_GCJ='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ - $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' - hardcode_libdir_flag_spec_GCJ= + sysv5* | sco3.2v5* | sco5v6*) + # Note: We can NOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. + no_undefined_flag_GCJ='${wl}-z,text' + allow_undefined_flag_GCJ='${wl}-z,nodefs' + archive_cmds_need_lc_GCJ=no hardcode_shlibpath_var_GCJ=no + hardcode_libdir_flag_spec_GCJ='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' + hardcode_libdir_separator_GCJ=':' + link_all_deplibs_GCJ=yes + export_dynamic_flag_spec_GCJ='${wl}-Bexport' runpath_var='LD_RUN_PATH' + + if test "$GCC" = yes; then + archive_cmds_GCJ='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_GCJ='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds_GCJ='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds_GCJ='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + fi ;; uts4*) @@ -17165,1512 +16856,703 @@ if test -z "$aix_libpath"; then aix_libp echo "$as_me:$LINENO: result: $ld_shlibs_GCJ" >&5 echo "${ECHO_T}$ld_shlibs_GCJ" >&6 -test "$ld_shlibs_GCJ" = no 
&& can_build_shared=no - -variables_saved_for_relink="PATH $shlibpath_var $runpath_var" -if test "$GCC" = yes; then - variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" -fi - -# -# Do we need to explicitly link libc? -# -case "x$archive_cmds_need_lc_GCJ" in -x|xyes) - # Assume -lc should be added - archive_cmds_need_lc_GCJ=yes - - if test "$enable_shared" = yes && test "$GCC" = yes; then - case $archive_cmds_GCJ in - *'~'*) - # FIXME: we may have to deal with multi-command sequences. - ;; - '$CC '*) - # Test whether the compiler implicitly links with -lc since on some - # systems, -lgcc has to come before -lc. If gcc already passes -lc - # to ld, don't add -lc before -lgcc. - echo "$as_me:$LINENO: checking whether -lc should be explicitly linked in" >&5 -echo $ECHO_N "checking whether -lc should be explicitly linked in... $ECHO_C" >&6 - $rm conftest* - printf "$lt_simple_compile_test_code" > conftest.$ac_ext - - if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } 2>conftest.err; then - soname=conftest - lib=conftest - libobjs=conftest.$ac_objext - deplibs= - wl=$lt_prog_compiler_wl_GCJ - compiler_flags=-v - linker_flags=-v - verstring= - output_objdir=. - libname=conftest - lt_save_allow_undefined_flag=$allow_undefined_flag_GCJ - allow_undefined_flag_GCJ= - if { (eval echo "$as_me:$LINENO: \"$archive_cmds_GCJ 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1\"") >&5 - (eval $archive_cmds_GCJ 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); } - then - archive_cmds_need_lc_GCJ=no - else - archive_cmds_need_lc_GCJ=yes - fi - allow_undefined_flag_GCJ=$lt_save_allow_undefined_flag - else - cat conftest.err 1>&5 - fi - $rm conftest* - echo "$as_me:$LINENO: result: $archive_cmds_need_lc_GCJ" >&5 -echo "${ECHO_T}$archive_cmds_need_lc_GCJ" >&6 - ;; - esac - fi - ;; -esac - -echo "$as_me:$LINENO: checking dynamic linker characteristics" >&5 -echo $ECHO_N "checking dynamic linker characteristics... $ECHO_C" >&6 -library_names_spec= -libname_spec='lib$name' -soname_spec= -shrext_cmds=".so" -postinstall_cmds= -postuninstall_cmds= -finish_cmds= -finish_eval= -shlibpath_var= -shlibpath_overrides_runpath=unknown -version_type=none -dynamic_linker="$host_os ld.so" -sys_lib_dlsearch_path_spec="/lib /usr/lib" -if test "$GCC" = yes; then - sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` - if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then - # if the path contains ";" then we assume it to be the separator - # otherwise default to the standard path separator (i.e. ":") - it is - # assumed that no part of a normal pathname contains ";" but that should - # okay in the real world where ";" in dirpaths is itself problematic. - sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` - else - sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` - fi -else - sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" -fi -need_lib_prefix=unknown -hardcode_into_libs=no - -# when you set need_version to no, make sure it does not cause -set_version -# flags to be left without arguments -need_version=unknown - -case $host_os in -aix3*) - version_type=linux - library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a' - shlibpath_var=LIBPATH - - # AIX 3 has no versioning support, so we append a major version to the name. 
- soname_spec='${libname}${release}${shared_ext}$major' - ;; - -aix4* | aix5*) - version_type=linux - need_lib_prefix=no - need_version=no - hardcode_into_libs=yes - if test "$host_cpu" = ia64; then - # AIX 5 supports IA64 - library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - else - # With GCC up to 2.95.x, collect2 would create an import file - # for dependence libraries. The import file would start with - # the line `#! .'. This would cause the generated library to - # depend on `.', always an invalid library. This was fixed in - # development snapshots of GCC prior to 3.0. - case $host_os in - aix4 | aix4.[01] | aix4.[01].*) - if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' - echo ' yes ' - echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then - : - else - can_build_shared=no - fi - ;; - esac - # AIX (on Power*) has no versioning support, so currently we can not hardcode correct - # soname into executable. Probably we can add versioning support to - # collect2, so additional links can be useful in future. - if test "$aix_use_runtimelinking" = yes; then - # If using run time linking (on AIX 4.2 or later) use lib.so - # instead of lib.a to let people know that these are not - # typical AIX shared libraries. - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - else - # We preserve .a as extension for shared libraries through AIX4.2 - # and later when we are not doing run time linking. - library_names_spec='${libname}${release}.a $libname.a' - soname_spec='${libname}${release}${shared_ext}$major' - fi - shlibpath_var=LIBPATH - fi - ;; - -amigaos*) - library_names_spec='$libname.ixlibrary $libname.a' - # Create ${libname}_ixlibrary.a entries in /sys/libs. 
- finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' - ;; - -beos*) - library_names_spec='${libname}${shared_ext}' - dynamic_linker="$host_os ld.so" - shlibpath_var=LIBRARY_PATH - ;; - -bsdi[45]*) - version_type=linux - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' - shlibpath_var=LD_LIBRARY_PATH - sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" - sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" - # the default ld.so.conf also contains /usr/contrib/lib and - # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow - # libtool to hard-code these into programs - ;; - -cygwin* | mingw* | pw32*) - version_type=windows - shrext_cmds=".dll" - need_version=no - need_lib_prefix=no - - case $GCC,$host_os in - yes,cygwin* | yes,mingw* | yes,pw32*) - library_names_spec='$libname.dll.a' - # DLL is installed to $(libdir)/../bin by postinstall_cmds - postinstall_cmds='base_file=`basename \${file}`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $rm \$dlpath' - shlibpath_overrides_runpath=yes - - case $host_os in - cygwin*) - # Cygwin DLLs use 'cyg' prefix rather than 'lib' - soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' - sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" - ;; - mingw*) - # MinGW DLLs use traditional 'lib' prefix - soname_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' - sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` - if echo "$sys_lib_search_path_spec" | grep ';[c-zC-Z]:/' >/dev/null; then - # It is most probably a Windows format PATH printed by - # mingw gcc, but we are running on Cygwin. Gcc prints its search - # path with ; separators, and with drive letters. We can handle the - # drive letters (cygwin fileutils understands them), so leave them, - # especially as we might pass files found there to a mingw objdump, - # which wouldn't understand a cygwinified path. Ahh. - sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` - else - sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` - fi - ;; - pw32*) - # pw32 DLLs use 'pw' prefix rather than 'lib' - library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' - ;; - esac - ;; - - *) - library_names_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext} $libname.lib' - ;; - esac - dynamic_linker='Win32 ld.exe' - # FIXME: first we should search . 
and the directory the executable is in - shlibpath_var=PATH - ;; - -darwin* | rhapsody*) - dynamic_linker="$host_os dyld" - version_type=darwin - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext' - soname_spec='${libname}${release}${major}$shared_ext' - shlibpath_overrides_runpath=yes - shlibpath_var=DYLD_LIBRARY_PATH - shrext_cmds='$(test .$module = .yes && echo .so || echo .dylib)' - # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. - if test "$GCC" = yes; then - sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` - else - sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib' - fi - sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' - ;; - -dgux*) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - ;; - -freebsd1*) - dynamic_linker=no - ;; - -kfreebsd*-gnu) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - dynamic_linker='GNU ld.so' - ;; - -freebsd* | dragonfly*) - # DragonFly does not have aout. When/if they implement a new - # versioning mechanism, adjust this. 
- objformat=`test -x /usr/bin/objformat && /usr/bin/objformat || echo aout` - version_type=freebsd-$objformat - case $version_type in - freebsd-elf*) - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' - need_version=no - need_lib_prefix=no - ;; - freebsd-*) - library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix' - need_version=yes - ;; - esac - shlibpath_var=LD_LIBRARY_PATH - case $host_os in - freebsd2*) - shlibpath_overrides_runpath=yes - ;; - freebsd3.[01]* | freebsdelf3.[01]*) - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; - *) # from 3.2 on - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - ;; - esac - ;; - -gnu*) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - hardcode_into_libs=yes - ;; - -hpux9* | hpux10* | hpux11*) - # Give a soname corresponding to the major version so that dld.sl refuses to - # link against other versions. - version_type=sunos - need_lib_prefix=no - need_version=no - case "$host_cpu" in - ia64*) - shrext_cmds='.so' - hardcode_into_libs=yes - dynamic_linker="$host_os dld.so" - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
- library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - if test "X$HPUX_IA64_MODE" = X32; then - sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" - else - sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" - fi - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - ;; - hppa*64*) - shrext_cmds='.sl' - hardcode_into_libs=yes - dynamic_linker="$host_os dld.sl" - shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH - shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - ;; - *) - shrext_cmds='.sl' - dynamic_linker="$host_os dld.sl" - shlibpath_var=SHLIB_PATH - shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - ;; - esac - # HP-UX runs *really* slowly unless shared libraries are mode 555. 
- postinstall_cmds='chmod 555 $lib' - ;; - -irix5* | irix6* | nonstopux*) - case $host_os in - nonstopux*) version_type=nonstopux ;; - *) - if test "$lt_cv_prog_gnu_ld" = yes; then - version_type=linux - else - version_type=irix - fi ;; - esac - need_lib_prefix=no - need_version=no - soname_spec='${libname}${release}${shared_ext}$major' - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}' - case $host_os in - irix5* | nonstopux*) - libsuff= shlibsuff= - ;; - *) - case $LD in # libtool.m4 will add one of these switches to LD - *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") - libsuff= shlibsuff= libmagic=32-bit;; - *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") - libsuff=32 shlibsuff=N32 libmagic=N32;; - *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") - libsuff=64 shlibsuff=64 libmagic=64-bit;; - *) libsuff= shlibsuff= libmagic=never-match;; - esac - ;; - esac - shlibpath_var=LD_LIBRARY${shlibsuff}_PATH - shlibpath_overrides_runpath=no - sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}" - sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" - hardcode_into_libs=yes - ;; - -# No shared lib support for Linux oldld, aout, or coff. -linux*oldld* | linux*aout* | linux*coff*) - dynamic_linker=no - ;; - -# This must be Linux ELF. -linux*) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - # This implies no fast_install, which is unacceptable. - # Some rework will be needed to allow for fast_install - # before this can be enabled. 
- hardcode_into_libs=yes - - # find out which ABI we are using - libsuff= - case "$host_cpu" in - x86_64*|s390x*|powerpc64*) - echo '#line 17604 "configure"' > conftest.$ac_ext - if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; then - case `/usr/bin/file conftest.$ac_objext` in - *64-bit*) - libsuff=64 - sys_lib_search_path_spec="/lib${libsuff} /usr/lib${libsuff} /usr/local/lib${libsuff}" - ;; - esac - fi - rm -rf conftest* - ;; - esac - - # Append ld.so.conf contents to the search path - if test -f /etc/ld.so.conf; then - lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:,\t]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` - sys_lib_dlsearch_path_spec="/lib${libsuff} /usr/lib${libsuff} $lt_ld_extra" - fi - - # We used to test for /lib/ld.so.1 and disable shared libraries on - # powerpc, because MkLinux only supported shared libraries with the - # GNU dynamic linker. Since this was broken with cross compilers, - # most powerpc-linux boxes support dynamic linking these days and - # people can always --disable-shared, the test was removed, and we - # assume the GNU/Linux dynamic linker is in use. 
- dynamic_linker='GNU/Linux ld.so' - ;; - -knetbsd*-gnu) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - dynamic_linker='GNU ld.so' - ;; - -netbsd*) - version_type=sunos - need_lib_prefix=no - need_version=no - if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' - dynamic_linker='NetBSD (a.out) ld.so' - else - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - dynamic_linker='NetBSD ld.elf_so' - fi - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; - -newsos6) - version_type=linux - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - ;; - -nto-qnx*) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - ;; - -openbsd*) - version_type=sunos - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' - shlibpath_var=LD_LIBRARY_PATH - if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = 
"openbsd2.8-powerpc"; then - case $host_os in - openbsd2.[89] | openbsd2.[89].*) - shlibpath_overrides_runpath=no - ;; - *) - shlibpath_overrides_runpath=yes - ;; - esac - else - shlibpath_overrides_runpath=yes - fi - ;; - -os2*) - libname_spec='$name' - shrext_cmds=".dll" - need_lib_prefix=no - library_names_spec='$libname${shared_ext} $libname.a' - dynamic_linker='OS/2 ld.exe' - shlibpath_var=LIBPATH - ;; - -osf3* | osf4* | osf5*) - version_type=osf - need_lib_prefix=no - need_version=no - soname_spec='${libname}${release}${shared_ext}$major' - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" - sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" - ;; - -sco3.2v5*) - version_type=osf - soname_spec='${libname}${release}${shared_ext}$major' - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - shlibpath_var=LD_LIBRARY_PATH - ;; - -solaris*) - version_type=linux - need_lib_prefix=no - need_version=no - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - # ldd complains unless libraries are executable - postinstall_cmds='chmod +x $lib' - ;; - -sunos4*) - version_type=sunos - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' - finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - if test "$with_gnu_ld" = yes; then - need_lib_prefix=no - fi - need_version=yes - ;; - -sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) - version_type=linux - 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - case $host_vendor in - sni) - shlibpath_overrides_runpath=no - need_lib_prefix=no - export_dynamic_flag_spec='${wl}-Blargedynsym' - runpath_var=LD_RUN_PATH - ;; - siemens) - need_lib_prefix=no - ;; - motorola) - need_lib_prefix=no - need_version=no - shlibpath_overrides_runpath=no - sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' - ;; - esac - ;; - -sysv4*MP*) - if test -d /usr/nec ;then - version_type=linux - library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}' - soname_spec='$libname${shared_ext}.$major' - shlibpath_var=LD_LIBRARY_PATH - fi - ;; - -uts4*) - version_type=linux - library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' - soname_spec='${libname}${release}${shared_ext}$major' - shlibpath_var=LD_LIBRARY_PATH - ;; - -*) - dynamic_linker=no - ;; -esac -echo "$as_me:$LINENO: result: $dynamic_linker" >&5 -echo "${ECHO_T}$dynamic_linker" >&6 -test "$dynamic_linker" = no && can_build_shared=no - -echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 -echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 -hardcode_action_GCJ= -if test -n "$hardcode_libdir_flag_spec_GCJ" || \ - test -n "$runpath_var_GCJ" || \ - test "X$hardcode_automatic_GCJ" = "Xyes" ; then - - # We can hardcode non-existant directories. 
- if test "$hardcode_direct_GCJ" != no && - # If the only mechanism to avoid hardcoding is shlibpath_var, we - # have to relink, otherwise we might link with an installed library - # when we should be linking with a yet-to-be-installed one - ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, GCJ)" != no && - test "$hardcode_minus_L_GCJ" != no; then - # Linking always hardcodes the temporary library directory. - hardcode_action_GCJ=relink - else - # We can link without hardcoding, and we can hardcode nonexisting dirs. - hardcode_action_GCJ=immediate - fi -else - # We cannot hardcode anything, or else we can only hardcode existing - # directories. - hardcode_action_GCJ=unsupported -fi -echo "$as_me:$LINENO: result: $hardcode_action_GCJ" >&5 -echo "${ECHO_T}$hardcode_action_GCJ" >&6 - -if test "$hardcode_action_GCJ" = relink; then - # Fast installation is not supported - enable_fast_install=no -elif test "$shlibpath_overrides_runpath" = yes || - test "$enable_shared" = no; then - # Fast installation is not necessary - enable_fast_install=needless -fi - -striplib= -old_striplib= -echo "$as_me:$LINENO: checking whether stripping libraries is possible" >&5 -echo $ECHO_N "checking whether stripping libraries is possible... 
$ECHO_C" >&6 -if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; then - test -z "$old_striplib" && old_striplib="$STRIP --strip-debug" - test -z "$striplib" && striplib="$STRIP --strip-unneeded" - echo "$as_me:$LINENO: result: yes" >&5 -echo "${ECHO_T}yes" >&6 -else -# FIXME - insert some real tests, host_os isn't really good enough - case $host_os in - darwin*) - if test -n "$STRIP" ; then - striplib="$STRIP -x" - echo "$as_me:$LINENO: result: yes" >&5 -echo "${ECHO_T}yes" >&6 - else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - ;; - *) - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 - ;; - esac -fi - -if test "x$enable_dlopen" != xyes; then - enable_dlopen=unknown - enable_dlopen_self=unknown - enable_dlopen_self_static=unknown -else - lt_cv_dlopen=no - lt_cv_dlopen_libs= - - case $host_os in - beos*) - lt_cv_dlopen="load_add_on" - lt_cv_dlopen_libs= - lt_cv_dlopen_self=yes - ;; - - mingw* | pw32*) - lt_cv_dlopen="LoadLibrary" - lt_cv_dlopen_libs= - ;; - - cygwin*) - lt_cv_dlopen="dlopen" - lt_cv_dlopen_libs= - ;; - - darwin*) - # if libdl is installed we need to link against it - echo "$as_me:$LINENO: checking for dlopen in -ldl" >&5 -echo $ECHO_N "checking for dlopen in -ldl... $ECHO_C" >&6 -if test "${ac_cv_lib_dl_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldl $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. 
*/ -char dlopen (); -int -main () -{ -dlopen (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dl_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_lib_dl_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dl_dlopen" >&5 -echo "${ECHO_T}$ac_cv_lib_dl_dlopen" >&6 -if test $ac_cv_lib_dl_dlopen = yes; then - lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl" -else - - lt_cv_dlopen="dyld" - lt_cv_dlopen_libs= - lt_cv_dlopen_self=yes - -fi - - ;; - - *) - echo "$as_me:$LINENO: checking for shl_load" >&5 -echo $ECHO_N "checking for shl_load... $ECHO_C" >&6 -if test "${ac_cv_func_shl_load+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ -/* Define shl_load to an innocuous variant, in case declares shl_load. - For example, HP-UX 11i declares gettimeofday. */ -#define shl_load innocuous_shl_load - -/* System header to define __stub macros and hopefully few prototypes, - which can conflict with char shl_load (); below. 
- Prefer to if __STDC__ is defined, since - exists even on freestanding compilers. */ - -#ifdef __STDC__ -# include -#else -# include -#endif - -#undef shl_load - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -{ -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char shl_load (); -/* The GNU C library defines this for functions which it implements - to always fail with ENOSYS. Some functions are actually named - something starting with __ and the normal name is an alias. */ -#if defined (__stub_shl_load) || defined (__stub___shl_load) -choke me -#else -char (*f) () = shl_load; -#endif -#ifdef __cplusplus -} -#endif - -int -main () -{ -return f != shl_load; - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_func_shl_load=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_func_shl_load=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -fi -echo "$as_me:$LINENO: result: $ac_cv_func_shl_load" >&5 -echo "${ECHO_T}$ac_cv_func_shl_load" >&6 -if test $ac_cv_func_shl_load = yes; then - lt_cv_dlopen="shl_load" -else - echo "$as_me:$LINENO: checking for shl_load in -ldld" >&5 -echo $ECHO_N "checking for shl_load in -ldld... $ECHO_C" >&6 -if test "${ac_cv_lib_dld_shl_load+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldld $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char shl_load (); -int -main () -{ -shl_load (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dld_shl_load=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_lib_dld_shl_load=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dld_shl_load" >&5 -echo "${ECHO_T}$ac_cv_lib_dld_shl_load" >&6 -if test $ac_cv_lib_dld_shl_load = yes; then - lt_cv_dlopen="shl_load" lt_cv_dlopen_libs="-dld" -else - echo "$as_me:$LINENO: checking for dlopen" >&5 -echo $ECHO_N "checking for dlopen... $ECHO_C" >&6 -if test "${ac_cv_func_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ -/* Define dlopen to an innocuous variant, in case declares dlopen. - For example, HP-UX 11i declares gettimeofday. */ -#define dlopen innocuous_dlopen - -/* System header to define __stub macros and hopefully few prototypes, - which can conflict with char dlopen (); below. - Prefer to if __STDC__ is defined, since - exists even on freestanding compilers. */ - -#ifdef __STDC__ -# include -#else -# include -#endif - -#undef dlopen - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -{ -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char dlopen (); -/* The GNU C library defines this for functions which it implements - to always fail with ENOSYS. Some functions are actually named - something starting with __ and the normal name is an alias. 
*/ -#if defined (__stub_dlopen) || defined (__stub___dlopen) -choke me -#else -char (*f) () = dlopen; -#endif -#ifdef __cplusplus -} -#endif - -int -main () -{ -return f != dlopen; - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_func_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_func_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -fi -echo "$as_me:$LINENO: result: $ac_cv_func_dlopen" >&5 -echo "${ECHO_T}$ac_cv_func_dlopen" >&6 -if test $ac_cv_func_dlopen = yes; then - lt_cv_dlopen="dlopen" -else - echo "$as_me:$LINENO: checking for dlopen in -ldl" >&5 -echo $ECHO_N "checking for dlopen in -ldl... $ECHO_C" >&6 -if test "${ac_cv_lib_dl_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldl $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. 
*/ -char dlopen (); -int -main () -{ -dlopen (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dl_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_lib_dl_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dl_dlopen" >&5 -echo "${ECHO_T}$ac_cv_lib_dl_dlopen" >&6 -if test $ac_cv_lib_dl_dlopen = yes; then - lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl" -else - echo "$as_me:$LINENO: checking for dlopen in -lsvld" >&5 -echo $ECHO_N "checking for dlopen in -lsvld... $ECHO_C" >&6 -if test "${ac_cv_lib_svld_dlopen+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsvld $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. 
*/ -char dlopen (); -int -main () -{ -dlopen (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_svld_dlopen=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_lib_svld_dlopen=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_svld_dlopen" >&5 -echo "${ECHO_T}$ac_cv_lib_svld_dlopen" >&6 -if test $ac_cv_lib_svld_dlopen = yes; then - lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-lsvld" -else - echo "$as_me:$LINENO: checking for dld_link in -ldld" >&5 -echo $ECHO_N "checking for dld_link in -ldld... $ECHO_C" >&6 -if test "${ac_cv_lib_dld_dld_link+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldld $LIBS" -cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ +test "$ld_shlibs_GCJ" = no && can_build_shared=no -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. 
*/ -char dld_link (); -int -main () -{ -dld_link (); - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 +# +# Do we need to explicitly link libc? +# +case "x$archive_cmds_need_lc_GCJ" in +x|xyes) + # Assume -lc should be added + archive_cmds_need_lc_GCJ=yes + + if test "$enable_shared" = yes && test "$GCC" = yes; then + case $archive_cmds_GCJ in + *'~'*) + # FIXME: we may have to deal with multi-command sequences. + ;; + '$CC '*) + # Test whether the compiler implicitly links with -lc since on some + # systems, -lgcc has to come before -lc. If gcc already passes -lc + # to ld, don't add -lc before -lgcc. + echo "$as_me:$LINENO: checking whether -lc should be explicitly linked in" >&5 +echo $ECHO_N "checking whether -lc should be explicitly linked in... $ECHO_C" >&6 + $rm conftest* + printf "$lt_simple_compile_test_code" > conftest.$ac_ext + + if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 + (exit $ac_status); } 2>conftest.err; then + soname=conftest + lib=conftest + libobjs=conftest.$ac_objext + deplibs= + wl=$lt_prog_compiler_wl_GCJ + pic_flag=$lt_prog_compiler_pic_GCJ + compiler_flags=-v + linker_flags=-v + verstring= + output_objdir=. 
+ libname=conftest + lt_save_allow_undefined_flag=$allow_undefined_flag_GCJ + allow_undefined_flag_GCJ= + if { (eval echo "$as_me:$LINENO: \"$archive_cmds_GCJ 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1\"") >&5 + (eval $archive_cmds_GCJ 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_lib_dld_dld_link=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 + (exit $ac_status); } + then + archive_cmds_need_lc_GCJ=no + else + archive_cmds_need_lc_GCJ=yes + fi + allow_undefined_flag_GCJ=$lt_save_allow_undefined_flag + else + cat conftest.err 1>&5 + fi + $rm conftest* + echo "$as_me:$LINENO: result: $archive_cmds_need_lc_GCJ" >&5 +echo "${ECHO_T}$archive_cmds_need_lc_GCJ" >&6 + ;; + esac + fi + ;; +esac -ac_cv_lib_dld_dld_link=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -echo "$as_me:$LINENO: result: $ac_cv_lib_dld_dld_link" >&5 -echo "${ECHO_T}$ac_cv_lib_dld_dld_link" >&6 -if test $ac_cv_lib_dld_dld_link = yes; then - lt_cv_dlopen="dld_link" lt_cv_dlopen_libs="-dld" +echo "$as_me:$LINENO: checking dynamic linker characteristics" >&5 +echo $ECHO_N "checking dynamic linker characteristics... $ECHO_C" >&6 +library_names_spec= +libname_spec='lib$name' +soname_spec= +shrext_cmds=".so" +postinstall_cmds= +postuninstall_cmds= +finish_cmds= +finish_eval= +shlibpath_var= +shlibpath_overrides_runpath=unknown +version_type=none +dynamic_linker="$host_os ld.so" +sys_lib_dlsearch_path_spec="/lib /usr/lib" +if test "$GCC" = yes; then + sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` + if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then + # if the path contains ";" then we assume it to be the separator + # otherwise default to the standard path separator (i.e. 
":") - it is + # assumed that no part of a normal pathname contains ";" but that should + # okay in the real world where ";" in dirpaths is itself problematic. + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` + else + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` + fi +else + sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" fi +need_lib_prefix=unknown +hardcode_into_libs=no +# when you set need_version to no, make sure it does not cause -set_version +# flags to be left without arguments +need_version=unknown -fi +case $host_os in +aix3*) + version_type=linux + library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a' + shlibpath_var=LIBPATH + # AIX 3 has no versioning support, so we append a major version to the name. + soname_spec='${libname}${release}${shared_ext}$major' + ;; -fi +aix4* | aix5*) + version_type=linux + need_lib_prefix=no + need_version=no + hardcode_into_libs=yes + if test "$host_cpu" = ia64; then + # AIX 5 supports IA64 + library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' + shlibpath_var=LD_LIBRARY_PATH + else + # With GCC up to 2.95.x, collect2 would create an import file + # for dependence libraries. The import file would start with + # the line `#! .'. This would cause the generated library to + # depend on `.', always an invalid library. This was fixed in + # development snapshots of GCC prior to 3.0. + case $host_os in + aix4 | aix4.[01] | aix4.[01].*) + if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' + echo ' yes ' + echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then + : + else + can_build_shared=no + fi + ;; + esac + # AIX (on Power*) has no versioning support, so currently we can not hardcode correct + # soname into executable. Probably we can add versioning support to + # collect2, so additional links can be useful in future. 
+ if test "$aix_use_runtimelinking" = yes; then + # If using run time linking (on AIX 4.2 or later) use lib.so + # instead of lib.a to let people know that these are not + # typical AIX shared libraries. + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + else + # We preserve .a as extension for shared libraries through AIX4.2 + # and later when we are not doing run time linking. + library_names_spec='${libname}${release}.a $libname.a' + soname_spec='${libname}${release}${shared_ext}$major' + fi + shlibpath_var=LIBPATH + fi + ;; +amigaos*) + library_names_spec='$libname.ixlibrary $libname.a' + # Create ${libname}_ixlibrary.a entries in /sys/libs. + finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' + ;; -fi +beos*) + library_names_spec='${libname}${shared_ext}' + dynamic_linker="$host_os ld.so" + shlibpath_var=LIBRARY_PATH + ;; +bsdi[45]*) + version_type=linux + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' + shlibpath_var=LD_LIBRARY_PATH + sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" + sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" + # the default ld.so.conf also contains /usr/contrib/lib and + # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow + # libtool to hard-code these into programs + ;; -fi +cygwin* | mingw* | pw32*) + version_type=windows + shrext_cmds=".dll" + need_version=no + need_lib_prefix=no + case $GCC,$host_os in + yes,cygwin* | 
yes,mingw* | yes,pw32*) + library_names_spec='$libname.dll.a' + # DLL is installed to $(libdir)/../bin by postinstall_cmds + postinstall_cmds='base_file=`basename \${file}`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + test -d \$dldir || mkdir -p \$dldir~ + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname' + postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ + dlpath=$dir/\$dldll~ + $rm \$dlpath' + shlibpath_overrides_runpath=yes -fi + case $host_os in + cygwin*) + # Cygwin DLLs use 'cyg' prefix rather than 'lib' + soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' + sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" + ;; + mingw*) + # MinGW DLLs use traditional 'lib' prefix + soname_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' + sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` + if echo "$sys_lib_search_path_spec" | grep ';[c-zC-Z]:/' >/dev/null; then + # It is most probably a Windows format PATH printed by + # mingw gcc, but we are running on Cygwin. Gcc prints its search + # path with ; separators, and with drive letters. We can handle the + # drive letters (cygwin fileutils understands them), so leave them, + # especially as we might pass files found there to a mingw objdump, + # which wouldn't understand a cygwinified path. Ahh. 
+ sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` + else + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` + fi + ;; + pw32*) + # pw32 DLLs use 'pw' prefix rather than 'lib' + library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext}' + ;; + esac + ;; + *) + library_names_spec='${libname}`echo ${release} | $SED -e 's/[.]/-/g'`${versuffix}${shared_ext} $libname.lib' ;; esac + dynamic_linker='Win32 ld.exe' + # FIXME: first we should search . and the directory the executable is in + shlibpath_var=PATH + ;; - if test "x$lt_cv_dlopen" != xno; then - enable_dlopen=yes +darwin* | rhapsody*) + dynamic_linker="$host_os dyld" + version_type=darwin + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext' + soname_spec='${libname}${release}${major}$shared_ext' + shlibpath_overrides_runpath=yes + shlibpath_var=DYLD_LIBRARY_PATH + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' + # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. 
+ if test "$GCC" = yes; then + sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` else - enable_dlopen=no + sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib' fi + sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' + ;; - case $lt_cv_dlopen in - dlopen) - save_CPPFLAGS="$CPPFLAGS" - test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" +dgux*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + ;; - save_LDFLAGS="$LDFLAGS" - eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" +freebsd1*) + dynamic_linker=no + ;; - save_LIBS="$LIBS" - LIBS="$lt_cv_dlopen_libs $LIBS" +kfreebsd*-gnu) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + dynamic_linker='GNU ld.so' + ;; - echo "$as_me:$LINENO: checking whether a program can dlopen itself" >&5 -echo $ECHO_N "checking whether a program can dlopen itself... 
$ECHO_C" >&6 -if test "${lt_cv_dlopen_self+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test "$cross_compiling" = yes; then : - lt_cv_dlopen_self=cross -else - lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 - lt_status=$lt_dlunknown - cat > conftest.$ac_ext < -#endif +gnu*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + ;; -#include +hpux9* | hpux10* | hpux11*) + # Give a soname corresponding to the major version so that dld.sl refuses to + # link against other versions. + version_type=sunos + need_lib_prefix=no + need_version=no + case $host_cpu in + ia64*) + shrext_cmds='.so' + hardcode_into_libs=yes + dynamic_linker="$host_os dld.so" + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + if test "X$HPUX_IA64_MODE" = X32; then + sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" + else + sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" + fi + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + ;; + hppa*64*) + shrext_cmds='.sl' + hardcode_into_libs=yes + dynamic_linker="$host_os dld.sl" + shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
+ library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + ;; + *) + shrext_cmds='.sl' + dynamic_linker="$host_os dld.sl" + shlibpath_var=SHLIB_PATH + shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + ;; + esac + # HP-UX runs *really* slowly unless shared libraries are mode 555. + postinstall_cmds='chmod 555 $lib' + ;; -#ifdef RTLD_GLOBAL -# define LT_DLGLOBAL RTLD_GLOBAL -#else -# ifdef DL_GLOBAL -# define LT_DLGLOBAL DL_GLOBAL -# else -# define LT_DLGLOBAL 0 -# endif -#endif +interix3*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; -/* We may have to define LT_DLLAZY_OR_NOW in the command line if we - find out it does not work in some platform. 
*/ -#ifndef LT_DLLAZY_OR_NOW -# ifdef RTLD_LAZY -# define LT_DLLAZY_OR_NOW RTLD_LAZY -# else -# ifdef DL_LAZY -# define LT_DLLAZY_OR_NOW DL_LAZY -# else -# ifdef RTLD_NOW -# define LT_DLLAZY_OR_NOW RTLD_NOW -# else -# ifdef DL_NOW -# define LT_DLLAZY_OR_NOW DL_NOW -# else -# define LT_DLLAZY_OR_NOW 0 -# endif -# endif -# endif -# endif -#endif +irix5* | irix6* | nonstopux*) + case $host_os in + nonstopux*) version_type=nonstopux ;; + *) + if test "$lt_cv_prog_gnu_ld" = yes; then + version_type=linux + else + version_type=irix + fi ;; + esac + need_lib_prefix=no + need_version=no + soname_spec='${libname}${release}${shared_ext}$major' + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}' + case $host_os in + irix5* | nonstopux*) + libsuff= shlibsuff= + ;; + *) + case $LD in # libtool.m4 will add one of these switches to LD + *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") + libsuff= shlibsuff= libmagic=32-bit;; + *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") + libsuff=32 shlibsuff=N32 libmagic=N32;; + *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") + libsuff=64 shlibsuff=64 libmagic=64-bit;; + *) libsuff= shlibsuff= libmagic=never-match;; + esac + ;; + esac + shlibpath_var=LD_LIBRARY${shlibsuff}_PATH + shlibpath_overrides_runpath=no + sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}" + sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" + hardcode_into_libs=yes + ;; + +# No shared lib support for Linux oldld, aout, or coff. +linux*oldld* | linux*aout* | linux*coff*) + dynamic_linker=no + ;; + +# This must be Linux ELF. 
+linux*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + # This implies no fast_install, which is unacceptable. + # Some rework will be needed to allow for fast_install + # before this can be enabled. + hardcode_into_libs=yes -#ifdef __cplusplus -extern "C" void exit (int); -#endif + # Append ld.so.conf contents to the search path + if test -f /etc/ld.so.conf; then + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` + sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" + fi -void fnord() { int i=42;} -int main () -{ - void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); - int status = $lt_dlunknown; + # We used to test for /lib/ld.so.1 and disable shared libraries on + # powerpc, because MkLinux only supported shared libraries with the + # GNU dynamic linker. Since this was broken with cross compilers, + # most powerpc-linux boxes support dynamic linking these days and + # people can always --disable-shared, the test was removed, and we + # assume the GNU/Linux dynamic linker is in use. 
+ dynamic_linker='GNU/Linux ld.so' + ;; - if (self) - { - if (dlsym (self,"fnord")) status = $lt_dlno_uscore; - else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; - /* dlclose (self); */ - } +knetbsd*-gnu) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + dynamic_linker='GNU ld.so' + ;; - exit (status); -} -EOF - if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then - (./conftest; exit; ) 2>/dev/null - lt_status=$? - case x$lt_status in - x$lt_dlno_uscore) lt_cv_dlopen_self=yes ;; - x$lt_dlneed_uscore) lt_cv_dlopen_self=yes ;; - x$lt_unknown|x*) lt_cv_dlopen_self=no ;; - esac - else : - # compilation failed - lt_cv_dlopen_self=no +netbsd*) + version_type=sunos + need_lib_prefix=no + need_version=no + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' + dynamic_linker='NetBSD (a.out) ld.so' + else + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='NetBSD ld.elf_so' fi -fi -rm -fr conftest* - - -fi -echo "$as_me:$LINENO: result: $lt_cv_dlopen_self" >&5 -echo "${ECHO_T}$lt_cv_dlopen_self" >&6 + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; - if test "x$lt_cv_dlopen_self" = xyes; then - LDFLAGS="$LDFLAGS $link_static_flag" - echo "$as_me:$LINENO: checking whether a 
statically linked program can dlopen itself" >&5 -echo $ECHO_N "checking whether a statically linked program can dlopen itself... $ECHO_C" >&6 -if test "${lt_cv_dlopen_self_static+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test "$cross_compiling" = yes; then : - lt_cv_dlopen_self_static=cross -else - lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 - lt_status=$lt_dlunknown - cat > conftest.$ac_ext < -#endif +nto-qnx*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + ;; -#include +openbsd*) + version_type=sunos + sys_lib_dlsearch_path_spec="/usr/lib" + need_lib_prefix=no + # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. + case $host_os in + openbsd3.3 | openbsd3.3.*) need_version=yes ;; + *) need_version=no ;; + esac + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' + shlibpath_var=LD_LIBRARY_PATH + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then + case $host_os in + openbsd2.[89] | openbsd2.[89].*) + shlibpath_overrides_runpath=no + ;; + *) + shlibpath_overrides_runpath=yes + ;; + esac + else + shlibpath_overrides_runpath=yes + fi + ;; -#ifdef RTLD_GLOBAL -# define LT_DLGLOBAL RTLD_GLOBAL -#else -# ifdef DL_GLOBAL -# define LT_DLGLOBAL DL_GLOBAL -# else -# define LT_DLGLOBAL 0 -# endif -#endif +os2*) + libname_spec='$name' + shrext_cmds=".dll" + need_lib_prefix=no + library_names_spec='$libname${shared_ext} $libname.a' + dynamic_linker='OS/2 ld.exe' + shlibpath_var=LIBPATH + ;; -/* We may have to define LT_DLLAZY_OR_NOW in the command line if we - find out it does not work in some 
platform. */ -#ifndef LT_DLLAZY_OR_NOW -# ifdef RTLD_LAZY -# define LT_DLLAZY_OR_NOW RTLD_LAZY -# else -# ifdef DL_LAZY -# define LT_DLLAZY_OR_NOW DL_LAZY -# else -# ifdef RTLD_NOW -# define LT_DLLAZY_OR_NOW RTLD_NOW -# else -# ifdef DL_NOW -# define LT_DLLAZY_OR_NOW DL_NOW -# else -# define LT_DLLAZY_OR_NOW 0 -# endif -# endif -# endif -# endif -#endif +osf3* | osf4* | osf5*) + version_type=osf + need_lib_prefix=no + need_version=no + soname_spec='${libname}${release}${shared_ext}$major' + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + shlibpath_var=LD_LIBRARY_PATH + sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" + sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" + ;; -#ifdef __cplusplus -extern "C" void exit (int); -#endif +solaris*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + # ldd complains unless libraries are executable + postinstall_cmds='chmod +x $lib' + ;; -void fnord() { int i=42;} -int main () -{ - void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); - int status = $lt_dlunknown; +sunos4*) + version_type=sunos + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' + finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + if test "$with_gnu_ld" = yes; then + need_lib_prefix=no + fi + need_version=yes + ;; - if (self) - { - if (dlsym (self,"fnord")) status = $lt_dlno_uscore; - else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; - /* dlclose (self); */ - } +sysv4 | sysv4.3*) + version_type=linux + 
library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + case $host_vendor in + sni) + shlibpath_overrides_runpath=no + need_lib_prefix=no + export_dynamic_flag_spec='${wl}-Blargedynsym' + runpath_var=LD_RUN_PATH + ;; + siemens) + need_lib_prefix=no + ;; + motorola) + need_lib_prefix=no + need_version=no + shlibpath_overrides_runpath=no + sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' + ;; + esac + ;; - exit (status); -} -EOF - if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && test -s conftest${ac_exeext} 2>/dev/null; then - (./conftest; exit; ) 2>/dev/null - lt_status=$? - case x$lt_status in - x$lt_dlno_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_dlneed_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_unknown|x*) lt_cv_dlopen_self_static=no ;; +sysv4*MP*) + if test -d /usr/nec ;then + version_type=linux + library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}' + soname_spec='$libname${shared_ext}.$major' + shlibpath_var=LD_LIBRARY_PATH + fi + ;; + +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + version_type=freebsd-elf + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + if test "$with_gnu_ld" = yes; then + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' + shlibpath_overrides_runpath=no + else + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' + shlibpath_overrides_runpath=yes + case $host_os in + sco3.2v5*) + sys_lib_search_path_spec="$sys_lib_search_path_spec 
/lib" + ;; esac - else : - # compilation failed - lt_cv_dlopen_self_static=no fi -fi -rm -fr conftest* + sys_lib_dlsearch_path_spec='/usr/lib' + ;; + +uts4*) + version_type=linux + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + ;; +*) + dynamic_linker=no + ;; +esac +echo "$as_me:$LINENO: result: $dynamic_linker" >&5 +echo "${ECHO_T}$dynamic_linker" >&6 +test "$dynamic_linker" = no && can_build_shared=no +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" +if test "$GCC" = yes; then + variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" fi -echo "$as_me:$LINENO: result: $lt_cv_dlopen_self_static" >&5 -echo "${ECHO_T}$lt_cv_dlopen_self_static" >&6 - fi - CPPFLAGS="$save_CPPFLAGS" - LDFLAGS="$save_LDFLAGS" - LIBS="$save_LIBS" - ;; - esac +echo "$as_me:$LINENO: checking how to hardcode library paths into programs" >&5 +echo $ECHO_N "checking how to hardcode library paths into programs... $ECHO_C" >&6 +hardcode_action_GCJ= +if test -n "$hardcode_libdir_flag_spec_GCJ" || \ + test -n "$runpath_var_GCJ" || \ + test "X$hardcode_automatic_GCJ" = "Xyes" ; then - case $lt_cv_dlopen_self in - yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; - *) enable_dlopen_self=unknown ;; - esac + # We can hardcode non-existant directories. + if test "$hardcode_direct_GCJ" != no && + # If the only mechanism to avoid hardcoding is shlibpath_var, we + # have to relink, otherwise we might link with an installed library + # when we should be linking with a yet-to-be-installed one + ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, GCJ)" != no && + test "$hardcode_minus_L_GCJ" != no; then + # Linking always hardcodes the temporary library directory. + hardcode_action_GCJ=relink + else + # We can link without hardcoding, and we can hardcode nonexisting dirs. 
+ hardcode_action_GCJ=immediate + fi +else + # We cannot hardcode anything, or else we can only hardcode existing + # directories. + hardcode_action_GCJ=unsupported +fi +echo "$as_me:$LINENO: result: $hardcode_action_GCJ" >&5 +echo "${ECHO_T}$hardcode_action_GCJ" >&6 - case $lt_cv_dlopen_self_static in - yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; - *) enable_dlopen_self_static=unknown ;; - esac +if test "$hardcode_action_GCJ" = relink; then + # Fast installation is not supported + enable_fast_install=no +elif test "$shlibpath_overrides_runpath" = yes || + test "$enable_shared" = no; then + # Fast installation is not necessary + enable_fast_install=needless fi @@ -18687,7 +17569,7 @@ if test -f "$ltmain"; then # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. - for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC NM \ + for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ @@ -18805,6 +17687,9 @@ AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC +# LTCC compiler flags. +LTCFLAGS=$lt_LTCFLAGS + # A language-specific compiler. CC=$lt_compiler_GCJ @@ -19114,6 +17999,9 @@ lt_simple_link_test_code="$lt_simple_com # If no C compiler was specified, use CC. LTCC=${LTCC-"$CC"} +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} + # Allow CC to be a program name with arguments. 
compiler=$CC @@ -19121,13 +18009,13 @@ compiler=$CC # save warnings/boilerplate of simple test code ac_outfile=conftest.$ac_objext printf "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_compiler_boilerplate=`cat conftest.err` $rm conftest* ac_outfile=conftest.$ac_objext printf "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d' >conftest.err +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err _lt_linker_boilerplate=`cat conftest.err` $rm conftest* @@ -19162,7 +18050,7 @@ if test -f "$ltmain"; then # Now quote all the things that may contain metacharacters while being # careful not to overquote the AC_SUBSTed values. We take copies of the # variables and quote the copies for generation of the libtool script. - for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC NM \ + for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC LTCFLAGS NM \ SED SHELL STRIP \ libname_spec library_names_spec soname_spec extract_expsyms_cmds \ old_striplib striplib file_magic_cmd finish_cmds finish_eval \ @@ -19280,6 +18168,9 @@ AR_FLAGS=$lt_AR_FLAGS # A C compiler. LTCC=$lt_LTCC +# LTCC compiler flags. +LTCFLAGS=$lt_LTCFLAGS + # A language-specific compiler. CC=$lt_compiler_RC @@ -19621,18 +18512,353 @@ LIBTOOL='$(SHELL) $(top_builddir)/libtoo -# Check whether --enable-libcheck or --disable-libcheck was given. -if test "${enable_libcheck+set}" = set; then - enableval="$enable_libcheck" - if test x$enableval = xno ; then - disable_libcheck=yes - fi +# Check whether --enable-libcheck or --disable-libcheck was given. 
+if test "${enable_libcheck+set}" = set; then + enableval="$enable_libcheck" + if test x$enableval = xno ; then + disable_libcheck=yes + fi + +fi; + +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu +if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. +set dummy ${ac_tool_prefix}gcc; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_prog_CC="${ac_tool_prefix}gcc" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +fi +fi +CC=$ac_cv_prog_CC +if test -n "$CC"; then + echo "$as_me:$LINENO: result: $CC" >&5 +echo "${ECHO_T}$CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + +fi +if test -z "$ac_cv_prog_CC"; then + ac_ct_CC=$CC + # Extract the first word of "gcc", so it can be a program name with args. +set dummy gcc; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_ac_ct_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$ac_ct_CC"; then + ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. 
+ for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_CC="gcc" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +fi +fi +ac_ct_CC=$ac_cv_prog_ac_ct_CC +if test -n "$ac_ct_CC"; then + echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 +echo "${ECHO_T}$ac_ct_CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + + CC=$ac_ct_CC +else + CC="$ac_cv_prog_CC" +fi + +if test -z "$CC"; then + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. +set dummy ${ac_tool_prefix}cc; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_prog_CC="${ac_tool_prefix}cc" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +fi +fi +CC=$ac_cv_prog_CC +if test -n "$CC"; then + echo "$as_me:$LINENO: result: $CC" >&5 +echo "${ECHO_T}$CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + +fi +if test -z "$ac_cv_prog_CC"; then + ac_ct_CC=$CC + # Extract the first word of "cc", so it can be a program name with args. +set dummy cc; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_ac_ct_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$ac_ct_CC"; then + ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_CC="cc" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +fi +fi +ac_ct_CC=$ac_cv_prog_ac_ct_CC +if test -n "$ac_ct_CC"; then + echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 +echo "${ECHO_T}$ac_ct_CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + + CC=$ac_ct_CC +else + CC="$ac_cv_prog_CC" +fi + +fi +if test -z "$CC"; then + # Extract the first word of "cc", so it can be a program name with args. +set dummy cc; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else + ac_prog_rejected=no +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then + ac_prog_rejected=yes + continue + fi + ac_cv_prog_CC="cc" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +if test $ac_prog_rejected = yes; then + # We found a bogon in the path, so make sure we never use it. + set dummy $ac_cv_prog_CC + shift + if test $# != 0; then + # We chose a different compiler from the bogus one. + # However, it has the same basename, so the bogon will be chosen + # first if we set CC to just the basename; use the full file name. 
+ shift + ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" + fi +fi +fi +fi +CC=$ac_cv_prog_CC +if test -n "$CC"; then + echo "$as_me:$LINENO: result: $CC" >&5 +echo "${ECHO_T}$CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + +fi +if test -z "$CC"; then + if test -n "$ac_tool_prefix"; then + for ac_prog in cl + do + # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. +set dummy $ac_tool_prefix$ac_prog; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_prog_CC="$ac_tool_prefix$ac_prog" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +fi +fi +CC=$ac_cv_prog_CC +if test -n "$CC"; then + echo "$as_me:$LINENO: result: $CC" >&5 +echo "${ECHO_T}$CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + + test -n "$CC" && break + done +fi +if test -z "$CC"; then + ac_ct_CC=$CC + for ac_prog in cl +do + # Extract the first word of "$ac_prog", so it can be a program name with args. +set dummy $ac_prog; ac_word=$2 +echo "$as_me:$LINENO: checking for $ac_word" >&5 +echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 +if test "${ac_cv_prog_ac_ct_CC+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + if test -n "$ac_ct_CC"; then + ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. 
+ for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_CC="$ac_prog" + echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done +done + +fi +fi +ac_ct_CC=$ac_cv_prog_ac_ct_CC +if test -n "$ac_ct_CC"; then + echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 +echo "${ECHO_T}$ac_ct_CC" >&6 +else + echo "$as_me:$LINENO: result: no" >&5 +echo "${ECHO_T}no" >&6 +fi + + test -n "$ac_ct_CC" && break +done + + CC=$ac_ct_CC +fi + +fi + + +test -z "$CC" && { { echo "$as_me:$LINENO: error: no acceptable C compiler found in \$PATH +See \`config.log' for more details." >&5 +echo "$as_me: error: no acceptable C compiler found in \$PATH +See \`config.log' for more details." >&2;} + { (exit 1); exit 1; }; } -fi; +# Provide some information about the compiler. +echo "$as_me:$LINENO:" \ + "checking for C compiler version" >&5 +ac_compiler=`set X $ac_compile; echo $2` +{ (eval echo "$as_me:$LINENO: \"$ac_compiler --version &5\"") >&5 + (eval $ac_compiler --version &5) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } +{ (eval echo "$as_me:$LINENO: \"$ac_compiler -v &5\"") >&5 + (eval $ac_compiler -v &5) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } +{ (eval echo "$as_me:$LINENO: \"$ac_compiler -V &5\"") >&5 + (eval $ac_compiler -V &5) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } -echo "$as_me:$LINENO: checking for ANSI C header files" >&5 -echo $ECHO_N "checking for ANSI C header files... $ECHO_C" >&6 -if test "${ac_cv_header_stdc+set}" = set; then +echo "$as_me:$LINENO: checking whether we are using the GNU C compiler" >&5 +echo $ECHO_N "checking whether we are using the GNU C compiler... 
$ECHO_C" >&6 +if test "${ac_cv_c_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF @@ -19641,14 +18867,13 @@ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ -#include -#include -#include -#include int main () { +#ifndef __GNUC__ + choke me +#endif ; return 0; @@ -19659,78 +18884,44 @@ if { (eval echo "$as_me:$LINENO: \"$ac_c (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest.$ac_objext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; }; then - ac_cv_header_stdc=yes -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -ac_cv_header_stdc=no -fi -rm -f conftest.err conftest.$ac_objext conftest.$ac_ext - -if test $ac_cv_header_stdc = yes; then - # SunOS 4.x string.h does not declare mem*, contrary to ANSI. - cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. */ -#include - -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "memchr" >/dev/null 2>&1; then - : -else - ac_cv_header_stdc=no -fi -rm -f conftest* - -fi - -if test $ac_cv_header_stdc = yes; then - # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. - cat >conftest.$ac_ext <<_ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h >>conftest.$ac_ext -cat >>conftest.$ac_ext <<_ACEOF -/* end confdefs.h. 
*/ -#include - -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "free" >/dev/null 2>&1; then - : + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } && + { ac_try='test -z "$ac_c_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest.$ac_objext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; }; then + ac_compiler_gnu=yes else - ac_cv_header_stdc=no -fi -rm -f conftest* + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 +ac_compiler_gnu=no fi +rm -f conftest.err conftest.$ac_objext conftest.$ac_ext +ac_cv_c_compiler_gnu=$ac_compiler_gnu -if test $ac_cv_header_stdc = yes; then - # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. - if test "$cross_compiling" = yes; then - : +fi +echo "$as_me:$LINENO: result: $ac_cv_c_compiler_gnu" >&5 +echo "${ECHO_T}$ac_cv_c_compiler_gnu" >&6 +GCC=`test $ac_compiler_gnu = yes && echo yes` +ac_test_CFLAGS=${CFLAGS+set} +ac_save_CFLAGS=$CFLAGS +CFLAGS="-g" +echo "$as_me:$LINENO: checking whether $CC accepts -g" >&5 +echo $ECHO_N "checking whether $CC accepts -g... $ECHO_C" >&6 +if test "${ac_cv_prog_cc_g+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ @@ -19738,101 +18929,135 @@ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ -#include -#if ((' ' & 0x0FF) == 0x020) -# define ISLOWER(c) ('a' <= (c) && (c) <= 'z') -# define TOUPPER(c) (ISLOWER(c) ? 
'A' + ((c) - 'a') : (c)) -#else -# define ISLOWER(c) \ - (('a' <= (c) && (c) <= 'i') \ - || ('j' <= (c) && (c) <= 'r') \ - || ('s' <= (c) && (c) <= 'z')) -# define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c)) -#endif -#define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) int main () { - int i; - for (i = 0; i < 256; i++) - if (XOR (islower (i), ISLOWER (i)) - || toupper (i) != TOUPPER (i)) - exit(2); - exit (0); + + ; + return 0; } _ACEOF -rm -f conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>&5 +rm -f conftest.$ac_objext +if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>conftest.er1 ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && { ac_try='./conftest$ac_exeext' + (exit $ac_status); } && + { ac_try='test -z "$ac_c_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then - : + ac_cv_prog_cc_g=yes else - echo "$as_me: program exited with status $ac_status" >&5 -echo "$as_me: failed program was:" >&5 + echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -( exit $ac_status ) -ac_cv_header_stdc=no -fi -rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext -fi +ac_cv_prog_cc_g=no fi +rm -f conftest.err conftest.$ac_objext conftest.$ac_ext fi -echo "$as_me:$LINENO: result: $ac_cv_header_stdc" >&5 -echo "${ECHO_T}$ac_cv_header_stdc" >&6 -if test $ac_cv_header_stdc = yes; then - -cat >>confdefs.h <<\_ACEOF -#define STDC_HEADERS 1 -_ACEOF - +echo "$as_me:$LINENO: result: $ac_cv_prog_cc_g" >&5 +echo "${ECHO_T}$ac_cv_prog_cc_g" >&6 +if test "$ac_test_CFLAGS" = set; then + CFLAGS=$ac_save_CFLAGS +elif test $ac_cv_prog_cc_g = yes; then + if test "$GCC" = yes; then + CFLAGS="-g -O2" + else + CFLAGS="-g" + fi +else + if test "$GCC" = yes; then + CFLAGS="-O2" + else + CFLAGS= + fi fi - - - -if test "$disable_libcheck" != "yes" -then - -echo "$as_me:$LINENO: checking for ibv_get_device_list in -libverbs" >&5 -echo $ECHO_N "checking for ibv_get_device_list in -libverbs... $ECHO_C" >&6 -if test "${ac_cv_lib_ibverbs_ibv_get_device_list+set}" = set; then +echo "$as_me:$LINENO: checking for $CC option to accept ANSI C" >&5 +echo $ECHO_N "checking for $CC option to accept ANSI C... $ECHO_C" >&6 +if test "${ac_cv_prog_cc_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else - ac_check_lib_save_LIBS=$LIBS -LIBS="-libverbs $LIBS" + ac_cv_prog_cc_stdc=no +ac_save_CC=$CC cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ +#include +#include +#include +#include +/* Most of the following tests are stolen from RCS 5.7's src/conf.sh. 
*/ +struct buf { int x; }; +FILE * (*rcsopen) (struct buf *, struct stat *, int); +static char *e (p, i) + char **p; + int i; +{ + return p[i]; +} +static char *f (char * (*g) (char **, int), char **p, ...) +{ + char *s; + va_list v; + va_start (v,p); + s = g (p, va_arg (v,int)); + va_end (v); + return s; +} -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char ibv_get_device_list (); +/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has + function prototypes and stuff, but not '\xHH' hex character constants. + These don't provoke an error unfortunately, instead are silently treated + as 'x'. The following induces an error, until -std1 is added to get + proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an + array size at least. It's necessary to write '\x00'==0 to get something + that's true only with -std1. */ +int osf4_cc_array ['\x00' == 0 ? 1 : -1]; + +int test (int i, double x); +struct s1 {int (*f) (int a);}; +struct s2 {int (*f) (double a);}; +int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); +int argc; +char **argv; int main () { -ibv_get_device_list (); +return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; ; return 0; } _ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 +# Don't try gcc -ansi; that turns off useful extensions and +# breaks some systems' header files. 
+# AIX -qlanglvl=ansi +# Ultrix and OSF/1 -std1 +# HP-UX 10.20 and later -Ae +# HP-UX older versions -Aa -D_HPUX_SOURCE +# SVR4 -Xc -D__EXTENSIONS__ +for ac_arg in "" -qlanglvl=ansi -std1 -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" +do + CC="$ac_save_CC $ac_arg" + rm -f conftest.$ac_objext +if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 @@ -19846,102 +19071,94 @@ if { (eval echo "$as_me:$LINENO: \"$ac_l ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' + { ac_try='test -s conftest.$ac_objext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then - ac_cv_lib_ibverbs_ibv_get_device_list=yes + ac_cv_prog_cc_stdc=$ac_arg +break else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -ac_cv_lib_ibverbs_ibv_get_device_list=no -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS fi -echo "$as_me:$LINENO: result: $ac_cv_lib_ibverbs_ibv_get_device_list" >&5 -echo "${ECHO_T}$ac_cv_lib_ibverbs_ibv_get_device_list" >&6 -if test $ac_cv_lib_ibverbs_ibv_get_device_list = yes; then - cat >>confdefs.h <<_ACEOF -#define HAVE_LIBIBVERBS 1 -_ACEOF - - LIBS="-libverbs $LIBS" +rm -f conftest.err conftest.$ac_objext +done +rm -f conftest.$ac_ext conftest.$ac_objext +CC=$ac_save_CC -else - { { echo "$as_me:$LINENO: error: libibverbs not installed" >&5 -echo "$as_me: error: libibverbs not installed" >&2;} - { (exit 1); exit 1; }; } fi +case "x$ac_cv_prog_cc_stdc" in + x|xno) + echo "$as_me:$LINENO: result: none needed" >&5 +echo "${ECHO_T}none needed" >&6 ;; + *) + echo "$as_me:$LINENO: result: $ac_cv_prog_cc_stdc" >&5 +echo "${ECHO_T}$ac_cv_prog_cc_stdc" >&6 + CC="$CC $ac_cv_prog_cc_stdc" ;; +esac - -for ac_func in 
ibv_read_sysfs_file +# Some people use a C++ compiler to compile C. Since we use `exit', +# in C++ we need to declare it. In case someone uses the same compiler +# for both compiling C and C++ we need to have the C++ compiler decide +# the declaration of exit, since it's the most demanding environment. +cat >conftest.$ac_ext <<_ACEOF +#ifndef __cplusplus + choke me +#endif +_ACEOF +rm -f conftest.$ac_objext +if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>conftest.er1 + ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } && + { ac_try='test -z "$ac_c_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest.$ac_objext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; }; then + for ac_declaration in \ + '' \ + 'extern "C" void std::exit (int) throw (); using std::exit;' \ + 'extern "C" void std::exit (int); using std::exit;' \ + 'extern "C" void exit (int) throw ();' \ + 'extern "C" void exit (int);' \ + 'void exit (int);' do -as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` -echo "$as_me:$LINENO: checking for $ac_func" >&5 -echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 -if eval "test \"\${$as_ac_var+set}\" = set"; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ -/* Define $ac_func to an innocuous variant, in case declares $ac_func. - For example, HP-UX 11i declares gettimeofday. 
*/ -#define $ac_func innocuous_$ac_func - -/* System header to define __stub macros and hopefully few prototypes, - which can conflict with char $ac_func (); below. - Prefer to if __STDC__ is defined, since - exists even on freestanding compilers. */ - -#ifdef __STDC__ -# include -#else -# include -#endif - -#undef $ac_func - -/* Override any gcc2 internal prototype to avoid an error. */ -#ifdef __cplusplus -extern "C" -{ -#endif -/* We use char because int might match the return type of a gcc2 - builtin and then its argument prototype would still apply. */ -char $ac_func (); -/* The GNU C library defines this for functions which it implements - to always fail with ENOSYS. Some functions are actually named - something starting with __ and the normal name is an alias. */ -#if defined (__stub_$ac_func) || defined (__stub___$ac_func) -choke me -#else -char (*f) () = $ac_func; -#endif -#ifdef __cplusplus -} -#endif - +$ac_declaration +#include int main () { -return f != $ac_func; +exit (42); ; return 0; } _ACEOF -rm -f conftest.$ac_objext conftest$ac_exeext -if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 - (eval $ac_link) 2>conftest.er1 +rm -f conftest.$ac_objext +if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 @@ -19955,392 +19172,297 @@ if { (eval echo "$as_me:$LINENO: \"$ac_l ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && - { ac_try='test -s conftest$ac_exeext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; }; then - eval "$as_ac_var=yes" -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - -eval "$as_ac_var=no" -fi -rm -f conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -fi -echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 -echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 -if test `eval echo '${'$as_ac_var'}'` = yes; then - cat >>confdefs.h <<_ACEOF -#define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 -_ACEOF - -fi -done - -fi - -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. -set dummy ${ac_tool_prefix}gcc; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_CC+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CC="${ac_tool_prefix}gcc" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done -done - -fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - echo "$as_me:$LINENO: result: $CC" >&5 -echo "${ECHO_T}$CC" >&6 -else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - -fi -if test -z "$ac_cv_prog_CC"; then - ac_ct_CC=$CC - # Extract the first word of "gcc", so it can be a program name with args. 
-set dummy gcc; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_ac_ct_CC+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -n "$ac_ct_CC"; then - ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_CC="gcc" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done -done - -fi -fi -ac_ct_CC=$ac_cv_prog_ac_ct_CC -if test -n "$ac_ct_CC"; then - echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 -echo "${ECHO_T}$ac_ct_CC" >&6 -else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - - CC=$ac_ct_CC -else - CC="$ac_cv_prog_CC" -fi - -if test -z "$CC"; then - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. -set dummy ${ac_tool_prefix}cc; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_CC+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CC="${ac_tool_prefix}cc" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done -done - -fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - echo "$as_me:$LINENO: result: $CC" >&5 -echo "${ECHO_T}$CC" >&6 -else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - -fi -if test -z "$ac_cv_prog_CC"; then - ac_ct_CC=$CC - # Extract the first word of "cc", so it can be a program name with args. -set dummy cc; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_ac_ct_CC+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -n "$ac_ct_CC"; then - ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_CC="cc" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done -done - -fi -fi -ac_ct_CC=$ac_cv_prog_ac_ct_CC -if test -n "$ac_ct_CC"; then - echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 -echo "${ECHO_T}$ac_ct_CC" >&6 -else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi - - CC=$ac_ct_CC + { ac_try='test -s conftest.$ac_objext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; }; then + : else - CC="$ac_cv_prog_CC" -fi + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 +continue fi -if test -z "$CC"; then - # Extract the first word of "cc", so it can be a program name with args. 
-set dummy cc; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_CC+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. +rm -f conftest.err conftest.$ac_objext conftest.$ac_ext + cat >conftest.$ac_ext <<_ACEOF +/* confdefs.h. */ +_ACEOF +cat confdefs.h >>conftest.$ac_ext +cat >>conftest.$ac_ext <<_ACEOF +/* end confdefs.h. */ +$ac_declaration +int +main () +{ +exit (42); + ; + return 0; +} +_ACEOF +rm -f conftest.$ac_objext +if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 + (eval $ac_compile) 2>conftest.er1 + ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } && + { ac_try='test -z "$ac_c_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest.$ac_objext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; }; then + break else - ac_prog_rejected=no -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then - ac_prog_rejected=yes - continue - fi - ac_cv_prog_CC="cc" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done -done + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 -if test $ac_prog_rejected = yes; then - # We found a bogon in the path, so make sure we never use it. 
- set dummy $ac_cv_prog_CC - shift - if test $# != 0; then - # We chose a different compiler from the bogus one. - # However, it has the same basename, so the bogon will be chosen - # first if we set CC to just the basename; use the full file name. - shift - ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" - fi -fi fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - echo "$as_me:$LINENO: result: $CC" >&5 -echo "${ECHO_T}$CC" >&6 -else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 +rm -f conftest.err conftest.$ac_objext conftest.$ac_ext +done +rm -f conftest* +if test -n "$ac_declaration"; then + echo '#ifdef __cplusplus' >>confdefs.h + echo $ac_declaration >>confdefs.h + echo '#endif' >>confdefs.h fi -fi -if test -z "$CC"; then - if test -n "$ac_tool_prefix"; then - for ac_prog in cl - do - # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. -set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_CC+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CC="$ac_tool_prefix$ac_prog" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done -done + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - echo "$as_me:$LINENO: result: $CC" >&5 -echo "${ECHO_T}$CC" >&6 -else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 -fi +rm -f conftest.err conftest.$ac_objext conftest.$ac_ext +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu - test -n "$CC" && break - done -fi -if test -z "$CC"; then - ac_ct_CC=$CC - for ac_prog in cl -do - # Extract the first word of "$ac_prog", so it can be a program name with args. -set dummy $ac_prog; ac_word=$2 -echo "$as_me:$LINENO: checking for $ac_word" >&5 -echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6 -if test "${ac_cv_prog_ac_ct_CC+set}" = set; then +depcc="$CC" am_compiler_list= + +echo "$as_me:$LINENO: checking dependency style of $depcc" >&5 +echo $ECHO_N "checking dependency style of $depcc... $ECHO_C" >&6 +if test "${am_cv_CC_dependencies_compiler_type+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else - if test -n "$ac_ct_CC"; then - ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if $as_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_CC="$ac_prog" - echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 + if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then + # We make a subdir and do the tests there. 
Otherwise we can end up + # making bogus files that we don't know about and never remove. For + # instance it was reported that on HP-UX the gcc test will end up + # making a dummy file named `D' -- because `-MD' means `put the output + # in D'. + mkdir conftest.dir + # Copy depcomp to subdir because otherwise we won't find it if we're + # using a relative directory. + cp "$am_depcomp" conftest.dir + cd conftest.dir + # We will build objects and dependencies in a subdirectory because + # it helps to detect inapplicable dependency modes. For instance + # both Tru64's cc and ICC support -MD to output dependencies as a + # side effect of compilation, but ICC will put the dependencies in + # the current directory while Tru64 will put them in the object + # directory. + mkdir sub + + am_cv_CC_dependencies_compiler_type=none + if test "$am_compiler_list" = ""; then + am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` fi -done -done + for depmode in $am_compiler_list; do + # Setup a source with many dependencies, because some compilers + # like to wrap large dependency lists on column 80 (with \), and + # we should not choose a depcomp mode which is confused by this. + # + # We need to recreate these files for each test, as the compiler may + # overwrite some of them when testing with obscure command lines. + # This happens at least with the AIX C compiler. + : > sub/conftest.c + for i in 1 2 3 4 5 6; do + echo '#include "conftst'$i'.h"' >> sub/conftest.c + # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with + # Solaris 8's {/usr,}/bin/sh. 
+ touch sub/conftst$i.h + done + echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf + + case $depmode in + nosideeffect) + # after this tag, mechanisms are not by side-effect, so they'll + # only be used when explicitly requested + if test "x$enable_dependency_tracking" = xyes; then + continue + else + break + fi + ;; + none) break ;; + esac + # We check with `-c' and `-o' for the sake of the "dashmstdout" + # mode. It turns out that the SunPro C++ compiler does not properly + # handle `-M -o', and we need to detect this. + if depmode=$depmode \ + source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ + depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ + $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ + >/dev/null 2>conftest.err && + grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && + grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && + ${MAKE-make} -s -f confmf > /dev/null 2>&1; then + # icc doesn't choke on unknown options, it will just issue warnings + # or remarks (even with -Werror). So we grep stderr for any message + # that says an option was ignored or not supported. + # When given -MP, icc 7.0 and 7.1 complain thusly: + # icc: Command line warning: ignoring option '-M'; no argument required + # The diagnosis changed in icc 8.0: + # icc: Command line remark: option '-MP' not supported + if (grep 'ignoring option' conftest.err || + grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else + am_cv_CC_dependencies_compiler_type=$depmode + break + fi + fi + done -fi -fi -ac_ct_CC=$ac_cv_prog_ac_ct_CC -if test -n "$ac_ct_CC"; then - echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 -echo "${ECHO_T}$ac_ct_CC" >&6 + cd .. 
+ rm -rf conftest.dir else - echo "$as_me:$LINENO: result: no" >&5 -echo "${ECHO_T}no" >&6 + am_cv_CC_dependencies_compiler_type=none fi - test -n "$ac_ct_CC" && break -done - - CC=$ac_ct_CC fi +echo "$as_me:$LINENO: result: $am_cv_CC_dependencies_compiler_type" >&5 +echo "${ECHO_T}$am_cv_CC_dependencies_compiler_type" >&6 +CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type + + +if + test "x$enable_dependency_tracking" != xno \ + && test "$am_cv_CC_dependencies_compiler_type" = gcc3; then + am__fastdepCC_TRUE= + am__fastdepCC_FALSE='#' +else + am__fastdepCC_TRUE='#' + am__fastdepCC_FALSE= fi -test -z "$CC" && { { echo "$as_me:$LINENO: error: no acceptable C compiler found in \$PATH -See \`config.log' for more details." >&5 -echo "$as_me: error: no acceptable C compiler found in \$PATH -See \`config.log' for more details." >&2;} - { (exit 1); exit 1; }; } -# Provide some information about the compiler. -echo "$as_me:$LINENO:" \ - "checking for C compiler version" >&5 -ac_compiler=`set X $ac_compile; echo $2` -{ (eval echo "$as_me:$LINENO: \"$ac_compiler --version &5\"") >&5 - (eval $ac_compiler --version &5) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } -{ (eval echo "$as_me:$LINENO: \"$ac_compiler -v &5\"") >&5 - (eval $ac_compiler -v &5) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } -{ (eval echo "$as_me:$LINENO: \"$ac_compiler -V &5\"") >&5 - (eval $ac_compiler -V &5) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } -echo "$as_me:$LINENO: checking whether we are using the GNU C compiler" >&5 -echo $ECHO_N "checking whether we are using the GNU C compiler... $ECHO_C" >&6 -if test "${ac_cv_c_compiler_gnu+set}" = set; then +if test "$disable_libcheck" != "yes" +then + +echo "$as_me:$LINENO: checking for ibv_get_device_list in -libverbs" >&5 +echo $ECHO_N "checking for ibv_get_device_list in -libverbs... 
$ECHO_C" >&6 +if test "${ac_cv_lib_ibverbs_ibv_get_device_list+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else - cat >conftest.$ac_ext <<_ACEOF + ac_check_lib_save_LIBS=$LIBS +LIBS="-libverbs $LIBS" +cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ +/* Override any gcc2 internal prototype to avoid an error. */ +#ifdef __cplusplus +extern "C" +#endif +/* We use char because int might match the return type of a gcc2 + builtin and then its argument prototype would still apply. */ +char ibv_get_device_list (); int main () { -#ifndef __GNUC__ - choke me -#endif - +ibv_get_device_list (); ; return 0; } _ACEOF +rm -f conftest.$ac_objext conftest$ac_exeext +if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 + (eval $ac_link) 2>conftest.er1 + ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } && + { ac_try='test -z "$ac_c_werror_flag" + || test ! -s conftest.err' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); }; } && + { ac_try='test -s conftest$ac_exeext' + { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 + (eval $ac_try) 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 + (exit $ac_status); }; }; then + ac_cv_lib_ibverbs_ibv_get_device_list=yes +else + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 + +ac_cv_lib_ibverbs_ibv_get_device_list=no +fi +rm -f conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +echo "$as_me:$LINENO: result: $ac_cv_lib_ibverbs_ibv_get_device_list" >&5 +echo "${ECHO_T}$ac_cv_lib_ibverbs_ibv_get_device_list" >&6 +if test $ac_cv_lib_ibverbs_ibv_get_device_list = yes; then + cat >>confdefs.h <<_ACEOF +#define HAVE_LIBIBVERBS 1 +_ACEOF + + LIBS="-libverbs $LIBS" + +else + { { echo "$as_me:$LINENO: error: libibverbs not installed" >&5 +echo "$as_me: error: libibverbs not installed" >&2;} + { (exit 1); exit 1; }; } +fi + + +if test "${ac_cv_header_infiniband_driver_h+set}" = set; then + echo "$as_me:$LINENO: checking for infiniband/driver.h" >&5 +echo $ECHO_N "checking for infiniband/driver.h... $ECHO_C" >&6 +if test "${ac_cv_header_infiniband_driver_h+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +fi +echo "$as_me:$LINENO: result: $ac_cv_header_infiniband_driver_h" >&5 +echo "${ECHO_T}$ac_cv_header_infiniband_driver_h" >&6 +else + # Is the header compilable? +echo "$as_me:$LINENO: checking infiniband/driver.h usability" >&5 +echo $ECHO_N "checking infiniband/driver.h usability... $ECHO_C" >&6 +cat >conftest.$ac_ext <<_ACEOF +/* confdefs.h. */ +_ACEOF +cat confdefs.h >>conftest.$ac_ext +cat >>conftest.$ac_ext <<_ACEOF +/* end confdefs.h. */ +$ac_includes_default +#include +_ACEOF rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 @@ -20363,26 +19485,117 @@ if { (eval echo "$as_me:$LINENO: \"$ac_c ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then - ac_compiler_gnu=yes + ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -ac_compiler_gnu=no +ac_header_compiler=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext -ac_cv_c_compiler_gnu=$ac_compiler_gnu +echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 +echo "${ECHO_T}$ac_header_compiler" >&6 + +# Is the header present? +echo "$as_me:$LINENO: checking infiniband/driver.h presence" >&5 +echo $ECHO_N "checking infiniband/driver.h presence... $ECHO_C" >&6 +cat >conftest.$ac_ext <<_ACEOF +/* confdefs.h. */ +_ACEOF +cat confdefs.h >>conftest.$ac_ext +cat >>conftest.$ac_ext <<_ACEOF +/* end confdefs.h. */ +#include +_ACEOF +if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 + (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 + ac_status=$? + grep -v '^ *+' conftest.er1 >conftest.err + rm -f conftest.er1 + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } >/dev/null; then + if test -s conftest.err; then + ac_cpp_err=$ac_c_preproc_warn_flag + ac_cpp_err=$ac_cpp_err$ac_c_werror_flag + else + ac_cpp_err= + fi +else + ac_cpp_err=yes +fi +if test -z "$ac_cpp_err"; then + ac_header_preproc=yes +else + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 + ac_header_preproc=no fi -echo "$as_me:$LINENO: result: $ac_cv_c_compiler_gnu" >&5 -echo "${ECHO_T}$ac_cv_c_compiler_gnu" >&6 -GCC=`test $ac_compiler_gnu = yes && echo yes` -ac_test_CFLAGS=${CFLAGS+set} -ac_save_CFLAGS=$CFLAGS -CFLAGS="-g" -echo "$as_me:$LINENO: checking whether $CC accepts -g" >&5 -echo $ECHO_N "checking whether $CC accepts -g... $ECHO_C" >&6 -if test "${ac_cv_prog_cc_g+set}" = set; then +rm -f conftest.err conftest.$ac_ext +echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 +echo "${ECHO_T}$ac_header_preproc" >&6 + +# So? What about this header? 
+case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in + yes:no: ) + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: accepted by the compiler, rejected by the preprocessor!" >&5 +echo "$as_me: WARNING: infiniband/driver.h: accepted by the compiler, rejected by the preprocessor!" >&2;} + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: proceeding with the compiler's result" >&5 +echo "$as_me: WARNING: infiniband/driver.h: proceeding with the compiler's result" >&2;} + ac_header_preproc=yes + ;; + no:yes:* ) + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: present but cannot be compiled" >&5 +echo "$as_me: WARNING: infiniband/driver.h: present but cannot be compiled" >&2;} + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: check for missing prerequisite headers?" >&5 +echo "$as_me: WARNING: infiniband/driver.h: check for missing prerequisite headers?" >&2;} + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: see the Autoconf documentation" >&5 +echo "$as_me: WARNING: infiniband/driver.h: see the Autoconf documentation" >&2;} + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: section \"Present But Cannot Be Compiled\"" >&5 +echo "$as_me: WARNING: infiniband/driver.h: section \"Present But Cannot Be Compiled\"" >&2;} + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: proceeding with the preprocessor's result" >&5 +echo "$as_me: WARNING: infiniband/driver.h: proceeding with the preprocessor's result" >&2;} + { echo "$as_me:$LINENO: WARNING: infiniband/driver.h: in the future, the compiler will take precedence" >&5 +echo "$as_me: WARNING: infiniband/driver.h: in the future, the compiler will take precedence" >&2;} + ( + cat <<\_ASBOX +## ------------------------------------- ## +## Report this to ehcadd at sourceforge.net ## +## ------------------------------------- ## +_ASBOX + ) | + sed "s/^/$as_me: WARNING: /" >&2 + ;; +esac +echo "$as_me:$LINENO: checking for infiniband/driver.h" >&5 +echo $ECHO_N "checking for 
infiniband/driver.h... $ECHO_C" >&6 +if test "${ac_cv_header_infiniband_driver_h+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + ac_cv_header_infiniband_driver_h=$ac_header_preproc +fi +echo "$as_me:$LINENO: result: $ac_cv_header_infiniband_driver_h" >&5 +echo "${ECHO_T}$ac_cv_header_infiniband_driver_h" >&6 + +fi +if test $ac_cv_header_infiniband_driver_h = yes; then + : +else + { { echo "$as_me:$LINENO: error: not found. libehca requires libibverbs." >&5 +echo "$as_me: error: not found. libehca requires libibverbs." >&2;} + { (exit 1); exit 1; }; } +fi + + + + +for ac_func in ibv_read_sysfs_file +do +as_ac_var=`echo "ac_cv_func_$ac_func" | $as_tr_sh` +echo "$as_me:$LINENO: checking for $ac_func" >&5 +echo $ECHO_N "checking for $ac_func... $ECHO_C" >&6 +if eval "test \"\${$as_ac_var+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF @@ -20391,18 +19604,54 @@ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ +/* Define $ac_func to an innocuous variant, in case declares $ac_func. + For example, HP-UX 11i declares gettimeofday. */ +#define $ac_func innocuous_$ac_func + +/* System header to define __stub macros and hopefully few prototypes, + which can conflict with char $ac_func (); below. + Prefer to if __STDC__ is defined, since + exists even on freestanding compilers. */ + +#ifdef __STDC__ +# include +#else +# include +#endif + +#undef $ac_func + +/* Override any gcc2 internal prototype to avoid an error. */ +#ifdef __cplusplus +extern "C" +{ +#endif +/* We use char because int might match the return type of a gcc2 + builtin and then its argument prototype would still apply. */ +char $ac_func (); +/* The GNU C library defines this for functions which it implements + to always fail with ENOSYS. Some functions are actually named + something starting with __ and the normal name is an alias. 
*/ +#if defined (__stub_$ac_func) || defined (__stub___$ac_func) +choke me +#else +char (*f) () = $ac_func; +#endif +#ifdef __cplusplus +} +#endif int main () { - +return f != $ac_func; ; return 0; } _ACEOF -rm -f conftest.$ac_objext -if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>conftest.er1 +rm -f conftest.$ac_objext conftest$ac_exeext +if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 + (eval $ac_link) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 @@ -20416,108 +19665,59 @@ if { (eval echo "$as_me:$LINENO: \"$ac_c ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; } && - { ac_try='test -s conftest.$ac_objext' + { ac_try='test -s conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then - ac_cv_prog_cc_g=yes + eval "$as_ac_var=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -ac_cv_prog_cc_g=no +eval "$as_ac_var=no" fi -rm -f conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext fi -echo "$as_me:$LINENO: result: $ac_cv_prog_cc_g" >&5 -echo "${ECHO_T}$ac_cv_prog_cc_g" >&6 -if test "$ac_test_CFLAGS" = set; then - CFLAGS=$ac_save_CFLAGS -elif test $ac_cv_prog_cc_g = yes; then - if test "$GCC" = yes; then - CFLAGS="-g -O2" - else - CFLAGS="-g" - fi -else - if test "$GCC" = yes; then - CFLAGS="-O2" - else - CFLAGS= - fi +echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_var'}'`" >&5 +echo "${ECHO_T}`eval echo '${'$as_ac_var'}'`" >&6 +if test `eval echo '${'$as_ac_var'}'` = yes; then + cat >>confdefs.h <<_ACEOF +#define `echo "HAVE_$ac_func" | $as_tr_cpp` 1 +_ACEOF + fi -echo "$as_me:$LINENO: checking for $CC option to accept ANSI C" >&5 -echo $ECHO_N "checking for $CC option to accept ANSI C... 
$ECHO_C" >&6 -if test "${ac_cv_prog_cc_stdc+set}" = set; then +done + +fi + +for ac_header in sysfs/libsysfs.h +do +as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` +if eval "test \"\${$as_ac_Header+set}\" = set"; then + echo "$as_me:$LINENO: checking for $ac_header" >&5 +echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 +if eval "test \"\${$as_ac_Header+set}\" = set"; then echo $ECHO_N "(cached) $ECHO_C" >&6 +fi +echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 +echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 else - ac_cv_prog_cc_stdc=no -ac_save_CC=$CC + # Is the header compilable? +echo "$as_me:$LINENO: checking $ac_header usability" >&5 +echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ -#include -#include -#include -#include -/* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */ -struct buf { int x; }; -FILE * (*rcsopen) (struct buf *, struct stat *, int); -static char *e (p, i) - char **p; - int i; -{ - return p[i]; -} -static char *f (char * (*g) (char **, int), char **p, ...) -{ - char *s; - va_list v; - va_start (v,p); - s = g (p, va_arg (v,int)); - va_end (v); - return s; -} - -/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has - function prototypes and stuff, but not '\xHH' hex character constants. - These don't provoke an error unfortunately, instead are silently treated - as 'x'. The following induces an error, until -std1 is added to get - proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an - array size at least. It's necessary to write '\x00'==0 to get something - that's true only with -std1. */ -int osf4_cc_array ['\x00' == 0 ? 
1 : -1]; - -int test (int i, double x); -struct s1 {int (*f) (int a);}; -struct s2 {int (*f) (double a);}; -int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); -int argc; -char **argv; -int -main () -{ -return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; - ; - return 0; -} +$ac_includes_default +#include <$ac_header> _ACEOF -# Don't try gcc -ansi; that turns off useful extensions and -# breaks some systems' header files. -# AIX -qlanglvl=ansi -# Ultrix and OSF/1 -std1 -# HP-UX 10.20 and later -Ae -# HP-UX older versions -Aa -D_HPUX_SOURCE -# SVR4 -Xc -D__EXTENSIONS__ -for ac_arg in "" -qlanglvl=ansi -std1 -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" -do - CC="$ac_save_CC $ac_arg" - rm -f conftest.$ac_objext +rm -f conftest.$ac_objext if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 (eval $ac_compile) 2>conftest.er1 ac_status=$? @@ -20539,81 +19739,130 @@ if { (eval echo "$as_me:$LINENO: \"$ac_c ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then - ac_cv_prog_cc_stdc=$ac_arg -break + ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 +ac_header_compiler=no fi -rm -f conftest.err conftest.$ac_objext -done -rm -f conftest.$ac_ext conftest.$ac_objext -CC=$ac_save_CC - -fi - -case "x$ac_cv_prog_cc_stdc" in - x|xno) - echo "$as_me:$LINENO: result: none needed" >&5 -echo "${ECHO_T}none needed" >&6 ;; - *) - echo "$as_me:$LINENO: result: $ac_cv_prog_cc_stdc" >&5 -echo "${ECHO_T}$ac_cv_prog_cc_stdc" >&6 - CC="$CC $ac_cv_prog_cc_stdc" ;; -esac +rm -f conftest.err conftest.$ac_objext conftest.$ac_ext +echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 +echo "${ECHO_T}$ac_header_compiler" >&6 -# Some people use a C++ compiler to compile C. Since we use `exit', -# in C++ we need to declare it. 
In case someone uses the same compiler -# for both compiling C and C++ we need to have the C++ compiler decide -# the declaration of exit, since it's the most demanding environment. +# Is the header present? +echo "$as_me:$LINENO: checking $ac_header presence" >&5 +echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6 cat >conftest.$ac_ext <<_ACEOF -#ifndef __cplusplus - choke me -#endif +/* confdefs.h. */ _ACEOF -rm -f conftest.$ac_objext -if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>conftest.er1 +cat confdefs.h >>conftest.$ac_ext +cat >>conftest.$ac_ext <<_ACEOF +/* end confdefs.h. */ +#include <$ac_header> +_ACEOF +if { (eval echo "$as_me:$LINENO: \"$ac_cpp conftest.$ac_ext\"") >&5 + (eval $ac_cpp conftest.$ac_ext) 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest.$ac_objext' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - (exit $ac_status); }; }; then - for ac_declaration in \ - '' \ - 'extern "C" void std::exit (int) throw (); using std::exit;' \ - 'extern "C" void std::exit (int); using std::exit;' \ - 'extern "C" void exit (int) throw ();' \ - 'extern "C" void exit (int);' \ - 'void exit (int);' -do + (exit $ac_status); } >/dev/null; then + if test -s conftest.err; then + ac_cpp_err=$ac_c_preproc_warn_flag + ac_cpp_err=$ac_cpp_err$ac_c_werror_flag + else + ac_cpp_err= + fi +else + ac_cpp_err=yes +fi +if test -z "$ac_cpp_err"; then + ac_header_preproc=yes +else + echo "$as_me: failed program was:" >&5 +sed 's/^/| /' conftest.$ac_ext >&5 + + ac_header_preproc=no +fi +rm -f conftest.err conftest.$ac_ext +echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 +echo "${ECHO_T}$ac_header_preproc" >&6 + +# So? What about this header? +case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in + yes:no: ) + { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&5 +echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&2;} + { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 +echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} + ac_header_preproc=yes + ;; + no:yes:* ) + { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 +echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} + { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 +echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" 
>&2;} + { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 +echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} + { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 +echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} + { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 +echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} + { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 +echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} + ( + cat <<\_ASBOX +## ------------------------------------- ## +## Report this to ehcadd at sourceforge.net ## +## ------------------------------------- ## +_ASBOX + ) | + sed "s/^/$as_me: WARNING: /" >&2 + ;; +esac +echo "$as_me:$LINENO: checking for $ac_header" >&5 +echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6 +if eval "test \"\${$as_ac_Header+set}\" = set"; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else + eval "$as_ac_Header=\$ac_header_preproc" +fi +echo "$as_me:$LINENO: result: `eval echo '${'$as_ac_Header'}'`" >&5 +echo "${ECHO_T}`eval echo '${'$as_ac_Header'}'`" >&6 + +fi +if test `eval echo '${'$as_ac_Header'}'` = yes; then + cat >>confdefs.h <<_ACEOF +#define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 +_ACEOF + +fi + +done + + +echo "$as_me:$LINENO: checking for ANSI C header files" >&5 +echo $ECHO_N "checking for ANSI C header files... $ECHO_C" >&6 +if test "${ac_cv_header_stdc+set}" = set; then + echo $ECHO_N "(cached) $ECHO_C" >&6 +else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. 
*/ -$ac_declaration #include +#include +#include +#include + int main () { -exit (42); + ; return 0; } @@ -20640,188 +19889,127 @@ if { (eval echo "$as_me:$LINENO: \"$ac_c ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then - : + ac_cv_header_stdc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -continue +ac_cv_header_stdc=no fi rm -f conftest.err conftest.$ac_objext conftest.$ac_ext + +if test $ac_cv_header_stdc = yes; then + # SunOS 4.x string.h does not declare mem*, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ -$ac_declaration +#include + +_ACEOF +if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | + $EGREP "memchr" >/dev/null 2>&1; then + : +else + ac_cv_header_stdc=no +fi +rm -f conftest* + +fi + +if test $ac_cv_header_stdc = yes; then + # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. + cat >conftest.$ac_ext <<_ACEOF +/* confdefs.h. */ +_ACEOF +cat confdefs.h >>conftest.$ac_ext +cat >>conftest.$ac_ext <<_ACEOF +/* end confdefs.h. */ +#include + +_ACEOF +if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | + $EGREP "free" >/dev/null 2>&1; then + : +else + ac_cv_header_stdc=no +fi +rm -f conftest* + +fi + +if test $ac_cv_header_stdc = yes; then + # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. + if test "$cross_compiling" = yes; then + : +else + cat >conftest.$ac_ext <<_ACEOF +/* confdefs.h. */ +_ACEOF +cat confdefs.h >>conftest.$ac_ext +cat >>conftest.$ac_ext <<_ACEOF +/* end confdefs.h. */ +#include +#if ((' ' & 0x0FF) == 0x020) +# define ISLOWER(c) ('a' <= (c) && (c) <= 'z') +# define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) +#else +# define ISLOWER(c) \ + (('a' <= (c) && (c) <= 'i') \ + || ('j' <= (c) && (c) <= 'r') \ + || ('s' <= (c) && (c) <= 'z')) +# define TOUPPER(c) (ISLOWER(c) ? 
((c) | 0x40) : (c)) +#endif + +#define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) int main () { -exit (42); - ; - return 0; + int i; + for (i = 0; i < 256; i++) + if (XOR (islower (i), ISLOWER (i)) + || toupper (i) != TOUPPER (i)) + exit(2); + exit (0); } _ACEOF -rm -f conftest.$ac_objext -if { (eval echo "$as_me:$LINENO: \"$ac_compile\"") >&5 - (eval $ac_compile) 2>conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 >conftest.err - rm -f conftest.er1 - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } && - { ac_try='test -z "$ac_c_werror_flag" - || test ! -s conftest.err' - { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 - (eval $ac_try) 2>&5 +rm -f conftest$ac_exeext +if { (eval echo "$as_me:$LINENO: \"$ac_link\"") >&5 + (eval $ac_link) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); }; } && - { ac_try='test -s conftest.$ac_objext' + (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (eval echo "$as_me:$LINENO: \"$ac_try\"") >&5 (eval $ac_try) 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; }; then - break + : else - echo "$as_me: failed program was:" >&5 + echo "$as_me: program exited with status $ac_status" >&5 +echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 +( exit $ac_status ) +ac_cv_header_stdc=no fi -rm -f conftest.err conftest.$ac_objext conftest.$ac_ext -done -rm -f conftest* -if test -n "$ac_declaration"; then - echo '#ifdef __cplusplus' >>confdefs.h - echo $ac_declaration >>confdefs.h - echo '#endif' >>confdefs.h -fi - -else - echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - +rm -f core *.core gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi -rm -f conftest.err conftest.$ac_objext conftest.$ac_ext -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - -depcc="$CC" am_compiler_list= - -echo "$as_me:$LINENO: checking dependency style of $depcc" >&5 -echo $ECHO_N "checking dependency style of $depcc... $ECHO_C" >&6 -if test "${am_cv_CC_dependencies_compiler_type+set}" = set; then - echo $ECHO_N "(cached) $ECHO_C" >&6 -else - if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then - # We make a subdir and do the tests there. Otherwise we can end up - # making bogus files that we don't know about and never remove. For - # instance it was reported that on HP-UX the gcc test will end up - # making a dummy file named `D' -- because `-MD' means `put the output - # in D'. - mkdir conftest.dir - # Copy depcomp to subdir because otherwise we won't find it if we're - # using a relative directory. - cp "$am_depcomp" conftest.dir - cd conftest.dir - # We will build objects and dependencies in a subdirectory because - # it helps to detect inapplicable dependency modes. 
For instance - # both Tru64's cc and ICC support -MD to output dependencies as a - # side effect of compilation, but ICC will put the dependencies in - # the current directory while Tru64 will put them in the object - # directory. - mkdir sub - - am_cv_CC_dependencies_compiler_type=none - if test "$am_compiler_list" = ""; then - am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` - fi - for depmode in $am_compiler_list; do - # Setup a source with many dependencies, because some compilers - # like to wrap large dependency lists on column 80 (with \), and - # we should not choose a depcomp mode which is confused by this. - # - # We need to recreate these files for each test, as the compiler may - # overwrite some of them when testing with obscure command lines. - # This happens at least with the AIX C compiler. - : > sub/conftest.c - for i in 1 2 3 4 5 6; do - echo '#include "conftst'$i'.h"' >> sub/conftest.c - # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with - # Solaris 8's {/usr,}/bin/sh. - touch sub/conftst$i.h - done - echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf - - case $depmode in - nosideeffect) - # after this tag, mechanisms are not by side-effect, so they'll - # only be used when explicitly requested - if test "x$enable_dependency_tracking" = xyes; then - continue - else - break - fi - ;; - none) break ;; - esac - # We check with `-c' and `-o' for the sake of the "dashmstdout" - # mode. It turns out that the SunPro C++ compiler does not properly - # handle `-M -o', and we need to detect this. 
- if depmode=$depmode \ - source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ - depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ - $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ - >/dev/null 2>conftest.err && - grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && - grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && - ${MAKE-make} -s -f confmf > /dev/null 2>&1; then - # icc doesn't choke on unknown options, it will just issue warnings - # or remarks (even with -Werror). So we grep stderr for any message - # that says an option was ignored or not supported. - # When given -MP, icc 7.0 and 7.1 complain thusly: - # icc: Command line warning: ignoring option '-M'; no argument required - # The diagnosis changed in icc 8.0: - # icc: Command line remark: option '-MP' not supported - if (grep 'ignoring option' conftest.err || - grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else - am_cv_CC_dependencies_compiler_type=$depmode - break - fi - fi - done - - cd .. 
- rm -rf conftest.dir -else - am_cv_CC_dependencies_compiler_type=none fi - fi -echo "$as_me:$LINENO: result: $am_cv_CC_dependencies_compiler_type" >&5 -echo "${ECHO_T}$am_cv_CC_dependencies_compiler_type" >&6 -CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type - +echo "$as_me:$LINENO: result: $ac_cv_header_stdc" >&5 +echo "${ECHO_T}$ac_cv_header_stdc" >&6 +if test $ac_cv_header_stdc = yes; then +cat >>confdefs.h <<\_ACEOF +#define STDC_HEADERS 1 +_ACEOF -if - test "x$enable_dependency_tracking" != xno \ - && test "$am_cv_CC_dependencies_compiler_type" = gcc3; then - am__fastdepCC_TRUE= - am__fastdepCC_FALSE='#' -else - am__fastdepCC_TRUE='#' - am__fastdepCC_FALSE= fi - ac_config_files="$ac_config_files Makefile" cat >confcache <<\_ACEOF

From xma at us.ibm.com Wed Oct 18 10:10:01 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 18 Oct 2006 10:10:01 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()
In-Reply-To:
Message-ID:

Roland Dreier wrote on 10/17/2006 08:41:59 PM:

> Anyway, I'm eagerly awaiting your NAPI results with ehca.
>
> Thanks,
> Roland

Thanks. The touch test results are not good. This NAPI patch induces huge latency for ehca driver scaling code, the throughput performance is not good. (I am not fully convinced the huge latency is because of raising NAPI in thread context.) Then I tried ehca no scaling driver, the latency looks good, but the throughput is still a problem. We are working on these issues. Hopefully we can get the answer soon.

Thanks
Shirley Ma

From mst at mellanox.co.il Wed Oct 18 11:39:28 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Wed, 18 Oct 2006 20:39:28 +0200
Subject: [openib-general] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
In-Reply-To: <200610181901.55412.hnguyen@de.ibm.com>
References: <200610181901.55412.hnguyen@de.ibm.com>
Message-ID: <20061018183928.GC30350@mellanox.co.il>

Quoting r. Hoang-Nam Nguyen :
> Subject: [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
>
> Hello,
> here is the patch of configure in libehca as a result of the patch
> "libehca configure.in and config.h.in". It is generated by autogen.sh
> and pretty lengthy. Hence, I'm attaching it here for completeness.
> Vlad, do you want me to check it in svn or send you the whole file?
> Thanks!
> Nam

Do we really want generated files in svn? Why?

-- MST

From sashak at voltaire.com Wed Oct 18 11:50:49 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 18 Oct 2006 20:50:49 +0200
Subject: [openib-general] Tools for development
In-Reply-To: <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com>
References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com> <1161096883.30946.2.camel@stevo-desktop> <20061017150442.GA22531@mellanox.co.il> <20061017162146.GB26226@sashak.voltaire.com> <350DF1C2-673D-44EE-A529-E4A471F50BFA@cisco.com>
Message-ID: <20061018185049.GF13749@sashak.voltaire.com>

On 08:12 Wed 18 Oct , Jeff Squyres wrote:
> I was not on the call last week, but I understand that there was some
> discussion about exactly this point (ditch SVN and go 100% git): the
> decision was to stick with SVN for userspace stuff and stick with git
> for kernel stuff.
>
> However, this is a larger audience than was on the call. Is there a
> significant movement here from the developers to move to 100% git?

Moving (or not moving) userspace to git could be done on a per-project basis (as actually suggested by Michael). Personally I'm voting for git.
Sasha

>
> (I don't really care)
>
>
> On Oct 17, 2006, at 12:21 PM, Sasha Khapyorsky wrote:
>
> >On 17:04 Tue 17 Oct , Michael S. Tsirkin wrote:
> >>Quoting r. Steve Wise :
> >>>At the risk of opening a can of worms, is there any reason we don't move
> >>>the user stuff into its own git tree? This would get rid of svn
> >>>altogether...
> >>
> >>If we do, that should probably be multiple git trees - verbs, management,
> >>tests are all more or less independent and developed mostly by
> >>different people.
> >
> >Reasonable. And generally this should not be too bad.
> >
> >Sasha
> >
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>

From weiny2 at llnl.gov Wed Oct 18 13:13:17 2006
From: weiny2 at llnl.gov (Ira Weiny)
Date: Wed, 18 Oct 2006 13:13:17 -0700
Subject: [openib-general] Catastrophic error detected.
Message-ID: <20061018131317.1d187ad1.weiny2@llnl.gov>

I got the following error running with OFED 1.1 on a modified 2.6.9 RHEL4 kernel. Hal mentioned that there might be a catastrophic error recovery patch submitted since then? I can't find a mention of that in the mailing list. If possible I would like to try such a patch.

Thanks,
Ira

2006-10-17 21:31:47 ib_mthca 0000:07:00.0: Catastrophic error detected: unknown error
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[00]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[01]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[02]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[03]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[04]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[05]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[06]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[07]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[08]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[09]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0a]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0b]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0c]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0d]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0e]: ffffffff
2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0f]: ffffffff

# rhea277 /root > /sbin/lspci -vv -s 07:00.0
07:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (rev 20)
Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
References: <20061018131317.1d187ad1.weiny2@llnl.gov>
Message-ID: <20061018202400.GA30696@mellanox.co.il>

Quoting r. Ira Weiny :
> Subject: Catastrophic error detected.
>
> I got the following error running with OFED 1.1 on a modified 2.6.9 RHEL4
> kernel. Hal mentioned that there might be a catastrophic error recovery patch
> submitted since then? I can't find a mention of that in the mailing list. If
> possible I would like to try such a patch.
>
> Thanks,
> Ira
>
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: Catastrophic error detected: unknown error
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[00]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[01]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[02]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[03]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[04]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[05]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[06]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[07]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[08]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[09]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0a]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0b]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0c]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0d]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0e]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0f]: ffffffff

OFED 1.1 will already try to recover. But the fact that you got ffffffff indicates it's a hard error that we couldn't recover from.

-- MST

From HNGUYEN at de.ibm.com Wed Oct 18 13:30:22 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Wed, 18 Oct 2006 22:30:22 +0200
Subject: [openib-general] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
In-Reply-To: <20061018183928.GC30350@mellanox.co.il>
Message-ID:

Hi,
> Do we really want generated files in svn? Why?
No. I was unsure if it's in ofed branch. And you're right, no need to. Ignore this!
Thanks
Nam

From rdreier at cisco.com Wed Oct 18 13:43:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 18 Oct 2006 13:43:03 -0700
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To: <20061018051458.GB25360@obsidianresearch.com> (Jason Gunthorpe's message of "Tue, 17 Oct 2006 23:14:58 -0600")
References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> <20061018042003.GA25360@obsidianresearch.com> <20061018044353.GA24817@mellanox.co.il> <20061018051458.GB25360@obsidianresearch.com>
Message-ID:

> I just took a quick look at the directory setup and if you are
> changing things I'd say you should also arrange to have the libibverbs
> soname stamped into the plugin path and soname. Something like
> libmthca-libibverbs.2.so.0. Once you do that it is pretty safe
> to put it in /usr/lib*

That makes sense (although I guess it would be libmthca-libibverbs.2.so without the .0, since libmthca is just a plugin that doesn't have an independent soname of its own).

Then we could have each plugin drop a file in /etc/libibverbs.conf.d/ with the name -- something like

    driver mthca

(and possibly also read $HOME/.libibverbs.conf if desired)

The only two things I need to figure out, I hope with help from smarter people:

 - What is the autoconf/automake chicanery needed to make libmthca figure out the right libibverbs soname to stick in the name of the .so it installs?

 - And what is the autoconf/automake chicanery needed to fall back to having libmthca install plain mthca.so under /usr/lib/infiniband when it detects that it is being built against libibverbs 1.0?

 - R.

From rdreier at cisco.com Wed Oct 18 13:55:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 18 Oct 2006 13:55:13 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()
In-Reply-To: (Shirley Ma's message of "Wed, 18 Oct 2006 10:10:01 -0700")
References:
Message-ID:

> Thanks.
The touch test results are not good. This NAPI patch induces huge
> latency for ehca driver scaling code, the throughput performance is not
> good. (I am not fully convinced the huge latency is because of raising NAPI
> in thread context.) Then I tried ehca no scaling driver, the latency looks
> good, but the throughput is still a problem. We are working on these
> issues. Hopefully we can get the answer soon.

Hmm, the results with "scaling" on are not that unexpected, since the idea of scheduling a thread round-robin (to kill all cache locality) is pretty dubious anyway.

I would like to understand why there's a throughput difference with scaling turned off, since the NAPI code doesn't change the interrupt handling all that much, and should lower the CPU usage if anything.

Does changing the netdev weight value affect anything?

 - R.

From jgunthorpe at obsidianresearch.com Wed Oct 18 14:00:08 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 18 Oct 2006 15:00:08 -0600
Subject: [openib-general] [PATCH] use mmiowb after doorbell ring
In-Reply-To:
References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> <20061018042003.GA25360@obsidianresearch.com> <20061018044353.GA24817@mellanox.co.il> <20061018051458.GB25360@obsidianresearch.com>
Message-ID: <20061018210008.GV4054@obsidianresearch.com>

On Wed, Oct 18, 2006 at 01:43:03PM -0700, Roland Dreier wrote:

> The only two things I need to figure out, I hope with help from
> smarter people:

I'm by no means an expert, but this might be helpful to someone who is:

AC_DEFUN(rc_LIBSTDCPP_VER,
[AC_MSG_CHECKING([libstdc++ version])
dummy=if$$
cat <<_LIBSTDCPP_>$dummy.cc
#include
#include
#include
int main(int argc, char **argv) { exit(0); }
_LIBSTDCPP_
${CXX-c++} $dummy.cc -o $dummy > /dev/null 2>&1
if test "$?" = 0; then
  soname=`objdump -p ./$dummy |grep NEEDED|grep libstd`
  LIBSTDCPP_VER=`echo $soname | sed -e 's/.*NEEDED.*libstdc++\(-libc.*\(-.*\)\)\?.so.\(.*\)/\3\2/'`
fi
rm -f $dummy $dummy.cc
if test -z "$LIBSTDCPP_VER"; then
  AC_MSG_WARN([cannot determine standard C++ library version number])
else
  AC_MSG_RESULT([$LIBSTDCPP_VER])
  LIBSTDCPP_VER="-$LIBSTDCPP_VER"
fi
AC_SUBST(LIBSTDCPP_VER)
])

This is a fragment from another project I have that stamps a soname with the libstdc++ soname (libstdc++ causes a similar issue). The basic idea is to compile a dummy program and link it with the target library, then use objdump to extract the soname and assign a substitution variable.

That bit goes in aclocal.m4

Once you have the substitution I think a conditional fragment in the makefile should be enough to solve the second problem.

Jason

From xma at us.ibm.com Wed Oct 18 14:12:27 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 18 Oct 2006 14:12:27 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()
In-Reply-To:
Message-ID:

Roland Dreier wrote on 10/18/2006 01:55:13 PM:

> I would like to understand why there's a throughput difference with
> scaling turned off, since the NAPI code doesn't change the interrupt
> handling all that much, and should lower the CPU usage if anything.

That's what I am trying to understand now. Yes, the send-side rate dropped significantly, and CPU usage is lower as well.

> Does changing the netdev weight value affect anything?
>
> - R.

No, it doesn't.

Thanks
Shirley Ma
IBM Linux Technology Center
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sweitzen at cisco.com Wed Oct 18 14:23:34 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 18 Oct 2006 14:23:34 -0700
Subject: [openib-general] Cisco SQA Results for OFED 1.1 rc7
Message-ID:

Regression testing went well, using Cisco switches and Cisco (Mellanox) HCAs. See attached spreadsheet for more details.
The following increase in testing happened: * Started testing SLES10 IA32 (will have IA64 and PPC64 results for pre1). * Switched to HP MPI 2.2.5, which is first version to support OF. The following bugs were tested and closed. * 247 OFED IPoIB HA not working on RHEL4 U3 * 259 problems with OFED IPoIB HA on SLES10 * 173 OFED mpitests: add osu_{bw,latency,bibw,bcast}.c examples The following bugs were opened, but all have been marked fixed in pre1, thanks Mellanox folks for the quick response. * 273 OFED 1.1 rc7 does not work with Cisco FC Gateway * 274 OFED 1.1: RENICE_IB_MAD=yes hangs dual-HCA system with dual-port HCAs * 277 OFED 1.1 rc7: uninitialized value during IPoIB failover in ipoib_ha.pl * 278 OFED 1.1: two copies of openib.spec in openib-1.1.tgz Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_sqa_results.xls Type: application/vnd.ms-excel Size: 205312 bytes Desc: ofed_sqa_results.xls URL: From troy at scl.ameslab.gov Wed Oct 18 15:16:28 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Wed, 18 Oct 2006 17:16:28 -0500 Subject: [openib-general] ibv_reg_mr temporary vs permanent errors Message-ID: <0E8782D7-E8BE-4DBC-AFB9-41AC405F3BAA@scl.ameslab.gov> If ibv_reg_mr fails, can an application (or library, such as pvfs) assume that this is just a temporary error, and try to deregister some memory, then try again? How can we differentiate between the case where the hardware (such as ehca) actually has more information about why the memory registration failed, and the application can act on that information (by coalescing memory regions, for example), vs cases where something is just plain broken and the application should give up and exit. 
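The retry strategy raised in the question above (deregister something, then try the registration again) could be sketched roughly as follows. This is only an illustration under stated assumptions, not an answer to whether a given failure is temporary: register_with_retry, register_fn, evict_fn and the fake_hca stand-in are all hypothetical names introduced here, and none of them is part of libibverbs. The try_register callback stands in for a call such as ibv_reg_mr(), and evict_one stands in for the application dropping one cached registration.

```c
#include <stddef.h>

/* Hypothetical callback types, not part of libibverbs.
 * try_register stands in for a call such as ibv_reg_mr() and returns
 * NULL on failure; evict_one stands in for the application
 * deregistering one cached region and returns 1 if it freed something. */
typedef void *(*register_fn)(void *ctx, void *buf, size_t len);
typedef int (*evict_fn)(void *ctx);

/* Try to register; on failure evict one region and try again, giving up
 * after max_retries evictions or as soon as nothing is left to evict. */
static void *register_with_retry(void *ctx, void *buf, size_t len,
                                 register_fn try_register,
                                 evict_fn evict_one, int max_retries)
{
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        void *mr = try_register(ctx, buf, len);

        if (mr)
            return mr;          /* registration succeeded */
        if (!evict_one(ctx))
            break;              /* nothing left to free: treat as permanent */
    }
    return NULL;
}

/* Stand-in "hardware" with a fixed number of registration slots, used
 * only to exercise the loop above without a real HCA. */
struct fake_hca {
    int capacity;
    int in_use;
};

static void *fake_register(void *ctx, void *buf, size_t len)
{
    struct fake_hca *hca = ctx;

    (void)len;
    if (hca->in_use >= hca->capacity)
        return NULL;
    hca->in_use++;
    return buf;
}

static int fake_evict(void *ctx)
{
    struct fake_hca *hca = ctx;

    if (hca->in_use == 0)
        return 0;
    hca->in_use--;
    return 1;
}
```

Note that the loop treats every failure the same way. Telling a resource shortage apart from a broken device would need extra information from the driver, which is exactly what the question asks about.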
From Sujal at Mellanox.com Wed Oct 18 16:09:27 2006 From: Sujal at Mellanox.com (Sujal Das) Date: Wed, 18 Oct 2006 16:09:27 -0700 Subject: [openib-general] [openfabrics-ewg] Cisco SQA Results for OFED 1.1 rc7 Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F40D25C@mtiexch01.mti.com> Scott, thanks for the report. Based on this, it looks like Cisco did not test the SRP initiator and HA functions with any SRP targets. Is that a fair assessment? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rdreier at cisco.com Wed Oct 18 16:25:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 18 Oct 2006 16:25:21 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061018210008.GV4054@obsidianresearch.com> (Jason Gunthorpe's message of "Wed, 18 Oct 2006 15:00:08 -0600") References: <20061017214218.GU23922@mellanox.co.il> <20061018001005.GQ4054@obsidianresearch.com> <20061018042003.GA25360@obsidianresearch.com> <20061018044353.GA24817@mellanox.co.il> <20061018051458.GB25360@obsidianresearch.com> <20061018210008.GV4054@obsidianresearch.com> Message-ID: > AC_DEFUN(rc_LIBSTDCPP_VER, Thanks -- this actually solves the easiest part of my problem, and does it in a way that's not really useful for me (libibverbs needs to know what extra bits are getting added to plugin names, and with this technique, it would have to know what the final library name was going to be, before it got built). So I think I need to stick the extra plugin library name into a define in . But seeing this code led me to information that solves everything else I was worried about. The libtool flag "-release" is what I need to add gunk to the final .so's name, and I think backward compatibility can be handled pretty easily too. So thanks... > That bit goes in aclocal.m4 Yeah, I'd hide code like that too rather than let anyone see it in my configure.in ;) - R. From sweitzen at cisco.com Wed Oct 18 17:07:00 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 18 Oct 2006 17:07:00 -0700 Subject: [openib-general] [openfabrics-ewg] Cisco SQA Results for OFED 1.1 rc7 Message-ID: SRP got broken in rc7 for the Cisco Fibre Channel gateway, so we couldn't test it with that. We have started testing with DDN IB storage, but don't have test results to share yet. I'm sad to report no SRP HA testing in Cisco SQA yet. It's next on the todo list (right after IPoIB HA).
Scott -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sweitzen at cisco.com Wed Oct 18 17:11:51 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 18 Oct 2006 17:11:51 -0700 Subject: [openib-general] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7? Message-ID: What testing did these companies do with rc7? I'd kinda like to see performance data for the QLogic and IBM HCAs... Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgunthorpe at obsidianresearch.com Wed Oct 18 17:39:34 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 18 Oct 2006 18:39:34 -0600 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: <20061018001005.GQ4054@obsidianresearch.com> <20061018042003.GA25360@obsidianresearch.com> <20061018044353.GA24817@mellanox.co.il> <20061018051458.GB25360@obsidianresearch.com> <20061018210008.GV4054@obsidianresearch.com> Message-ID: <20061019003934.GW4054@obsidianresearch.com> On Wed, Oct 18, 2006 at 04:25:21PM -0700, Roland Dreier wrote: > > AC_DEFUN(rc_LIBSTDCPP_VER, > Thanks -- this actually solves the easiest part of my problem, and > does it in a way that's not really useful for me (libibverbs needs to > know what extra bits are getting added to plugin names, and with this > technique, it would have to know what the final library name was going > to be, before it got built). So I think I need to stick the extra > plugin library name into a define in . Right, that's exactly what should be done in ibverbs. The general technique from that example is what you'd put in userspace/libmthca/configure.in if I'm grokking this build process properly. Jason From mst at mellanox.co.il Wed Oct 18 17:53:26 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 19 Oct 2006 02:53:26 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: <20061019005326.GD30696@mellanox.co.il> Quoting r. Roland Dreier : > could have each plugin drop a file in /etc/libibverbs.conf.d/ with the > name -- something like OK, feature request time :) Hopefully that is under prefix: $prefix/etc/libibverbs.conf.d/ and I think an environment with a list of additional directories would also be helpful. Finally, it might be nice to be able to just specify the list of plugins at configure time for people like me who buuild everything from source and who want less flexibility but also less files to install. -- MST From rdreier at cisco.com Wed Oct 18 18:03:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 18 Oct 2006 18:03:19 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061019005326.GD30696@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 19 Oct 2006 02:53:26 +0200") References: <20061019005326.GD30696@mellanox.co.il> Message-ID: > Hopefully that is under prefix: > $prefix/etc/libibverbs.conf.d/ Well, $sysconfdir/libibverbs.conf.d > and I think an environment with a list of additional directories > would also be helpful. Is that really necessary? Just stick whatever you want into $HOME/.libibverbs.conf. > Finally, it might be nice to be able to just specify the list of > plugins at configure time for people like me who buuild everything > from source and who want less flexibility > but also less files to install. Again, is that really any easier than putting whatever you want into your .libibverbs.conf? I definitely plan to make it so a missing plug-in is not fatal, so it shouldn't hurt to have extra drivers declared that you don't build every time. - R. From mst at mellanox.co.il Wed Oct 18 18:33:14 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 19 Oct 2006 03:33:14 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: <20061019013313.GE30696@mellanox.co.il> Quoting r. Roland Dreier : > > Hopefully that is under prefix: > > $prefix/etc/libibverbs.conf.d/ > > Well, $sysconfdir/libibverbs.conf.d Ugh, is that a problem if I want to build and run as non-root? I'm used to be able to set --prefix on config line for all libs to some directory, put LD_LIBRARY_PATH to point there, then if I like I just blow all of it away and I get a clean system. Scattering config files around in home directory etc will break this. > > Finally, it might be nice to be able to just specify the list of > > plugins at configure time for people like me who buuild everything > > from source and who want less flexibility > > but also less files to install. > > Again, is that really any easier Well, I'm thinking of distributed systems mainly where copying extra files around is additional pain. Consider myself: I'm building things on my laptop, then pushing them out to machines in the lab over rsync for testing. Less files - less headache. > than putting whatever you want into > your .libibverbs.conf? I really don't think a library sticking things in user's home directory is such a great idea - typical users don't really know they link against some library, this is just an extra place that users can break: move to another machine, things stop working, and your app's manual does not say anything of course. > I definitely plan to make it so a missing plug-in is not fatal, so it > shouldn't hurt to have extra drivers declared that you don't build > every time. Not until someone decides to rename a plugin for some reason - then you have to hunt down and kill the old file name to prevent an old version stuck in library path for some reason from being loaded - easy with the central location, but good luck walking all user's home directories. 
-- MST From mst at mellanox.co.il Wed Oct 18 18:48:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 03:48:58 +0200 Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.4.3 Message-ID: <20061019014858.GF30696@mellanox.co.il> OK, as promised, I'm forwarding the announcement of the stable git release. Let's update the git server with that (I think it has 1.4 at the moment?). git-daemon and gitweb improvements are the most useful, I think. -- MST -------------- next part -------------- An embedded message was scrubbed... From: "Junio C Hamano" Subject: [ANNOUNCE] GIT 1.4.3 Date: Thu, 19 Oct 2006 01:53:22 +0200 Size: 57122 URL: From halr at voltaire.com Wed Oct 18 19:19:57 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Oct 2006 22:19:57 -0400 Subject: [openib-general] [PATCH] [TRIVIAL] OpenSM/osm_port_info_rcv.c: Remove duplicate dump of received PortInfo Message-ID: <1161224394.25985.38676.camel@hal.voltaire.com> OpenSM/osm_port_info_rcv.c: Remove duplicate dump of received PortInfo in osm_pi_rcv_process Signed-off-by: Hal Rosenstock Index: opensm/osm_port_info_rcv.c =================================================================== --- opensm/osm_port_info_rcv.c (revision 9884) +++ opensm/osm_port_info_rcv.c (working copy) @@ -710,8 +710,9 @@ osm_pi_rcv_process( port_guid = p_context->port_guid; node_guid = p_context->node_guid; - osm_dump_port_info( - p_rcv->p_log, node_guid, port_guid, port_num, p_pi, OSM_LOG_DEBUG); + osm_dump_port_info( p_rcv->p_log, + node_guid, port_guid, port_num, p_pi, + OSM_LOG_DEBUG ); /* we might get a response during a light sweep looking for a change in @@ -829,10 +830,6 @@ osm_pi_rcv_process( p_smp->hop_count, p_smp->initial_path ); } - osm_dump_port_info( p_rcv->p_log, - node_guid, port_guid, port_num, p_pi, - OSM_LOG_DEBUG ); - /* Check if the update_sm_base_lid in the context is TRUE. 
If it is - then update the master_sm_base_lid of the variable From rdreier at cisco.com Wed Oct 18 19:49:50 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 18 Oct 2006 19:49:50 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061019013313.GE30696@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 19 Oct 2006 03:33:14 +0200") References: <20061019013313.GE30696@mellanox.co.il> Message-ID: > > > Hopefully that is under prefix: > > > $prefix/etc/libibverbs.conf.d/ > > > > Well, $sysconfdir/libibverbs.conf.d > > Ugh, is that a problem if I want to build and run as non-root? > I'm used to be able to set --prefix on config line for all libs > to some directory, put LD_LIBRARY_PATH to point there, then > if I like I just blow all of it away and I get a clean system. > Scattering config files around in home directory etc will break this. I'm not following the objection: what's wrong with using $sysconfdir? It defaults to $prefix/etc like you want, and it can be overridden with the --sysconfdir parameter to configure. > > > Finally, it might be nice to be able to just specify the list of > > > plugins at configure time for people like me who buuild everything > > > from source and who want less flexibility > > > but also less files to install. > > > > Again, is that really any easier > > Well, I'm thinking of distributed systems mainly where copying extra > files around is additional pain. > Consider myself: I'm building things on my laptop, then pushing them out to > machines in the lab over rsync for testing. Less files - less headache. > > > than putting whatever you want into > > your .libibverbs.conf? > > I really don't think a library sticking things in user's home directory > is such a great idea - typical users don't really know they link against > some library, this is just an extra place that users can break: > move to another machine, things stop working, and your app's > manual does not say anything of course. 
libraries don't stick anything in home directories -- I'm just suggesting $HOME/.libibverbs.conf as a place to stick extra configs that users might want to add. I'm kind of thinking that we might want other config options beyond just driver names someday. Otherwise we might as well have /etc/libibverbs.drivers.d and an environment variable IBV_DRIVERS, I guess. But it might be nice to be able to add a line like default-fork-safe true somewhere in libibverbs.conf.d to set a system-wide default. I dunno what's better. Maybe separate environment variables for user-specific configs are just as good -- eg that's what ld.so does. > > > I definitely plan to make it so a missing plug-in is not fatal, so it > > shouldn't hurt to have extra drivers declared that you don't build > > every time. > > Not until someone decides to rename a plugin for some reason - then you have to > hunt down and kill the old file name to prevent an old version stuck in library > path for some reason from being loaded - easy with the central location, but > good luck walking all user's home directories. Hmm, this seems to argue against allowing environment variables or anything but a single directory built into libibverbs. Because otherwise you have to grep every .bashrc .cshrc and so on. - R. From mst at mellanox.co.il Wed Oct 18 20:05:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 05:05:01 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: <20061019030501.GH30696@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] use mmiowb after doorbell ring > > > > > Hopefully that is under prefix: > > > > $prefix/etc/libibverbs.conf.d/ > > > > > > Well, $sysconfdir/libibverbs.conf.d > > > > Ugh, is that a problem if I want to build and run as non-root? 
> > I'm used to be able to set --prefix on config line for all libs > > to some directory, put LD_LIBRARY_PATH to point there, then > > if I like I just blow all of it away and I get a clean system. > > Scattering config files around in home directory etc will break this. > > I'm not following the objection: what's wrong with using $sysconfdir? > It defaults to $prefix/etc like you want, and it can be overridden > with the --sysconfdir parameter to configure. Sorry, looks like I was confused. > > > > Finally, it might be nice to be able to just specify the list of > > > > plugins at configure time for people like me who buuild everything > > > > from source and who want less flexibility > > > > but also less files to install. > > > > > > Again, is that really any easier > > > > Well, I'm thinking of distributed systems mainly where copying extra > > files around is additional pain. > > Consider myself: I'm building things on my laptop, then pushing them out to > > machines in the lab over rsync for testing. Less files - less headache. > > > > > than putting whatever you want into > > > your .libibverbs.conf? > > > > I really don't think a library sticking things in user's home directory > > is such a great idea - typical users don't really know they link against > > some library, this is just an extra place that users can break: > > move to another machine, things stop working, and your app's > > manual does not say anything of course. > > libraries don't stick anything in home directories -- I'm just > suggesting $HOME/.libibverbs.conf as a place to stick extra configs > that users might want to add. > > I'm kind of thinking that we might want other config options beyond > just driver names someday. Otherwise we might as well have > /etc/libibverbs.drivers.d and an environment variable IBV_DRIVERS, I > guess. But it might be nice to be able to add a line like > > default-fork-safe true > > somewhere in libibverbs.conf.d to set a system-wide default. I see. 
Looks somewhat useful - do you really intend something like this? Then we'd need an API for app to set fork support state explicitly - we currently only make it possible to enable it, not to disable. > I dunno what's better. Maybe separate environment variables for > user-specific configs are just as good -- eg that's what ld.so does. Hmm. I guess what I'm trying to say is - let's follow some precedent. ld.so example is good. Are there others? > > > > > I definitely plan to make it so a missing plug-in is not fatal, so it > > > shouldn't hurt to have extra drivers declared that you don't build > > > every time. > > > > Not until someone decides to rename a plugin for some reason - then you > > have to hunt down and kill the old file name to prevent an old version > > stuck in library path for some reason from being loaded - easy with the > > central location, but good luck walking all user's home directories. > > Hmm, this seems to argue against allowing environment variables or > anything but a single directory built into libibverbs. Because > otherwise you have to grep every .bashrc .cshrc and so on. Hmm, good point. I like it that with environment I can just pass it on command line and not worry about any files which might be left behind. -- MST From mst at mellanox.co.il Wed Oct 18 20:16:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 05:16:44 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: <20061019013313.GE30696@mellanox.co.il> Message-ID: <20061019031644.GI30696@mellanox.co.il> Quoting r. Roland Dreier : > libraries don't stick anything in home directories -- I'm just > suggesting $HOME/.libibverbs.conf as a place to stick extra configs > that users might want to add. > > I'm kind of thinking that we might want other config options beyond > just driver names someday. Otherwise we might as well have > /etc/libibverbs.drivers.d and an environment variable IBV_DRIVERS, I > guess. 
But it might be nice to be able to add a line like > > default-fork-safe true > > somewhere in libibverbs.conf.d to set a system-wide default. > > I dunno what's better. Maybe separate environment variables for > user-specific configs are just as good -- eg that's what ld.so does. Possible usage examples: I was thinking about some networked filesystem to have all boxes in the lab get stuff from a central place before the run, instead of copying stuff over. I don't want to consider an NFS-based home directory though. Using the environment makes it easier for me to avoid the need to install stuff on local disks at all. -- MST From rdreier at cisco.com Wed Oct 18 20:41:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 18 Oct 2006 20:41:19 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061019030501.GH30696@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 19 Oct 2006 05:05:01 +0200") References: <20061019030501.GH30696@mellanox.co.il> Message-ID: > > I dunno what's better.
Maybe separate environment variables for > > > user-specific configs are just as good -- eg that's what ld.so does. > > > > Hmm. > > I guess what I'm trying to say is - let's follow some precedent. > > ld.so example is good. Are there others? > > I think there are plenty of precedents for putting configuration in > dotfiles in $HOME. For example on my system, 'man fonts-conf' shows > > NAME > fonts.conf - Font configuration files > > SYNOPSIS > /etc/fonts/fonts.conf > /etc/fonts/fonts.dtd > /etc/fonts/conf.d > ~/.fonts.conf > > But I'm sure there are plenty of environment variable uses too. Sure. But this configuration of a program (x11 font server), not a library, is that right? So user has a chance to know he's running it and read the man page to figure which files are read. It seems for libraries conf files are not common. -- MST From rdreier at cisco.com Wed Oct 18 21:08:52 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 18 Oct 2006 21:08:52 -0700 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: <20061019035801.GA28933@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 19 Oct 2006 05:58:01 +0200") References: <20061019035801.GA28933@mellanox.co.il> Message-ID: > Sure. But this configuration of a program (x11 font server), not a library, is > that right? So user has a chance to know he's running it and read the man page > to figure which files are read. It seems for libraries conf files are not > common. No, actually I snipped the next few lines of the man page: Description Fontconfig is a library designed to provide system-wide font configuration, cus- tomization and application access. Off the top of my head I can also think of GTK+ (~/.gtkrc), and I seem to have a ~/.gstreamer too. I think the real question is whether we expect to have complex config options that would be hard to stick in an environment variable. At this point I suspect not, so I guess I'll go with $sysconfdir/libibverbs.d and $IBV_DRIVERS. - R. 
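As an illustration of the $IBV_DRIVERS idea settled on above, a minimal sketch of how a driver list might be split into names is shown below. The thread does not specify the format of the variable, so the colon-separated syntax (in the style of LD_LIBRARY_PATH) is purely an assumption, and parse_driver_list and DRIVER_NAME_MAX are hypothetical names that do not come from libibverbs.

```c
#include <stddef.h>
#include <string.h>

#define DRIVER_NAME_MAX 64

/* Hypothetical helper, not part of libibverbs: split a colon-separated
 * driver list (assumed format, e.g. "mthca:ehca") into individual names.
 * Empty entries and names too long for the buffer are skipped.
 * Returns the number of names stored, at most max. */
static int parse_driver_list(const char *list,
                             char names[][DRIVER_NAME_MAX], int max)
{
    const char *p = list;
    int n = 0;

    while (p && *p && n < max) {
        size_t len = strcspn(p, ":");   /* length up to next ':' or end */

        if (len > 0 && len < DRIVER_NAME_MAX) {
            memcpy(names[n], p, len);
            names[n][len] = '\0';
            n++;
        }
        p += len;
        if (*p == ':')
            p++;                        /* step over the separator */
    }
    return n;
}
```

A caller would pass the result of getenv("IBV_DRIVERS") and then try to load each named plug-in, skipping any that are missing, in line with the earlier point that a missing plug-in should not be fatal.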
From mst at mellanox.co.il Wed Oct 18 21:17:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 06:17:08 +0200 Subject: [openib-general] [PATCH] use mmiowb after doorbell ring In-Reply-To: References: Message-ID: <20061019041708.GB28933@mellanox.co.il> Quoting Roland Dreier : > I think the real question is whether we expect to have complex config > options that would be hard to stick in an environment variable. At > this point I suspect not, so I guess I'll go with $sysconfdir/libibverbs.d > and $IBV_DRIVERS. OK, both make sense. -- MST From krkumar2 at in.ibm.com Wed Oct 18 21:59:28 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 19 Oct 2006 10:29:28 +0530 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: <000001c6f2d4$8a73a0e0$ff0da8c0@amr.corp.intel.com> Message-ID: Hi Sean, > Let's try something like this then (untested): > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 18a4366..0d06431 100755 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -1859,16 +1859,20 @@ int rdma_bind_addr(struct rdma_cm_id *id > mutex_unlock(&lock); > } > if (ret) > - goto err; > + goto err1; > } > > memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); > ret = cma_get_port(id_priv); > if (ret) > - goto err; > + goto err2; > > return 0; > -err: > +err2: > + mutex_lock(&lock); > + cma_detach_from_dev(id_priv); > + mutex_unlock(&lock); > +err1: > cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); > return ret; > } This will mean that a deref is wrongly done if a loopback or zero address is passed to this function, without it having done a ref inc. I do think this case requires a variable to indicate whether a ref was got or not. Assuming that is true, I will submit a patch with your comment about holding the lock. 
thanks, - KK From krkumar2 at in.ibm.com Wed Oct 18 22:00:53 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 19 Oct 2006 10:30:53 +0530 Subject: [openib-general] [PATCH] RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count Message-ID: <20061019050053.4951.66426.sendpatchset@localhost.localdomain> rdma_bind_addr() leaks a cma_dev reference count in the failure case. Also hold the lock when doing a cma_detach_from_dev() as pointed out by Sean. Signed-off-by: Krishna Kumar --- diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-09 17:13:41.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-09 19:42:31.000000000 +0530 @@ -1750,6 +1750,7 @@ static int cma_get_port(struct rdma_id_p int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; + int did_acquire_dev = 0; int ret; if (addr->sa_family != AF_INET) @@ -1768,6 +1769,7 @@ int rdma_bind_addr(struct rdma_cm_id *id } if (ret) goto err; + did_acquire_dev = 1; } memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); @@ -1777,6 +1779,11 @@ int rdma_bind_addr(struct rdma_cm_id *id return 0; err: + if (did_acquire_dev) { + mutex_lock(&lock); + cma_detach_from_dev(id_priv); + mutex_unlock(&lock); + } cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); return ret; } From krkumar2 at in.ibm.com Wed Oct 18 22:12:34 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 19 Oct 2006 10:42:34 +0530 Subject: [openib-general] [PATCH] Fix some cancellation problems in process_req(). In-Reply-To: <4535168F.9050901@ichips.intel.com> Message-ID: > The other changes look fine. But note that if req->status == -ECANCELED and > time_after() is true, then it seems like a toss up as to which one can be > reported to the user. I felt that since the time_after() check matched (in all likelihood) due to the processing of the cancellation, ECANCELED is more appropriate to return.
It is most likely that, if both conditions are true, a cancelled operation led to the time_after() match (cancel sets the time to jiffies, resulting in this time_after() match). The chances of both happening together are almost zero. Do you agree? Otherwise I can re-work the patch as suggested. thanks, - KK From krkumar2 at in.ibm.com Wed Oct 18 22:05:33 2006 From: krkumar2 at in.ibm.com (Krishna Kumar) Date: Thu, 19 Oct 2006 10:35:33 +0530 Subject: [openib-general] [PATCH] [REVOKE] If addr_handler() got error, do not set state as OK Message-ID: <20061019050533.4957.41770.sendpatchset@localhost.localdomain> This was originally sent with the following intention: if addr_handler() got invoked with an error status, do not set id_priv->state to success followed by resetting it to the old value (redundant code). Also encapsulate some common code. But when I followed Sean's suggestion to avoid using extra flags, the result is not very appealing (see below). The code is too complicated (multiple overwrites of 'status') to do this neatly. I suggest we drop this patch, as it is not easy to achieve the above intention cleanly by either re-write method :) diff -ruNp org/drivers/infiniband/core/cma.c new/drivers/infiniband/core/cma.c --- org/drivers/infiniband/core/cma.c 2006-10-10 15:45:27.000000000 +0530 +++ new/drivers/infiniband/core/cma.c 2006-10-10 15:59:53.000000000 +0530 @@ -1520,6 +1518,13 @@ static void addr_handler(int status, str atomic_inc(&id_priv->dev_remove); + if (status) { /* We got called with an error */ + if (!cma_comp(id_priv, CMA_ADDR_QUERY)) /* Invalid state */ + goto out; + event = RDMA_CM_EVENT_ADDR_ERROR; + goto notify; + } + /* * Grab mutex to block rdma_destroy_id() from removing the device while * we're trying to acquire it.
@@ -1529,9 +1534,8 @@ static void addr_handler(int status, str mutex_unlock(&lock); goto out; } - - if (!status && !id_priv->cma_dev) - status = cma_acquire_dev(id_priv); + if (!id_priv->cma_dev) + status = cma_acquire_dev(id_priv); mutex_unlock(&lock); if (status) { @@ -1544,16 +1548,15 @@ static void addr_handler(int status, str event = RDMA_CM_EVENT_ADDR_RESOLVED; } - if (cma_notify_user(id_priv, event, status, NULL, 0)) { +notify: + if (cma_notify_user(id_priv, event, status, NULL, 0)) cma_exch(id_priv, CMA_DESTROYING); - cma_release_remove(id_priv); - cma_deref_id(id_priv); - rdma_destroy_id(&id_priv->id); - return; - } + out: cma_release_remove(id_priv); cma_deref_id(id_priv); + if (cma_comp(id_priv, CMA_DESTROYING)) + rdma_destroy_id(&id_priv->id); } static int cma_resolve_loopback(struct rdma_id_private *id_priv) From mst at mellanox.co.il Wed Oct 18 22:09:07 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 07:09:07 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161177058.2917.513.camel@fc6.xsintricity.com> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> Message-ID: <20061019050907.GA1547@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > On Wed, 2006-10-18 at 09:29 +0200, Michael S. Tsirkin wrote: > > Quoting r. Doug Ledford : > > > > From our discussion, it seems we should be able to just push the > > > > small number of missing bits into RHEL5 directly. That would be > > > > nicer of course. > > > > > > It depends. If there's lots of individual changes, it might be easier > > > to push the OFED 1.1 change. But, that depends on when the final OFED > > > 1.1 comes out and how much it varies from the existing RPMs. > > > > OFED is in deep freeze, so you can already look at it to estimate the amount of > > changes against 2.6.18.
> > Could you look at the diff please so that I know whether it's worth it > > to invest in building the minimal patch set for pushing into RHEL5, > > or whether you'll push OFED 1.1 into RHEL kernel as is? > > Yeah, I'll look over the diff today. How does it look? -- MST From xma at us.ibm.com Wed Oct 18 23:15:25 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 18 Oct 2006 23:15:25 -0700 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <1161031849.2917.400.camel@fc6.xsintricity.com> Message-ID: openib-general-bounces at openib.org wrote on 10/16/2006 01:50:49 PM: > On Mon, 2006-10-16 at 15:25 +0200, Michael S. Tsirkin wrote: > > Quoting r. Maestas, Christopher Daniel : > > > Subject: Re: [openib-general] RHEL5 and OFED ... > > > > > > > Now for userspace - does RHEL5 include at least libibverbs-1.0? > > > > This has been released a while back, and Roland makes regular bugfix > > > releases. > > > > > > Here's what I see on a rhel4 u4 system: > > > --- > > > $ rpm -q libibverbs > > > libibverbs-1.0.3-1 > > > --- > > > > > > So I would think rhel5 would have at least that or greater. When I > > > compiled rpms for 1.1rc7 it generated: > > > --- > > > # ls libibverbs-* > > > libibverbs-1.0.4-0.x86_64.rpm libibverbs-utils-1.0.4-0.x86_64.rpm > > > libibverbs-devel-1.0.4-0.x86_64.rpm > > > > Doug, would it be possible to update this + libmthca? > > Possibly. What's the justification? What's in 1.0.4 that is the > primary reason for wanting to update from 1.0.3? > > -- > Doug Ledford I am not sure whether this already has an answer. The justification is that madvise(..., MADV_DONTFORK) is used to make fork() work for verbs consumers in the recent packages. I hope the same patch will be in libehca. thanks Shirley Ma IBM Linux Technology Center -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dotanb at dev.mellanox.co.il Thu Oct 19 00:16:12 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 19 Oct 2006 09:16:12 +0200 Subject: [openib-general] ibv_reg_mr temporary vs permanent errors In-Reply-To: <0E8782D7-E8BE-4DBC-AFB9-41AC405F3BAA@scl.ameslab.gov> References: <0E8782D7-E8BE-4DBC-AFB9-41AC405F3BAA@scl.ameslab.gov> Message-ID: <4537263C.1040300@dev.mellanox.co.il> Hi Troy. Troy Benjegerdes wrote: > If ibv_reg_mr fails, can an application (or library, such as pvfs) > assume that this is just a temporary error, and try to deregister > some memory, then try again? > I believe that the answer is: not always. There may be several reasons for a memory registration to fail: * bad parameters (memory type and requested permission don't match) * the permission is not legal (Remote Write is enabled but Local Write isn't) * the process cannot lock any more memory (ulimit configuration) * the process cannot register any more memory regions (maybe other processes registered all of the available MRs supported by the HCA) > How can we differentiate between the case where the hardware (such as > ehca) actually has more information about why the memory registration > failed, and the application can act on that information (by > coalescing memory regions, for example), vs cases where something is > just plain broken and the application should give up and exit. > For now, the gen2 driver doesn't give the user any reason for the failure of the operation. I hope that this will be changed ... Dotan From vlad at mellanox.co.il Thu Oct 19 01:01:58 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 19 Oct 2006 10:01:58 +0200 Subject: [openib-general] building OFED package from git and svn Message-ID: <6C2C79E72C305246B504CBA17B5500C92A7F5B@mtlexch01.mtl.com> Hi Yossi, It should be fixed in OFED-1.1-pre1.
Regards, Vladimir ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of yosef etigin Sent: Monday, October 16, 2006 11:35 AM To: openib-general at openib.org Subject: [openib-general] building OFED package from git and svn Hello, I have been trying to build the OFED source package (1.1 rev 9820) as described in the HOWTO.build_ofed wiki. The package was built successfully, however I had trouble compiling it. The error I get is a missing library inside OFED's temporary build tree. The error occurs during the compilation of DAPL: gcc: /tmp/OFED-1.1-rev9725/SOURCES/openib-1.1/src/userspace/librdmacm/src/.libs/.libs/librdmacm.so: No such file or directory A log of the make process is attached. When I changed line 306 of the top-level Makefile of the open-ib sources from: AM_LDFLAGS="-L../libibverbs/src -libverbs -L../librdmacm/src/.libs -lrdmacm -lsysfs" to: AM_LDFLAGS="-L../libibverbs/src -libverbs -L../librdmacm/src/ -lrdmacm -lsysfs" it compiled OK. Is this really a problem in this Makefile, or does this fix cover up something deeper? Yossi -------------- next part -------------- An HTML attachment was scrubbed... URL: From RAISCH at de.ibm.com Thu Oct 19 01:16:07 2006 From: RAISCH at de.ibm.com (Christoph Raisch) Date: Thu, 19 Oct 2006 10:16:07 +0200 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? In-Reply-To: <829BE3D9-3A7F-4964-BB22-30B94692B7C8@scl.ameslab.gov> Message-ID: > (I am taking this back to the openib list because I think the list > needs to hear about real applications that are hitting memory > registration limits) > > What are the limits on the ehca memory registrations? > Is there a limit to the number of regions that can be registered? The number of regions should not be limited. The total size of regions is limited: all user applications together can only register the complete available physical memory once.
The rationale behind that is that you can give away physical memory only once to an application. Registering shared memory regions on a "physical" memory region should be unlimited as well. > Is > there any way (with kernel hacks) that we can register the entire > address space of the application? I'd guess you mean the physically available memory space. It would definitely be hard to "pin" virtual memory provided by swapping. > We would like to be able to do RDMA > sends and receives from anywhere in the application address space > eventually, and only register it once. Yes, that's the fastest way to use IB. But keep in mind that registered memory is pinned and can't be given to "helper" tasks, like sshd. So you have to restrict your application to max memory minus the memory needed by base kernel+ daemons+bash+... to be able to "breathe". > > What is the point of RDMA for memory-intensive applications if you > have to copy the data to a registered buffer before sending it anyway? > Regards . . . Christoph Raisch From HNGUYEN at de.ibm.com Thu Oct 19 01:55:26 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 19 Oct 2006 10:55:26 +0200 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? In-Reply-To: <829BE3D9-3A7F-4964-BB22-30B94692B7C8@scl.ameslab.gov> Message-ID: Hello Troy and Kyle! > > Kyle wrote: > > Our app writes out a file once, then reads it in many times through > > the pvfs2 system. In the pvfs2 layers, there is memory caching > > done at the network level, so memory is registered by the app, and > > attempts are made to re-register and/or re-use these memory regions > > to save on memory reg overhead. The problem occurs only while > > writing files, so while memory is being initially registered with > > the nic/app and cached? Also, our tests show that the app runs > > normally to completion on identical machines using mellanox hca's > > instead of the eHCA.
The file sizes are generally >16GByte, > > however our failures usually appear by the time ~220-250MBytes have > > been written(possibly also all registered)? We have tested memory registration with 64GB. So I don't think ~16GB is an issue. However we do have a restriction on mappings in that the total number of mappings is twice the total number of pages assigned to the partition. The term mappings means the number of pages in the calls to ib_reg_phys_mr() or ib_reg_user_mr()/ibv_reg_mr(). The ehca driver registers the whole space at module load time, so that for user space applications you have a limit of mappings equal to the total number of physical pages. Note that kernel modules sitting on top of ehca, e.g. ib_ipoib and ib_mad, are not affected by this limit since they share the whole space registered by ehca when they call ib_get_dma_mr(). > > I'm not sure the standard OpenIB NetPIPE runs can reproduce this > > type of workload. However, we have developed a working PVFS2- > > NetPIPE module which can reproduce this problem on occasion, if > > there is interest in further testing this on your end, I can make > > it available. Yes. Please send it to me. I'd like to test it. Is it a user space application? I want to see if we could reach the limit of mappings mentioned above. > > Our ehca's have the following revision info: > > vendor_id: 0x5076 > > vendor_part_id: 0 > > hw_ver: 0x1000003 > > Kernel version is debian 2.6.17 ok. For completeness please give me the driver version using modinfo and also the firmware code level via HMC - click on "Licensed Internal Code Maintenance" (left pane), "Change Licensed Internal Code" (right pane), select your frame and then "View System Info", "Display Current Values". You can also turn on the debug traces of ehca to track all reg_mr() calls in order to determine if you reach the limit of mappings mentioned above. Or just send me the whole dmesg or /var/log/messages.
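To make the mapping arithmetic above concrete, here is a small sketch of how an application might estimate how many mappings (pages) its registrations consume and compare that against a budget. This is not ehca driver code: the struct mapping_budget type, the function names, and the idea of tracking the budget in the application are all assumptions made for illustration; the real limit is enforced by the driver and firmware.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len). */
uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = addr / page_size;
	last = (addr + len - 1) / page_size;
	return last - first + 1;
}

/* Hypothetical application-side bookkeeping of consumed mappings. */
struct mapping_budget {
	uint64_t limit;	/* maximum mappings, counted in pages */
	uint64_t used;	/* mappings consumed by registrations so far */
};

/*
 * Charge one registration against the budget.
 * Returns 1 if it fits (and records it), 0 if it would exceed the limit.
 * Every registration is charged in full, even if it covers pages that
 * another registration already covers.
 */
int budget_charge(struct mapping_budget *b, uint64_t addr, uint64_t len,
		  uint64_t page_size)
{
	uint64_t need = pages_spanned(addr, len, page_size);

	if (need > b->limit - b->used)
		return 0;
	b->used += need;
	return 1;
}
```

Note that an unaligned buffer can cost one page more than its length suggests, and that a cache which registers overlapping buffers separately is charged for the shared pages each time.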
> Troy wrote: > What are the limits on the ehca memory registrations? > Is there a limit to the number of regions that can be registered? See above > Is > there any way (with kernel hacks) that we can register the entire > address space of the application? We would like to be able to do RDMA > sends and receives from anywhere in the application address space > eventually, and only register it once. ib_ipoib or ib_mad actually uses ib_get_dma_mr() and obtains the whole space. For user space there is no corresponding API yet. > What is the point of RDMA for memory-intensive applications if you > have to copy the data to a registered buffer before sending it anyway? Not sure if I understand completely... However RDMA mr accesses should not be an issue. Thanks! Nam From HNGUYEN at de.ibm.com Thu Oct 19 02:20:20 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 19 Oct 2006 11:20:20 +0200 Subject: [openib-general] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7? In-Reply-To: Message-ID: Hi, > What testing did these companies do with rc7? Still testing rc7. Will probably post our results this evening. Regards! Nam From ogerlitz at voltaire.com Thu Oct 19 02:59:25 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 19 Oct 2006 11:59:25 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: References: Message-ID: <45374C7D.9070604@voltaire.com> Shirley Ma wrote: > openib-general-bounces at openib.org wrote on 10/16/2006 01:50:49 PM: > > On Mon, 2006-10-16 at 15:25 +0200, Michael S. Tsirkin wrote: > > > Quoting r. Maestas, Christopher Daniel : > > > > So I would think rhel5 would have at least that or greater. When I > > > > compiled rpms for 1.1rc7 it generated: > > > > --- > > > > # ls libibverbs-* > > > > libibverbs-1.0.4-0.x86_64.rpm > libibverbs-utils-1.0.4-0.x86_64.rpm > > > > libibverbs-devel-1.0.4-0.x86_64.rpm > > > > > > Doug, would it be possible to update this + libmthca? > > > > Possibly.
What's the justification? What's in 1.0.4 that is the > > primary reason for wanting to update from 1.0.3? > I am not sure whether this already has an answer. > The justification is madvise(..., MADV_DONTFORK) is used to make fork() > work for verbs consumers in the recent packages. I hope same patch will > be in libehca. Just to clarify: libibverbs-1.X does not include the madvise(...,MADV_DONTFORK) fork() support, it was integrated into libibverbs1.1 which is not released yet. Or. From ogerlitz at voltaire.com Thu Oct 19 03:02:57 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 19 Oct 2006 12:02:57 +0200 Subject: [openib-general] RHEL5 and OFED ... In-Reply-To: <45374C7D.9070604@voltaire.com> References: <45374C7D.9070604@voltaire.com> Message-ID: <45374D51.4020505@voltaire.com> Or Gerlitz wrote: > Shirley Ma wrote: >> I am not sure whether this already has an answer. >> The justification is madvise(..., MADV_DONTFORK) is used to make fork() >> work for verbs consumers in the recent packages. I hope same patch will >> be in libehca. > Just to clarify: libibverbs-1.X does not include the > madvise(...,MADV_DONTFORK) fork() support, it was integrated into > libibverbs1.1 which is not released yet. I meant libibverbs-1.0.X and not 1.X Or. From vlad at mellanox.co.il Thu Oct 19 05:04:01 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 19 Oct 2006 14:04:01 +0200 Subject: [openib-general] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs Message-ID: <6C2C79E72C305246B504CBA17B5500C93477B6@mtlexch01.mtl.com> Hi Nam, Can this patch be saved for the next OFED-1.2 release? Note: the OFED installation script checks that the sysfsutils package is installed. Regards, Vladimir -----Original Message----- From: Hoang-Nam Nguyen [mailto:hnguyen at de.ibm.com] Sent: Wednesday, October 18, 2006 7:02 PM To: Vladimir Sokolovsky; Michael S.
Tsirkin Cc: openfabrics-ewg at openib.org; openib-general at openib.org Subject: [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs Hello, here is the patch of configure in libehca as a result of the patch "libehca configure.in and config.h.in". It is generated by autogen.sh and pretty lengthy. Hence, I'm attaching it here for completeness. Vlad, do you want me to check it in svn or send you the whole file? Thanks! Nam From HNGUYEN at de.ibm.com Thu Oct 19 05:32:41 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 19 Oct 2006 14:32:41 +0200 Subject: [openib-general] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: <6C2C79E72C305246B504CBA17B5500C93477B6@mtlexch01.mtl.com> Message-ID: Hi Vlad! > Can this patch be saved for the next OFED-1.2 release? > Note: OFED installation script checks that sysfsutils package installed. As Michael indicated in previous email configure is a generated file from autogen.sh. And I'm not sure if your packaging script does generate it automatically when you build the whole OFED tgz-file. The actual patches in configure.in and config.h.in are needed because I got compile error like: static declaration of ibv_read_sysfs_file() does not match previous declaration in infiniband/driver.h Thanks! Nam From ishai at dev.mellanox.co.il Thu Oct 19 05:38:03 2006 From: ishai at dev.mellanox.co.il (Ishai Rabinovitz) Date: Thu, 19 Oct 2006 14:38:03 +0200 (IST) Subject: [openib-general] [PATCH] IB/SRP Userspace: srptools/srp_daemon - Fix connect bug and add support for user specified initiator extension In-Reply-To: References: Message-ID: <33587.194.90.237.34.1161261483.squirrel@dev.mellanox.co.il> Thanks for your patch. I agree with some of the changes you suggest and disagree with others. It will be useful to post a different patch for each logical change. > 1. 
Fixes bug in srp_daemon for the case where if it is invoked with the '-e' option, it fails to connect to the SRP targets because of a newline character in the parameter string. You are right that the '\n' is redundant, but I have not seen the bug it creates. The last parameter in the string is considered to be a string by ib_srp and therefore ib_srp will ignore the newline. > 2. Changes the name of the constant 'MAX_TRAGET_CONFIG_STR_STRING' to 'MAX_TARGET_CONFIG_STR_STRING'. Thanks, I will apply this change. > 3. Changes the behavior of the '-n' option to srp_daemon. The earlier behavior printed the initiator extension. The new behavior allows the user to specify an initiator extension as an argument to the '-n' option. I do not think we want to change the -n behavior to this one. First of all, your approach imposes the same initiator_ext on all the targets discovered by this srp_daemon. Secondly, if someone uses random values for the initiator_ext it may cause a waste of resources in the target: the target can never tell when a connection has failed (or an initiator has rebooted) and will keep the connection alive. When there is an attempt to reconnect to this target with the same initiator_ext, the target knows it can close the old connection. This is the reason we decided to have a convention. The convention is to use the target port guid. The advantage of this convention is that it allows us to have two connections at the same time from an initiator to both ports of the target. I understand that some targets may need a different initiator_ext, but you should add a new flag for actually setting the initiator_ext and leave -n untouched. You are welcome to send such a patch.
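The convention described above (derive initiator_ext from the target port GUID, so that each target port gets its own stable value) might look like the following sketch. It is not taken from srp_daemon: the helper name format_initiator_ext() is hypothetical, and the 16-hex-digit "initiator_ext=" encoding is an assumption about what the ib_srp target string expects.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/*
 * Hypothetical helper: build the initiator_ext parameter from a target
 * port GUID (host byte order). The "initiator_ext=" key and the
 * 16-hex-digit format are assumptions made for this sketch.
 * Returns the number of characters written, as snprintf() does.
 */
int format_initiator_ext(char *buf, size_t len, uint64_t target_port_guid)
{
	return snprintf(buf, len, "initiator_ext=%016" PRIx64,
			target_port_guid);
}
```

Because the value depends only on the target port, a reconnect after an initiator reboot reuses the same initiator_ext, while the two ports of a dual-port target yield two distinct values.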
Ishai From dotanb at dev.mellanox.co.il Thu Oct 19 05:58:16 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 19 Oct 2006 14:58:16 +0200 Subject: [openib-general] [PATCH] rping.c: Compilation warning on 64 bit machines was fixed Message-ID: <1161262696.20990.1.camel@mtls05.yok.mtl.com> Compilation warning on 64 bit machines was fixed. Signed-off-by: Dotan Barak --- Index: last_stable/src/userspace/librdmacm/examples/rping.c =================================================================== --- last_stable.orig/src/userspace/librdmacm/examples/rping.c 2006-07-02 18:09:41.000000000 +0300 +++ last_stable/src/userspace/librdmacm/examples/rping.c 2006-07-03 13:38:20.000000000 +0300 @@ -1025,7 +1025,7 @@ int main(int argc, char *argv[]) if ((cb->size < RPING_MIN_BUFSIZE) || (cb->size > (RPING_BUFSIZE - 1))) { fprintf(stderr, "Invalid size %d " - "(valid range is %d to %d)\n", + "(valid range is %Zd to %d)\n", cb->size, RPING_MIN_BUFSIZE, RPING_BUFSIZE); ret = EINVAL; } else From kliteyn at dev.mellanox.co.il Thu Oct 19 06:00:13 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 19 Oct 2006 15:00:13 +0200 Subject: [openib-general] [PATCH] [TRIVIAL] OpenSM/osm_port_info_rcv.c: Remove duplicate dump of received PortInfo In-Reply-To: <1161224394.25985.38676.camel@hal.voltaire.com> References: <1161224394.25985.38676.camel@hal.voltaire.com> Message-ID: <453776DD.7030502@dev.mellanox.co.il> Looks good. 
-- Yevgeny Hal Rosenstock wrote: > OpenSM/osm_port_info_rcv.c: Remove duplicate dump of received PortInfo > in osm_pi_rcv_process > > Signed-off-by: Hal Rosenstock > > Index: opensm/osm_port_info_rcv.c > =================================================================== > --- opensm/osm_port_info_rcv.c (revision 9884) > +++ opensm/osm_port_info_rcv.c (working copy) > @@ -710,8 +710,9 @@ osm_pi_rcv_process( > port_guid = p_context->port_guid; > node_guid = p_context->node_guid; > > - osm_dump_port_info( > - p_rcv->p_log, node_guid, port_guid, port_num, p_pi, OSM_LOG_DEBUG); > + osm_dump_port_info( p_rcv->p_log, > + node_guid, port_guid, port_num, p_pi, > + OSM_LOG_DEBUG ); > > /* > we might get a response during a light sweep looking for a change in > @@ -829,10 +830,6 @@ osm_pi_rcv_process( > p_smp->hop_count, p_smp->initial_path ); > } > > - osm_dump_port_info( p_rcv->p_log, > - node_guid, port_guid, port_num, p_pi, > - OSM_LOG_DEBUG ); > - > /* > Check if the update_sm_base_lid in the context is TRUE. > If it is - then update the master_sm_base_lid of the variable > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Thu Oct 19 06:06:11 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 19 Oct 2006 08:06:11 -0500 Subject: [openib-general] [PATCH] rping.c: Compilation warning on 64 bit machines was fixed In-Reply-To: <1161262696.20990.1.camel@mtls05.yok.mtl.com> References: <1161262696.20990.1.camel@mtls05.yok.mtl.com> Message-ID: <1161263171.25787.7.camel@stevo-desktop> Committed revision 9898. Thanks, Steve. On Thu, 2006-10-19 at 14:58 +0200, Dotan Barak wrote: > Compilation warning on 64 bit machines was fixed. 
> > Signed-off-by: Dotan Barak From halr at voltaire.com Thu Oct 19 06:03:21 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Oct 2006 09:03:21 -0400 Subject: [openib-general] [PATCH] Diags/ibportstate: For query op, add peer port checking of link width and speed active Message-ID: <1161262998.25985.65438.camel@hal.voltaire.com> Diags/ibportstate: For query op, add peer port checking of link width and speed active This requires the combined route support in libibmad Signed-off-by: Hal Rosenstock --- Index: src/ibportstate.c =================================================================== --- src/ibportstate.c (revision 9870) +++ src/ibportstate.c (working copy) @@ -83,33 +83,29 @@ iberror(const char *fn, char *msg, ...) /*******************************************/ -static char * -get_node_info(ib_portid_t *dest, char *data, char **argv, int argc) +static int +get_node_info(ib_portid_t *dest, char *data) { int node_type; if (!smp_query(data, dest, IB_ATTR_NODE_INFO, 0, 0)) - return "smp query nodeinfo failed"; + return -1; node_type = mad_get_field(data, 0, IB_NODE_TYPE_F); if (node_type == IB_NODE_SWITCH) /* Switch NodeType ?
*/ return 0; else - return "Node type not switch"; + return 1; } -static char * -get_port_info(ib_portid_t *dest, char *data, char **argv, int argc, int port_op) +static int +get_port_info(ib_portid_t *dest, char *data, int portnum, int port_op) { char buf[2048]; - int portnum = 0; char val[64]; - if (argc > 0) - portnum = strtol(argv[0], 0, 0); - if (!smp_query(data, dest, IB_ATTR_PORT_INFO, portnum, 0)) - return "smp query portinfo failed"; + return -1; if (port_op != 4) mad_dump_portstates(buf, sizeof buf, data, sizeof data); @@ -123,18 +119,14 @@ get_port_info(ib_portid_t *dest, char *d return 0; } -static char * -set_port_info(ib_portid_t *dest, char *data, char **argv, int argc, int port_op) +static int +set_port_info(ib_portid_t *dest, char *data, int portnum, int port_op) { char buf[2048]; - int portnum = 0; char val[64]; - if (argc > 0) - portnum = strtol(argv[0], 0, 0); - if (!smp_set(data, dest, IB_ATTR_PORT_INFO, portnum, 0)) - return "smp set failed"; + return -1; if (port_op != 4) mad_dump_portstates(buf, sizeof buf, data, sizeof data); @@ -149,6 +141,55 @@ set_port_info(ib_portid_t *dest, char *d return 0; } +static int +get_link_width(int lwe, int lws) +{ + if (lwe == 255) + return lws; + else + return lwe; +} + +static int +get_link_speed(int lse, int lss) +{ + if (lse == 15) + return lss; + else + return lse; +} + +static void +validate_width(int width, int peerwidth, int lwa) +{ + if ((width & 0x8) && (peerwidth & 0x8)) { + if (lwa != 8) + printf("Peer ports operating at active width %d rather than 8 (12x)\n", lwa); + } else { + if ((width & 0x4) && (peerwidth & 0x4)) { + if (lwa != 4) + printf("Peer ports operating at active width %d rather than 4 (8x)\n", lwa); + } else { + if ((width & 0x2) && (peerwidth & 0x2)) + if (lwa != 2) + printf("Peer ports operating at active width %d rather than 2 (4x)\n", lwa); + } + } +} + +static void +validate_speed(int speed, int peerspeed, int lsa) +{ + if ((speed & 0x4) && (peerspeed & 0x4)) { + if (lsa != 4) 
+ printf("Peer ports operating at active speed %d rather than 4 (10.0 Gbps)\n", lsa); + } else { + if ((speed & 0x2) && (peerspeed & 0x2)) + if (lsa != 2) + printf("Peer ports operating at active speed %d rather than 2 (5.0 Gbps)\n", lsa); + } +} + void usage(void) { @@ -179,13 +220,21 @@ main(int argc, char **argv) ib_portid_t portid = {0}; ib_portid_t *sm_id = 0, sm_portid = {0}; extern int ibdebug; + int err; int timeout = 0, udebug = 0; char *ca = 0; int ca_port = 0; int port_op = 0; /* default to query */ int speed = 15; - char *err; + int is_switch = 1; + int state, physstate, lwe, lws, lwa, lse, lss, lsa; + int peerlocalportnum, peerlwe, peerlws, peerlwa, peerlse, peerlss, peerlsa; + int width, peerwidth, peerspeed; char data[IB_SMP_DATA_SIZE]; + ib_portid_t peerportid = {0}; + int portnum = 0; + ib_portid_t selfportid = {0}; + int selfport = 0; static char const str_opts[] = "C:P:t:s:devDGVhu"; static const struct option long_opts[] = { @@ -282,13 +331,26 @@ main(int argc, char **argv) } } - if (port_op && (port_op != 4)) /* other than query or speed op */ - if ((err = get_node_info(&portid, data, argv+1, argc-1))) - IBERROR("smpquery nodeinfo: %s", err); - - printf("Initial PortInfo:\n"); - if ((err = get_port_info(&portid, data, argv+1, argc-1, port_op))) - IBERROR("smpquery portinfo: %s", err); + err = get_node_info(&portid, data); + if (err < 0) + IBERROR("smp query nodeinfo failed"); + if (err) { /* not switch */ + if (port_op == 0) /* query op */ + is_switch = 0; + else if (port_op != 4) /* other than speed op */ + IBERROR("smp query nodeinfo: Node type not switch"); + } + + if (argc-1 > 0) + portnum = strtol(argv[1], 0, 0); + + if (port_op) + printf("Initial PortInfo:\n"); + else + printf("PortInfo:\n"); + err = get_port_info(&portid, data, portnum, port_op); + if (err < 0) + IBERROR("smp query portinfo failed"); /* Only if one of the "set" options is chosen */ if (port_op) { @@ -303,13 +365,84 @@ main(int argc, char **argv) mad_set_field(data, 0, 
IB_PORT_PHYS_STATE_F, 0); } - if ((err = set_port_info(&portid, data, argv+1, argc-1, port_op))) - IBERROR("smpset portinfo: %s", err); + err = set_port_info(&portid, data, portnum, port_op); + if (err < 0) + IBERROR("smp set portinfo failed"); if (port_op == 3) { /* Reset port - so also enable */ mad_set_field(data, 0, IB_PORT_PHYS_STATE_F, 2); /* Polling */ - if ((err = set_port_info(&portid, data, argv+1, argc-1, port_op))) - IBERROR("smpset portinfo: %s", err); + err = set_port_info(&portid, data, portnum, port_op); + if (err < 0) + IBERROR("smp set portinfo failed"); + } + } else { /* query op */ + /* only compare peer port if switch port */ + if (is_switch) { + /* First, exclude SP0 */ + if (portnum) { + /* Now, make sure PortState is Active */ + /* Or is PortPhysicalState LinkUp sufficient ? */ + mad_decode_field(data, IB_PORT_STATE_F, &state); + mad_decode_field(data, IB_PORT_PHYS_STATE_F, &physstate); + if (state == 4) { /* Active */ + mad_decode_field(data, IB_PORT_LINK_WIDTH_ENABLED_F, &lwe ); + mad_decode_field(data, IB_PORT_LINK_WIDTH_SUPPORTED_F, &lws); + mad_decode_field(data, IB_PORT_LINK_WIDTH_ACTIVE_F, &lwa); + mad_decode_field(data, IB_PORT_LINK_SPEED_SUPPORTED_F, &lss); + mad_decode_field(data, IB_PORT_LINK_SPEED_ACTIVE_F, &lsa); + mad_decode_field(data, IB_PORT_LINK_SPEED_ENABLED_F, &lse); + + /* Setup portid for peer port */ + memcpy(&peerportid, &portid, sizeof(peerportid)); + peerportid.drpath.cnt = 1; + peerportid.drpath.p[1] = portnum; + + /* Set DrSLID to local lid */ + if (ib_resolve_self(&selfportid, &selfport, 0) < 0) + IBERROR("could not resolve self"); + peerportid.drpath.drslid = selfportid.lid; + peerportid.drpath.drdlid = 0xffff; + + /* Get peer port NodeInfo to obtain peer port number */ + err = get_node_info(&peerportid, data); + if (err < 0) + IBERROR("smp query nodeinfo failed"); + + mad_decode_field(data, IB_NODE_LOCAL_PORT_F, &peerlocalportnum); + + printf("Peer PortInfo:\n"); + /* Get peer port characteristics */ + err = 
get_port_info(&peerportid, data, peerlocalportnum, port_op); + if (err < 0) + IBERROR("smp query peer portinfofailed"); + + mad_decode_field(data, IB_PORT_LINK_WIDTH_ENABLED_F, &peerlwe ); + mad_decode_field(data, IB_PORT_LINK_WIDTH_SUPPORTED_F, &peerlws); + mad_decode_field(data, IB_PORT_LINK_WIDTH_ACTIVE_F, &peerlwa); + mad_decode_field(data, IB_PORT_LINK_SPEED_SUPPORTED_F, &peerlss); + mad_decode_field(data, IB_PORT_LINK_SPEED_ACTIVE_F, &peerlsa); + mad_decode_field(data, IB_PORT_LINK_SPEED_ENABLED_F, &peerlse); + + /* Now validate peer port characteristics */ + /* Examine Link Width */ + width = get_link_width(lwe, lws); + if (width & 0xe) { /* more than 1x */ + peerwidth = get_link_width(peerlwe, peerlws); + if (peerwidth & 0xe) + /* Look at active widths */ + validate_width(width, peerwidth, lwa); + } + + /* Examine Link Speed */ + speed = get_link_speed(lse, lss); + if (speed & 0x6) { /* more than 2.5 Gbps */ + peerspeed = get_link_speed(peerlse, peerlss); + if (peerspeed & 0x6) + /* Look at active speeds */ + validate_speed(speed, peerspeed, lsa); + } + } + } } } Index: man/ibportstate.8 =================================================================== --- man/ibportstate.8 (revision 9897) +++ man/ibportstate.8 (working copy) @@ -10,7 +10,9 @@ ibportstate \- handle port (physical) st .SH DESCRIPTION .PP ibportstate allows the port state and port physical state of an IB port -to be queried, or a switch port to be disabled, enabled, or reset. It +to be queried (in addition to link width and speed being validated +relative to the peer port when the port queried is a switch port), +or a switch port to be disabled, enabled, or reset. It also allows the link speed enabled on any IB port to be adjusted. .SH OPTIONS @@ -30,6 +32,13 @@ Port operations allowed this setting) (NOTE: Speed changes are not effected until the port goes through link renegotiation) + query also validates port characteristics (link width and speed) + based on the peer port. 
This checking is done when the port + queried is a switch port as it relies on combined routing + (an initial LID route with directed routing to the peer) which + can only be done on a switch. This peer port validation feature + of query op requires LID routing to be functioning in the subnet. + .SH COMMON OPTIONS From mlakshmanan at silverstorm.com Thu Oct 19 06:21:20 2006 From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu) Date: Thu, 19 Oct 2006 09:21:20 -0400 Subject: [openib-general] [PATCH] IB/SRP Userspace: srptools/srp_daemon - Fix connect bug and add support for user specified initiator extension In-Reply-To: <33587.194.90.237.34.1161261483.squirrel@dev.mellanox.co.il> Message-ID: > From: Ishai Rabinovitz [mailto:ishai at dev.mellanox.co.il] > Subject: Re: [openib-general] [PATCH] IB/SRP Userspace: > srptools/srp_daemon - Fix connect bug and add support for > user specified initiator extension > > Thanks for your patch. > > I agree with some of the changes you suggest and disagree > with others. It > will be useful to post a different patch for each logical change. > Thanks for the comments. I will make sure to separate out the logical changes into discrete patches the next time I submit a patch. > > 1. Fixes bug in srp_daemon for the case where if it is > > invoked with the '-e' option, it fails to connect to the > > SRP targets because of a newline character in the parameter string. > > You are right that the '\n' is redundant, but I have not seen > the bug it > creates. The last parameter in the string is considered to be > a string by > ib_srp and therefore ib_srp will ignore the newline. 
> I saw the following error message in /var/log/messages before I fixed the newline: messages:Oct 17 06:14:25 aspen kernel: ib_srp: unknown parameter or missing value 'io_class=ff00 messages:Oct 17 06:14:25 aspen kernel: ib_srp: unknown parameter or missing value 'io_class=ff00 messages:Oct 18 05:43:42 aspen kernel: ib_srp: unknown parameter or missing value 'io_class=ff00 Do you suspect the problem to be elsewhere? I was testing against Silverstorm SRP targets, but considering the error message, that should not have been relevant. The connection, of course, never got created, which is what prompted me to make the above fix. > > > 2. Changes the name of the constant > 'MAX_TRAGET_CONFIG_STR_STRING' to > 'MAX_TARGET_CONFIG_STR_STRING'. > > Thanks, I will apply this change. > Didn't really mean to nitpick on this typo. I decided to fix it only when I grep'ed for TARGET and found no matches. > > > 3. Changes the behavior of the '-n' option to srp_daemon. > > The earlier behavior printed the initiator extension. The > > new behavior allows the user to specify an initiator extension > > as an argument to the '-n' option. > > I do not think we want to change the -n behavior to this one. > First of all your approach induces the same initiator_ext to all the > targets discovered by this srp_daemon. The point you raise actually prompts another related question about srp_daemon behavior. Currently, srp_daemon connects to all the targets it finds, when given the '-e' option. I think adding a flag that would allow a user to specify the target IOC guid to connect to would help, and that flag when used with the initiator extension would be more useful. What do you think? > Secondly If someone uses random values for the initiator_ext > it may cause a waste of resources in the target: the target can never > tell when a connection had failed (or an initiator performed a boot) > and will keep the connection alive. 
When there is an attempt to reconnect > to this target with the same initiator_ext, the target knows he can > close the old connection. > This is the reason we decided to have a convention. The convention is to > use the target port guid. The advantage of this convention is that it > allows us to have two connection on the same time from an initiator to > both ports of the target. > > I understand that some targets may need a different > initiator_ext, Yes, Silverstorm SRP targets support multiple connections from a single HCA port to a single SRP target, for the purposes of establishing unique connections to specific storage devices that are *behind* an FC switch but all being accessible through the same target port. In summary, to fully support such SRP targets and the increased functionality that becomes possible because of them, supporting multiple initiator extension becomes a necessity. > but you should add a new flag for actually setting the > initiator_ext and leave -n untouched. > You are welcome to send such a patch. I agree, I will leave '-n' untouched and add a new flag, say '-x', that will allow a user to specify the initiator extension. > > Ishai There is another issue that we need to tackle as well. Currently, the check in srp_daemon, to see whether a target is already connected, does not take into account the initiator extension that a user could specify. That would need to be added to the check as well. From a quick examination of the code, it appears that the changes are both to the kernel module as well as to srp_daemon. I hope to have a separate patch for that ready in a few days. Thanks, appreciate your comments and feedback, Madhu. 
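A minimal sketch of the newline issue discussed in this thread, assuming the daemon builds a comma-separated option string and hands it to ib_srp. The helper names (strip_trailing_newline, build_target_str) and the buffer handling are illustrative assumptions and are not taken from srp_daemon; only the option names io_class and initiator_ext come from the messages above. The trailing newline is stripped before an optional initiator_ext is appended, so the last parameter never carries a '\n'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Remove trailing '\n' / '\r' characters in place; return the new length. */
size_t strip_trailing_newline(char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);
	while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
		s[--len] = '\0';
	return len;
}

/*
 * Hypothetical helper (not srp_daemon code): copy 'base' into 'buf',
 * drop any trailing newline, then optionally append a user-supplied
 * initiator extension as the last parameter.
 * Returns 0 on success, -1 if 'buf' is too small.
 */
int build_target_str(char *buf, size_t size,
		     const char *base, const char *initiator_ext)
{
	size_t len;
	int n;

	n = snprintf(buf, size, "%s", base);
	if (n < 0 || (size_t)n >= size)
		return -1;

	len = strip_trailing_newline(buf);

	if (initiator_ext != NULL) {
		n = snprintf(buf + len, size - len,
			     ",initiator_ext=%s", initiator_ext);
		if (n < 0 || (size_t)n >= size - len)
			return -1;
	}
	return 0;
}
```

With a base string of "io_class=ff00\n" and an extension of "0000000000000001", this produces "io_class=ff00,initiator_ext=0000000000000001". Stripping before appending matters: stripping only at the end would leave the newline embedded in the middle of the string.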
From tziporet at dev.mellanox.co.il Thu Oct 19 06:39:11 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 19 Oct 2006 15:39:11 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: References: Message-ID: <45377FFF.7060801@dev.mellanox.co.il> Hoang-Nam Nguyen wrote: > As Michael indicated in previous email configure is a generated file > from autogen.sh. And I'm not sure if your packaging script does generate > it automatically when you build the whole OFED tgz-file. > The actual patches in configure.in and config.h.in are needed because > I got compile error like: > static declaration of ibv_read_sysfs_file() does not match previous > declaration in infiniband/driver.h > Thanks! > Nam > > The release is closed. We only updating the documents now (will be closed in the coming few hours only). Since ehca is in technology preview state these issues are not blockers. Please document all issues in ehca release_notes (or send me parts you want to include). Tziporet From vlad at mellanox.co.il Thu Oct 19 06:40:45 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 19 Oct 2006 15:40:45 +0200 Subject: [openib-general] some OFED source/build questions Message-ID: <6C2C79E72C305246B504CBA17B5500C934783B@mtlexch01.mtl.com> Hi Or, Please see below, Regards, Vladimir > 2) OPENIB_PARAMS documentation > > Doing some probing, i undersrand that to set this or that > option to the > build i need to set this or that --with or --without > directive to the > SOURCES/openib-1.1/ofed_scripts/configure script and this is done by > setting the OPENIB_PARAMS env var while running the > install.sh script. > > Some of these --with/out options which need to be > documented somewhere > are not, for example to set CONFIG_INFINIBAND_IPOIB_DEBUG i need to > add --with-ipoib_debug-mod to the build, correct? 
> OPENIB_PARAMS supports only the following options: --with-memtrack --without-modprobe --with-madeye-mod --without-ipoibconf CONFIG_INFINIBAND_IPOIB_DEBUG option is set by default if IPoIB selected > Also, is there a way to see after building with which exact > CONFIG_INFINIBAND_ directives OFED was built? You can see all configuration options in the following way: Run: /etc/infiniband/info Alternatively, you can check 'prefix'/src/openib/configure.mk file. > > Or. From sashak at voltaire.com Thu Oct 19 06:49:22 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 19 Oct 2006 15:49:22 +0200 Subject: [openib-general] [PATCH] diags/ibroute: fix double calculated block value Message-ID: <20061019134922.GA21208@sashak.voltaire.com> Initial value of LFT block variable was double calculated (first time as 'startblock' and then block = startblock/BLOCK_SIZE). 
Signed-off-by: Sasha Khapyorsky --- diags/src/ibroute.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/diags/src/ibroute.c b/diags/src/ibroute.c index 08bdbdf..98f20e0 100644 --- a/diags/src/ibroute.c +++ b/diags/src/ibroute.c @@ -352,7 +352,7 @@ #endif printf(" Port Info \n"); startblock = startlid / IB_SMP_DATA_SIZE; endblock = ALIGN(endlid, IB_SMP_DATA_SIZE) / IB_SMP_DATA_SIZE; - for (block = startblock / IB_SMP_DATA_SIZE; block <= endblock; block++) { + for (block = startblock; block <= endblock; block++) { DEBUG("reading block %d", block); if (!smp_query(lft, portid, IB_ATTR_LINEARFORWTBL, block, 0)) return "linear forwarding table get failed"; -- 1.4.2.3.g128e From sashak at voltaire.com Thu Oct 19 06:50:28 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 19 Oct 2006 15:50:28 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: comments fixing Message-ID: <20061019135028.GB21208@sashak.voltaire.com> Trivial comments fixing - incorrect function name in one place and incorrect function description in another. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_port.h | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/osm/include/opensm/osm_port.h b/osm/include/opensm/osm_port.h index 12ab7a7..15e0876 100644 --- a/osm/include/opensm/osm_port.h +++ b/osm/include/opensm/osm_port.h @@ -1669,9 +1669,9 @@ osm_port_get_phys_ptr( * Port *********/ -/****f* OpenSM: Port/osm_port_get_phys_ptr +/****f* OpenSM: Port/osm_port_get_default_phys_ptr * NAME -* osm_port_get_phys_ptr +* osm_port_get_default_phys_ptr * * DESCRIPTION * Gets the pointer to the default Physical Port object. @@ -1708,7 +1708,7 @@ osm_port_get_default_phys_ptr( * osm_port_get_parent_node * * DESCRIPTION -* Gets the pointer to the specified Physical Port object. +* Gets the pointer to the this port's Node object. 
* * SYNOPSIS */ -- 1.4.2.3.g128e From halr at voltaire.com Thu Oct 19 07:01:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Oct 2006 10:01:42 -0400 Subject: [openib-general] [PATCH TRIVIAL] opensm: comments fixing In-Reply-To: <20061019135028.GB21208@sashak.voltaire.com> References: <20061019135028.GB21208@sashak.voltaire.com> Message-ID: <1161266488.25985.67788.camel@hal.voltaire.com> On Thu, 2006-10-19 at 09:50, Sasha Khapyorsky wrote: > Trivial comments fixing - incorrect function name in one place and > incorrect function description in another. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From halr at voltaire.com Thu Oct 19 07:08:28 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Oct 2006 10:08:28 -0400 Subject: [openib-general] [PATCH] diags/ibroute: fix double calculated block value In-Reply-To: <20061019134922.GA21208@sashak.voltaire.com> References: <20061019134922.GA21208@sashak.voltaire.com> Message-ID: <1161266904.25985.68049.camel@hal.voltaire.com> On Thu, 2006-10-19 at 09:49, Sasha Khapyorsky wrote: > Initial value of LFT block variable was double calculated (first time > as 'startblock' and then block = startblock/BLOCK_SIZE). > > Signed-off-by: Sasha Khapyorsky Good catch. Thanks. Applied. -- Hal From HNGUYEN at de.ibm.com Thu Oct 19 07:41:17 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 19 Oct 2006 16:41:17 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: <45377FFF.7060801@dev.mellanox.co.il> Message-ID: Hi, > The release is closed. We only updating the documents now (will be > closed in the coming few hours only). > Since ehca is in technology preview state these issues are not blockers. > Please document all issues in ehca release_notes (or send me parts you > want to include). Even though ehca is in technology preview state, there are certainly some ones who would like to use it. 
In current ofed the build process is incomplete for ehca user space. We did send the patches yesterday as discussed w/ Vladimir and Michael and considered to be on the category "2. Small updates for the install". It's very important for us that one can get ehca compiled and built with ofed. Since those patches do not touch any other components and we were one day before todays deadline, we're very surprised that you reject this code now. What should be the next steps? Regards Nam From dledford at redhat.com Thu Oct 19 07:40:36 2006 From: dledford at redhat.com (Doug Ledford) Date: Thu, 19 Oct 2006 10:40:36 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061019050907.GA1547@mellanox.co.il> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> <20061019050907.GA1547@mellanox.co.il> Message-ID: <1161268837.2917.544.camel@fc6.xsintricity.com> On Thu, 2006-10-19 at 07:09 +0200, Michael S. Tsirkin wrote: > Quoting r. Doug Ledford : > > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > > > On Wed, 2006-10-18 at 09:29 +0200, Michael S. Tsirkin wrote: > > > Quoting r. Doug Ledford : > > > > > >From our dicussion, it seems we should be able to just push the > > > > > small number of missing bits into RHEL5 directly. That would be > > > > > nicer of course. > > > > > > > > It depends. If there's lots of individual changes, it might be easier > > > > to push the OFED 1.1 change. But, that depends on when the final OFED > > > > 1.1 comes out and how much it varies from the existing RPMs. > > > > > > OFED is in deep freeze, so you can already look at it to estimate the amount of > > > changes against 2.6.18. > > > Could you look at the diff please so that I know whether it's worth it > > > to invest in building the minimal patch set for pushing into RHEL5, > > > or whether you'll push OFED 1.1 into RHEL kernel as is? 
> > > > Yeah, I'll look over the diff today. > > How does it look? Didn't get around to it. Instead, I was fixing a buffer overflow problem in openmpi (reuse of the len variable without resetting it to the correct value after the bottom of the loop does len = strlen(desc); causes the snprintf() in the loop to trigger as a buffer overflow when compiled with FORTIFY_SOURCE, patch attached) and reviewing arpingib (which I'm going to remove from the ipoibtools and fix the native arping in RHEL5 to work properly over IB without needing a new flag, the -A or -U flags should be sufficient assuming those modes worked at all over IB which they don't in either the native arping or the patched arpingib in ipoibtools). I should get to it today though. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: openmpi-1.1.1-overflow.patch Type: text/x-patch Size: 1012 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From vlad at mellanox.co.il Thu Oct 19 07:45:50 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 19 Oct 2006 16:45:50 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs Message-ID: <6C2C79E72C305246B504CBA17B5500C9347883@mtlexch01.mtl.com> Hi Nam, The code that fixes lib/lib64 issue on ppc64 and libehca.so to be in libehca RPM is in OFED-1.1. Regards, Vladimir > -----Original Message----- > From: Hoang-Nam Nguyen [mailto:HNGUYEN at de.ibm.com] > Sent: Thursday, October 19, 2006 4:41 PM > To: Tziporet Koren > Cc: Michael S. 
Tsirkin; openfabrics-ewg at openib.org; > openib-general at openib.org; Vladimir Sokolovsky; Chet Mehta > Subject: Re: [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca > configure: fix missing check of libsysfs > > Hi, > > The release is closed. We only updating the documents now (will be > > closed in the coming few hours only). > > Since ehca is in technology preview state these issues are > not blockers. > > Please document all issues in ehca release_notes (or send > me parts you > > want to include). > Even though ehca is in technology preview state, there are > certainly some ones who would like to use it. In current ofed > the build process is incomplete for ehca user space. We did > send the patches yesterday as discussed w/ Vladimir and > Michael and considered to be on the category "2. Small > updates for the install". It's very important for us that one > can get ehca compiled and built with ofed. Since those > patches do not touch any other components and we were one day > before todays deadline, we're very surprised that you reject > this code now. What should be the next steps? > Regards > Nam > From rdreier at cisco.com Thu Oct 19 07:51:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 19 Oct 2006 07:51:31 -0700 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: (Shirley Ma's message of "Wed, 18 Oct 2006 23:15:25 -0700") References: Message-ID: Shirley> I am not sure whether this already has an answer. The Shirley> justification is madvise(..., MADV_DONTFORK) is used to Shirley> make fork() work for verbs consumers in the recent Shirley> packages. I hope same patch will be in libehca. 
That's not in any released version of libibverbs, and not in the libibverbs 1.0 tree at all (it is only in libibverbs 1.1-pre development code) From HNGUYEN at de.ibm.com Thu Oct 19 07:56:06 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 19 Oct 2006 16:56:06 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: <6C2C79E72C305246B504CBA17B5500C9347883@mtlexch01.mtl.com> Message-ID: Hi Vlad, > The code that fixes lib/lib64 issue on ppc64 and libehca.so to be in > libehca RPM is in OFED-1.1. Thanks! Can you pls confirm if configure.in and config.h.in are in OFED-1.1 as well? Regards Nam From ogerlitz at voltaire.com Thu Oct 19 08:02:57 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 19 Oct 2006 17:02:57 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161268837.2917.544.camel@fc6.xsintricity.com> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> <20061019050907.GA1547@mellanox.co.il> <1161268837.2917.544.camel@fc6.xsintricity.com> Message-ID: <453793A1.8000000@voltaire.com> Doug Ledford wrote: > ... and reviewing arpingib > (which I'm going to remove from the ipoibtools and fix the native arping > in RHEL5 to work properly over IB without needing a new flag, the -A or > -U flags should be sufficient assuming those modes worked at all over IB > which they don't in either the native arping or the patched arpingib in > ipoibtools). I should get to it today though. Would you mind send the patch to arping for review? Or. 
From vlad at mellanox.co.il Thu Oct 19 08:07:16 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 19 Oct 2006 17:07:16 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs Message-ID: <6C2C79E72C305246B504CBA17B5500C934789A@mtlexch01.mtl.com> No, the updated configure.in and config.h.in are not in OFED-1.1. In any case, I believe that most of the checks you have added to configure scripts are provided by OFED installation scripts. So, in case OFED installation fails, ehca configure would fail as well. Regards, Vladimir > -----Original Message----- > From: Hoang-Nam Nguyen [mailto:HNGUYEN at de.ibm.com] > Sent: Thursday, October 19, 2006 4:56 PM > To: Vladimir Sokolovsky > Cc: Michael S. Tsirkin; openfabrics-ewg at openib.org; > openfabrics-ewg-bounces at openib.org; > openib-general at openib.org; Tziporet Koren; Christoph Raisch > Subject: Re: [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca > configure: fix missing check of libsysfs > > Hi Vlad, > > The code that fixes lib/lib64 issue on ppc64 and libehca.so > to be in > > libehca RPM is in OFED-1.1. > Thanks! Can you pls confirm if configure.in and config.h.in are in > OFED-1.1 as well? > Regards > Nam > From RAISCH at de.ibm.com Thu Oct 19 08:19:21 2006 From: RAISCH at de.ibm.com (Christoph Raisch) Date: Thu, 19 Oct 2006 17:19:21 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: <6C2C79E72C305246B504CBA17B5500C934789A@mtlexch01.mtl.com> Message-ID: so this means this problem we try to fix is already covered by your new setup? Do you have a version we could try? Gruss / Regards . . . Christoph Raisch christoph raisch, HCAD teamlead, IODF2 (d/3627), ibm boeblingen lab, phone: (+49/0)7031-16 4584, fax: -16 2042, loc: 71032-05-003, internet: raisch at de.ibm.com "Vladimir Sokolovsky" Hoang-Nam Nguyen/Germany/IBM at IBMDE cc 19.10.2006 17:07 "Michael S. 
Tsirkin" , , , , "Tziporet Koren" , Christoph Raisch/Germany/IBM at IBMDE Subject RE: [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs No, the updated configure.in and config.h.in are not in OFED-1.1. In any case, I believe that most of the checks you have added to configure scripts are provided by OFED installation scripts. So, in case OFED installation fails, ehca configure would fail as well. Regards, Vladimir > -----Original Message----- > From: Hoang-Nam Nguyen [mailto:HNGUYEN at de.ibm.com] > Sent: Thursday, October 19, 2006 4:56 PM > To: Vladimir Sokolovsky > Cc: Michael S. Tsirkin; openfabrics-ewg at openib.org; > openfabrics-ewg-bounces at openib.org; > openib-general at openib.org; Tziporet Koren; Christoph Raisch > Subject: Re: [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca > configure: fix missing check of libsysfs > > Hi Vlad, > > The code that fixes lib/lib64 issue on ppc64 and libehca.so > to be in > > libehca RPM is in OFED-1.1. > Thanks! Can you pls confirm if configure.in and config.h.in are in > OFED-1.1 as well? > Regards > Nam > From xma at us.ibm.com Thu Oct 19 08:15:42 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 19 Oct 2006 08:15:42 -0700 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: Message-ID: Roland Dreier wrote on 10/19/2006 07:51:31 AM: > Shirley> I am not sure whether this already has an answer. The > Shirley> justification is madvise(..., MADV_DONTFORK) is used to > Shirley> make fork() work for verbs consumers in the recent > Shirley> packages. I hope same patch will be in libehca. > > That's not in any released version of libibverbs, and not in the > libibverbs 1.0 tree at all (it is only in libibverbs 1.1-pre > development code) How can RHEL5 pick up this particular patch? Applications with fork() depend on this patch. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... 
From mshefty at ichips.intel.com Thu Oct 19 08:17:07 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 19 Oct 2006 08:17:07 -0700 Subject: [openib-general] [PATCH] Fix some cancellation problems in process_req(). In-Reply-To: References: Message-ID: <453796F3.9080404@ichips.intel.com> Krishna Kumar2 wrote: > I felt that since the time_after() check matched (in all likelihood) due to the > processing of the cancellation, ECANCELLED is more appropriate to return. > It is most likely that if both conditions are true, a cancelled operation led to > the time_after() match (cancel sets time to jiffies resulting in this time_after match). > The chance of both happening together is almost zero. > > Do you agree? Otherwise I can re-work the patch as suggested. The only thing I would change about your patch is the if check as indicated. - Sean From mshefty at ichips.intel.com Thu Oct 19 08:21:11 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 19 Oct 2006 08:21:11 -0700 Subject: [openib-general] [PATCH] rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: References: Message-ID: <453797E7.30409@ichips.intel.com> Krishna Kumar2 wrote: > This will mean that a deref is wrongly done if a loopback or zero address is > passed to this function, without it having done a ref inc. I do think this case > requires a variable to indicate whether a ref was got or not. Assuming that is > true, I will submit a patch with your comment about holding the lock. Yes, you're right. We would need to do: if (!cma_any_addr(addr)) again before calling detach. 
- Sean From ogerlitz at voltaire.com Thu Oct 19 08:25:24 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 19 Oct 2006 17:25:24 +0200 Subject: [openib-general] some OFED source/build questions In-Reply-To: <6C2C79E72C305246B504CBA17B5500C934783B@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C934783B@mtlexch01.mtl.com> Message-ID: <453798E4.6000500@voltaire.com> Vladimir Sokolovsky wrote: > OPENIB_PARAMS supports only the following options: > --with-memtrack > --without-modprobe > --with-madeye-mod > --without-ipoibconf > > CONFIG_INFINIBAND_IPOIB_DEBUG option is set by default if IPoIB selected > You can see all configuration options in the following way: > Run: /etc/infiniband/info > Alternatively, you can check 'prefix'/src/openib/configure.mk file. Ok, thanks for the answer. Or. From dotanb at dev.mellanox.co.il Thu Oct 19 08:26:17 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Thu, 19 Oct 2006 17:26:17 +0200 Subject: [openib-general] Catastrophic error detected. In-Reply-To: <20061018131317.1d187ad1.weiny2@llnl.gov> References: <20061018131317.1d187ad1.weiny2@llnl.gov> Message-ID: <45379919.9020104@dev.mellanox.co.il> Hi Ira. Ira Weiny wrote: > I got the following error running with OFED 1.1 on a modified 2.6.9 RHEL4 > kernel. Hal mentioned that there might be a catastrophic error recovery patch > submitted since then? I can't find a mention of that in the mailing list. If > possible I would like to try such a patch. 
> > Thanks, > Ira > > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: Catastrophic error detected: unknown error > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[00]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[01]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[02]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[03]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[04]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[05]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[06]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[07]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[08]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[09]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0a]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0b]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0c]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0d]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0e]: ffffffff > 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: buf[0f]: ffffffff > > # rhea277 /root > /sbin/lspci -vv -s 07:00.0 > 07:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (rev 20) > Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex > Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- > Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Interrupt: pin A routed to IRQ 217 > Region 0: Memory at dff00000 (64-bit, non-prefetchable) [disabled] [size=1M] > Region 2: Memory at de800000 (64-bit, prefetchable) [disabled] [size=8M] > Capabilities: [40] Power Management version 2 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [48] Vital Product Data > Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable- > Address: 0000000000000000 Data: 0000 > 
Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
>     Vector table: BAR=0 offset=00082000
>     PBA: BAR=0 offset=00082200
> Capabilities: [60] Express Endpoint IRQ 0
>     Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
>     Device: Latency L0s <64ns, L1 unlimited
>     Device: AtnBtn- AtnInd- PwrInd-
>     Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>     Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>     Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
>     Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
>     Link: Latency L0s unlimited, L1 unlimited
>     Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
>     Link: Speed 2.5Gb/s, Width x8
>

can you please give me some info on how you got this error:
* what did you do that caused this error?
* which FW version do you have?
* what is the board_id of the HCA?
  (you can find this info using ibv_devinfo)

thanks
Dotan

From kliteyn at dev.mellanox.co.il Thu Oct 19 08:26:47 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 19 Oct 2006 17:26:47 +0200
Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c
Message-ID: <45379937.4040004@dev.mellanox.co.il>

If osmtest fails to remove some MC group, dump all the MC groups that
still remain, along with their members.

-- Yevgeny

Signed-off-by: Yevgeny Kliteynik

Index: osmtest/osmt_multicast.c
===================================================================
--- osmtest/osmt_multicast.c (revision 9907)
+++ osmtest/osmt_multicast.c (working copy)
@@ -57,6 +57,86 @@
 /**********************************************************************
  **********************************************************************/
+static void
+__osmt_print_all_multicast_records(
+  IN osmtest_t * const p_osmt)
+{
+  int i;
+  ib_api_status_t status;
+  osmv_query_req_t req;
+  osmv_user_query_t user;
+  osmtest_req_context_t context;
+  ib_member_rec_t * mcast_record;
+
+  memset( &context, 0, sizeof( context ) );
+  memset( &req, 0, sizeof( req ) );
+  memset( &user, 0, sizeof( user ) );
+
+  user.attr_id = IB_MAD_ATTR_MCMEMBER_RECORD;
+  user.attr_offset = ib_get_attr_offset(sizeof(*mcast_record));
+
+  req.query_type = OSMV_QUERY_USER_DEFINED;
+  req.timeout_ms = p_osmt->opt.transaction_timeout;
+  req.retry_cnt = 1;
+  req.flags = OSM_SA_FLAGS_SYNC;
+  context.p_osmt = p_osmt;
+  req.query_context = &context;
+  req.pfn_query_cb = osmtest_query_res_cb;
+  req.p_query_input = &user;
+
+  /* UnTrusted - get the multicast groups */
+  req.sm_key = 0;
+  status = osmv_query_sa(p_osmt->h_bind, &req);
+
+  if (status != IB_SUCCESS || context.result.status != IB_SUCCESS)
+  {
+    osm_log( &p_osmt->log, OSM_LOG_ERROR,
+             "__osmt_print_all_multicast_records: ERR 02B5: "
+             "Failed getting the multicast groups records - %s/%s\n",
+             ib_get_err_str(status),
+             ib_get_err_str(context.result.status) );
+    return;
+  }
+
+  osm_log( &p_osmt->log, OSM_LOG_INFO,
+           "\n    |------------------------------------------|"
+           "\n    |        Remaining Multicast Groups        |"
+           "\n    |------------------------------------------|\n" );
+
+  for (i = 0; i < context.result.result_cnt; i++) {
+    mcast_record = osmv_get_query_mc_rec(context.result.p_result_madw, i);
+    osm_dump_mc_record(&p_osmt->log, mcast_record,OSM_LOG_INFO);
+  }
+
+  /* Trusted - now get the multicast group members */
+  req.sm_key = OSM_DEFAULT_SM_KEY;
+  status = osmv_query_sa(p_osmt->h_bind, &req);
+
+  if (status != IB_SUCCESS || context.result.status != IB_SUCCESS)
+  {
+    osm_log( &p_osmt->log, OSM_LOG_ERROR,
+             "__osmt_print_all_multicast_records: ERR 02B6: "
+             "Failed getting the multicast group members records - %s/%s\n",
+             ib_get_err_str(status),
+             ib_get_err_str(context.result.status) );
+    return;
+  }
+
+  osm_log( &p_osmt->log, OSM_LOG_INFO,
+           "\n    |--------------------------------------------------|"
+           "\n    |        Remaining Multicast Group Members         |"
+           "\n    |--------------------------------------------------|\n" );
+
+  for (i = 0; i < context.result.result_cnt; i++) {
+    mcast_record = osmv_get_query_mc_rec(context.result.p_result_madw, i);
+    osm_dump_mc_record(&p_osmt->log, mcast_record,OSM_LOG_INFO);
+  }
+
+}
+
+/**********************************************************************
+ **********************************************************************/
+
 static cl_status_t
 __match_mgids(
@@ -3403,10 +3483,6 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                  cl_ntoh64(p_mgrp->mcmember_rec.mgid.unicast.interface_id),
                  mlid );
         got_error = TRUE;
-
-        /**
-         * ToDo: Query all the group members of this MC Group
-         **/
       }
       else
       {
@@ -3421,7 +3497,10 @@ osmt_run_mcast_flow( IN osmtest_t * cons
     }
     if (got_error)
+    {
       status = IB_ERROR;
+      __osmt_print_all_multicast_records(p_osmt);
+    }
   }
  Exit:
   OSM_LOG_EXIT( &p_osmt->log );

From ogerlitz at voltaire.com Thu Oct 19 08:30:19 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 19 Oct 2006 17:30:19 +0200 (IST)
Subject: [openib-general] OFED1.1-rc7 build failure on 2.6.9-prep (RH4 U3 hand built) system
Message-ID:

Vlad,

I am trying to build OFED1.1-rc7 on a system where i have built the RH4 U3 kernel manually from its source rpm (that prepared kernel-2.6.9-34.EL.src.rpm for being built and then did make, make modules_install, make install and reboot) and get the errors below.

Do you have any idea what i am doing wrong?
If i don't build/install/boot in this kernel but rather only do the initial setting of the kernel sources (ie make modules_prepare etc), OFED gets built fine. The configure line i was using is: ./configure --with-mthca-mod --with-core-mod --with-ipoib-mod --with-ipoib_debug-mod I have created the directory OFED-1.1-rc7/SOURCES/2.6.9 by hand and opened there the tgz file. Or. # uname -a Linux excell01.voltaire.com 2.6.9-prep #1 SMP Thu Oct 19 18:02:31 IST 2006 x86_64 x86_64 x86_64 GNU/Linux # ls -l /lib/modules/2.6.9-prep/ total 592 lrwxrwxrwx 1 root root 46 Oct 19 18:10 build -> /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9 drwxr-xr-x 2 root root 4096 Oct 18 16:17 extra drwxr-xr-x 9 root root 4096 Oct 19 18:10 kernel -rw-r--r-- 1 root root 113700 Oct 19 18:10 modules.alias -rw-r--r-- 1 root root 69 Oct 19 18:10 modules.ccwmap -rw-r--r-- 1 root root 112864 Oct 19 18:10 modules.dep -rw-r--r-- 1 root root 73 Oct 19 18:10 modules.ieee1394map -rw-r--r-- 1 root root 357 Oct 19 18:10 modules.inputmap -rw-r--r-- 1 root root 235 Oct 19 18:10 modules.isapnpmap -rw-r--r-- 1 root root 104476 Oct 19 18:10 modules.pcimap -rw-r--r-- 1 root root 62764 Oct 19 18:10 modules.symbols -rw-r--r-- 1 root root 155849 Oct 19 18:10 modules.usbmap lrwxrwxrwx 1 root root 46 Oct 19 18:10 source -> /usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9 # make kernel Building kernel modules Kernel version: 2.6.9-prep Modules directory: //lib/modules/2.6.9-prep Kernel sources: /lib/modules/2.6.9-prep/build env EXTRA_CFLAGS=" -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/include \ -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/ulp/ipoib \ -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/debug" \ make -C /lib/modules/2.6.9-prep/build SUBDIRS="/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband" KERNELRELEASE=2.6.9-prep \ EXTRAVERSION=-prep V=1 \ 
CONFIG_INFINIBAND=m \ CONFIG_INFINIBAND_IPOIB=m \ CONFIG_INFINIBAND_SDP= \ CONFIG_INFINIBAND_SRP= \ CONFIG_INFINIBAND_USER_MAD= \ CONFIG_INFINIBAND_USER_ACCESS= \ CONFIG_INFINIBAND_ADDR_TRANS= \ CONFIG_INFINIBAND_MTHCA=m \ CONFIG_INFINIBAND_IPOIB_DEBUG=y \ CONFIG_INFINIBAND_ISER= \ CONFIG_INFINIBAND_EHCA= \ CONFIG_INFINIBAND_RDS= \ CONFIG_INFINIBAND_RDS_DEBUG= \ CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ CONFIG_INFINIBAND_SDP_DEBUG= \ CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ CONFIG_INFINIBAND_IPATH= \ CONFIG_INFINIBAND_MTHCA_DEBUG=y \ CONFIG_INFINIBAND_MADEYE= \ LINUXINCLUDE='-I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include \ -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/include \ -Iinclude \ $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ -include include/linux/autoconf.h \ -include /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/autoconf.h \ ' \ modules make[1]: Entering directory `/usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9' mkdir -p /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/.tmp_versions make -f scripts/Makefile.build obj=/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband make -f scripts/Makefile.build obj=/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core gcc -Wp,-MD,/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/.cm.o.d -nostdinc -iwithprefix include -D__KERNEL__ -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/include -Iinclude -include include/linux/autoconf.h -include /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/autoconf.h -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -fomit-frame-pointer -g -Wdeclaration-after-statement -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks 
-Wno-sign-compare -funit-at-a-time -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/include -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/ulp/ipoib -I/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/debug -D__nocast= -DMODULE -DKBUILD_BASENAME=cm -DKBUILD_MODNAME=ib_cm -c -o /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/.tmp_cm.o /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/cm.c In file included from include/linux/slab.h:15, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/slab.h:4, from include/linux/percpu.h:4, from include/linux/sched.h:31, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/sched.h:4, from include/linux/module.h:10, from include/linux/device.h:20, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/device.h:4, from include/linux/dma-mapping.h:4, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/cm.c:39: include/linux/gfp.h:134: error: conflicting types for 'gfp_t' /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/types.h:7: error: previous declaration of 'gfp_t' was here In file included from include/linux/percpu.h:4, from include/linux/sched.h:31, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/sched.h:4, from include/linux/module.h:10, from include/linux/device.h:20, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/device.h:4, from include/linux/dma-mapping.h:4, from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/cm.c:39: /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/slab.h:8: warning: static declaration of 'kzalloc' follows non-static declaration include/linux/slab.h:101: warning: previous declaration of 'kzalloc' was here In file 
included from include/linux/module.h:10,
                 from include/linux/device.h:20,
                 from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/device.h:4,
                 from include/linux/dma-mapping.h:4,
                 from /home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/cm.c:39:
/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/include/linux/sched.h:8: warning: static declaration of 'wait_for_completion_timeout' follows non-static declaration
include/linux/completion.h:32: warning: previous declaration of 'wait_for_completion_timeout' was here
make[3]: *** [/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core/cm.o] Error 1
make[2]: *** [/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband/core] Error 2
make[1]: *** [_module_/home/ogerlitz/OFED-1.1-rc7/SOURCES/2.6.9/openib-1.1/drivers/infiniband] Error 2
make[1]: Leaving directory `/usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9'
make: *** [kernel] Error 2

From dledford at redhat.com Thu Oct 19 08:37:55 2006
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 19 Oct 2006 11:37:55 -0400
Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ...
In-Reply-To:
References:
Message-ID: <1161272275.2917.560.camel@fc6.xsintricity.com>

On Thu, 2006-10-19 at 08:15 -0700, Shirley Ma wrote:
> Roland Dreier wrote on 10/19/2006 07:51:31 AM:
> > Shirley> I am not sure whether this already has an answer. The
> > Shirley> justification is madvise(..., MADV_DONTFORK) is used to
> > Shirley> make fork() work for verbs consumers in the recent
> > Shirley> packages. I hope same patch will be in libehca.
> >
> > That's not in any released version of libibverbs, and not in the
> > libibverbs 1.0 tree at all (it is only in libibverbs 1.1-pre
> > development code)
>
> How can RHEL5 pick up this particular patch? Applications with fork()
> depend on this patch.
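
[For readers who have not met the call being discussed: the following is a minimal sketch of what madvise(..., MADV_DONTFORK) does on Linux, assuming a kernel that supports the flag. It is not the libibverbs code and makes no claim about how that library applies the flag; the helper name dontfork_demo is hypothetical and exists only for illustration. The sketch maps a buffer, marks it MADV_DONTFORK, forks, and lets the child check that the range was not inherited.]

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libibverbs: map `len` bytes, mark the
 * range MADV_DONTFORK, fork, and check that the child did not inherit it.
 * Returns 0 if the child saw the range as unmapped and the parent's data
 * is still intact, -1 otherwise.
 */
static int dontfork_demo(size_t len)
{
    unsigned char *buf;
    pid_t pid;
    int status;
    int ok;

    assert(len > 0);

    /* mmap() returns page-aligned memory, which madvise() requires. */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0xab, len);

    /* Ask the kernel not to duplicate this range into child processes. */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /*
         * Child: probe the range without dereferencing it. msync() is
         * documented to fail with ENOMEM when the range is not mapped.
         */
        int unmapped = (msync(buf, len, MS_ASYNC) == -1 && errno == ENOMEM);
        _exit(unmapped ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid) {
        munmap(buf, len);
        return -1;
    }

    /* Parent: the mapping and its contents are unaffected by the fork. */
    ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
         buf[0] == 0xab && buf[len - 1] == 0xab;
    munmap(buf, len);
    return ok ? 0 : -1;
}
```

[A caller would invoke it as, for example, dontfork_demo((size_t)sysconf(_SC_PAGESIZE)). The consequence for applications is that a child which touches such a buffer faults instead of receiving a copy-on-write copy, so the flag only helps programs whose children do not use the marked memory.]
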

Well, the support isn't well tested, it isn't released, and ISTR that it isn't even required by the MPI spec since that leaves behavior of an MPI app undefined after a fork() call and hence any application written to depend on undefined behavior is broken by design, so I'm leaning towards this being a good example of when people just need to know when to say no. But, I'm open to being shown I'm wrong.

-- 
Doug Ledford
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From dledford at redhat.com Thu Oct 19 08:39:01 2006
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 19 Oct 2006 11:39:01 -0400
Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ...
In-Reply-To: <453793A1.8000000@voltaire.com>
References: <1161155330.2917.511.camel@fc6.xsintricity.com>
 <20061018072904.GA26507@mellanox.co.il>
 <1161177058.2917.513.camel@fc6.xsintricity.com>
 <20061019050907.GA1547@mellanox.co.il>
 <1161268837.2917.544.camel@fc6.xsintricity.com>
 <453793A1.8000000@voltaire.com>
Message-ID: <1161272341.2917.562.camel@fc6.xsintricity.com>

On Thu, 2006-10-19 at 17:02 +0200, Or Gerlitz wrote:
> Doug Ledford wrote:
> > ... and reviewing arpingib
> > (which I'm going to remove from the ipoibtools and fix the native arping
> > in RHEL5 to work properly over IB without needing a new flag, the -A or
> > -U flags should be sufficient assuming those modes worked at all over IB
> > which they don't in either the native arping or the patched arpingib in
> > ipoibtools). I should get to it today though.
>
> Would you mind send the patch to arping for review?

When it's done, sure.

-- 
Doug Ledford
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From jsquyres at cisco.com Thu Oct 19 08:40:34 2006
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 19 Oct 2006 11:40:34 -0400
Subject: [openib-general] Tools for development
In-Reply-To:
References: <01162B18-C054-426E-A473-F9ED7745F6E9@cisco.com>
 <20061017134550.GE20690@mellanox.co.il>
Message-ID: <322BD2DF-41F2-409E-971A-D074FDEF9C45@cisco.com>

On Oct 18, 2006, at 8:10 AM, Jeff Squyres wrote:

>> One feature that bugzilla has (and that seems to be disabled in
>> openib bugzilla
>> :() is mail integration, where I can Cc bugzilla and mail contents
>> will get
>> attached to bug report. I was hoping that new server will have this
>> capability. Does trac have this?

It appears that trac does not support this type of feature.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

From vlad at mellanox.co.il Thu Oct 19 09:16:46 2006
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 19 Oct 2006 18:16:46 +0200
Subject: [openib-general] OFED1.1-rc7 build failure on 2.6.9-prep (RH4 U3 hand built) system
Message-ID: <6C2C79E72C305246B504CBA17B5500C93478F4@mtlexch01.mtl.com>

Hi Or,

I think the backport patches required for the 2.6.9-34.EL kernel (from the kernel_patches/backport/2.6.9_U3 directory) are not being applied by the configure script. You should change the kernel name to be 2.6.9-34*.

Regards,
Vladimir

> -----Original Message-----
> From: Or Gerlitz [mailto:ogerlitz at voltaire.com]
> Sent: Thursday, October 19, 2006 5:30 PM
> To: Vladimir Sokolovsky
> Cc: openib-general at openib.org; openfabrics-ewg at openib.org
> Subject: OFED1.1-rc7 build failure on 2.6.9-prep (RH4 U3 hand
> built) system
>
> Vlad,
>
> I am trying to build OFED1.1-rc7 on a system where i have
> built the RH4 U3 kernel manually from its source rpm (that
> prepared kernel-2.6.9-34.EL.src.rpm for being built and then
> did make, make modules_install, make install and reboot) and
> get the errors below.
>
> Do you have any idea what i am doing wrong? If i don't
> build/install/boot in this kernel but rather only do the
> initial setting of the kernel sources (ie make
> modules_prepare etc), OFED gets built fine.
>
> The configure line i was using is:
>
> ./configure --with-mthca-mod --with-core-mod --with-ipoib-mod
> --with-ipoib_debug-mod
>
> I have created the directory OFED-1.1-rc7/SOURCES/2.6.9 by
> hand and opened there the tgz file.
>
> Or.

From rdreier at cisco.com Thu Oct 19 09:19:14 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 19 Oct 2006 09:19:14 -0700
Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ...
In-Reply-To: (Shirley Ma's message of "Thu, 19 Oct 2006 08:15:42 -0700")
References:
Message-ID:

    Shirley> How can RHEL5 pick up this particular patch? Applications
    Shirley> with fork() depend on this patch.

It can't really, since it breaks the libibverbs ABI and therefore has to be part of a major release.

From xma at us.ibm.com Thu Oct 19 09:47:01 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Thu, 19 Oct 2006 09:47:01 -0700
Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ...
In-Reply-To:
Message-ID:

Roland Dreier wrote on 10/19/2006 09:19:14 AM:

> Shirley> How can RHEL5 pick up this particular patch? Applications
> Shirley> with fork() depend on this patch.
> > It can't really, since it breaks the libibverbs ABI and therefore has > to be part of a major release. Then we need to wait for the new release or find an alternative way which I doubt. Thanks Shirley Ma -------------- next part -------------- An HTML attachment was scrubbed... URL: From dledford at redhat.com Thu Oct 19 09:49:47 2006 From: dledford at redhat.com (Doug Ledford) Date: Thu, 19 Oct 2006 12:49:47 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061019050907.GA1547@mellanox.co.il> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> <20061019050907.GA1547@mellanox.co.il> Message-ID: <1161276587.2917.590.camel@fc6.xsintricity.com> On Thu, 2006-10-19 at 07:09 +0200, Michael S. Tsirkin wrote: > > Yeah, I'll look over the diff today. > > How does it look? Not too far in yet, but the srp_topspin patch in the kernel_patches/fixes directory appears to have munged whitespace. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sweitzen at cisco.com Thu Oct 19 11:24:22 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 19 Oct 2006 11:24:22 -0700 Subject: [openib-general] udev on RHEL Message-ID: Doug, what udev does RHEL5 beta have? Any plans to upgrade udev for RHEL4 U5? 

Scott

________________________________
From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Ishai Rabinovitz
Sent: Tuesday, October 17, 2006 5:36 AM
To: Sharma, Karun
Cc: openfabrics-ewg at openib.org; openib-general at openib.org
Subject: Re: [openib-general] [openfabrics-ewg] OFED 1.1 release schedule

Hi,

Let me first explain why the current OFED release does not support SRP-HA on RHEL4.

SRP-HA uses Device Mapper multipath. Multipath's prerequisites include a udev version higher than 050. RHEL4 distributions include udev 039. udev is an important part of the distribution, and I do not think that users will be ready to upgrade it in order to have SRP-HA.

To the best of my knowledge, the main reason that multipath needs at least udev 050 is that it uses the RUN option (this option executes its given parameter after the device exists). Multipath uses the RUN option to execute kpartx, which handles the partitions of the new device. SRP-HA also uses the RUN option to execute the multipath command.

I have an idea on how to overcome this problem. I want to implement an srp-multipath-daemon. This daemon will get kpartx and multipath requests through a shared message queue. udev will use the PROGRAM option (which executes its given parameter immediately, before the device exists) to post a request to this shared message queue and return immediately. The daemon will wait for the device to be created, and only then will it execute the commands.

In any case, this technique will not be part of the coming OFED release.

Ishai

-----Original Message-----
From: Sharma, Karun [mailto:ksharma at silverstorm.com]
Sent: Tuesday, October 17, 2006 5:11 AM
To: Tziporet Koren; Open Fabrics
Cc: openib
Subject: RE: [openfabrics-ewg] OFED 1.1 release schedule

The plan is OK with Silverstorm.

I have a question though. What are the plans to support the SRP-HA feature on RHEL4 kernels?

Thanks
Karun
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From halr at voltaire.com Thu Oct 19 11:28:18 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Oct 2006 14:28:18 -0400
Subject: [openib-general] [PATCH] OpenSM/osm_lid_mgr.c: Fix base LID if needed to eliminate potential infinite loop
Message-ID: <1161282488.25985.78396.camel@hal.voltaire.com>

OpenSM/osm_lid_mgr.c: Fix base LID if needed to eliminate potential
infinite loop

If SMA responds with base LID of 0xffff in PortInfo, the for loops
following the call to osm_port_get_lid_range_ho would cause an
infinite loop.

Signed-off-by: Hal Rosenstock

Index: opensm/osm_lid_mgr.c
===================================================================
--- opensm/osm_lid_mgr.c (revision 9907)
+++ opensm/osm_lid_mgr.c (working copy)
@@ -326,6 +326,16 @@ Exit:
   return( status );
 }
+static uint16_t
+__osm_trim_lid(
+  IN uint16_t lid )
+{
+  if ((lid > IB_LID_UCAST_END_HO) ||
+      (lid < IB_LID_UCAST_START_HO))
+    return 0;
+  return lid;
+}
+
 /**********************************************************************
  initialize the manager for a new sweep:
  scans the known persistent assignment and port_lid_tbl
@@ -427,6 +437,8 @@ __osm_lid_mgr_init_sweep(
        p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) )
   {
     osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
+    disc_min_lid = __osm_trim_lid(disc_min_lid);
+    disc_max_lid = __osm_trim_lid(disc_max_lid);
     for (lid = disc_min_lid; lid <= disc_max_lid; lid++)
       cl_ptr_vector_set(p_discovered_vec, lid, p_port );
     /* make sure the guid2lid entry is valid. If not, clean it. */
@@ -795,6 +807,8 @@ __osm_lid_mgr_cleanup_discovered_port_li
   uint16_t max_tbl_lid = (uint16_t)(cl_ptr_vector_get_size( p_discovered_vec ));
   osm_port_get_lid_range_ho(p_port, &min_lid, &max_lid);
+  min_lid = __osm_trim_lid(min_lid);
+  max_lid = __osm_trim_lid(max_lid);
   for (lid = min_lid; lid <= max_lid; lid++)
   {
     if ((lid < max_tbl_lid ) &&

From dledford at redhat.com Thu Oct 19 11:54:23 2006
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 19 Oct 2006 14:54:23 -0400
Subject: [openib-general] [openfabrics-ewg] udev on RHEL
In-Reply-To:
References:
Message-ID: <1161284063.2917.594.camel@fc6.xsintricity.com>

On Thu, 2006-10-19 at 11:24 -0700, Scott Weitzenkamp (sweitzen) wrote:
> Doug, what udev does RHEL5 beta have? Any plans to upgrade udev for
> RHEL4 U5?

RHEL5 currently has something like 0.97 I think. For RHEL4.5, I don't currently have any plans to update udev so drastically as it would require changes in too many packages due to the udev rule file changes. Such a change would have to run through an exception process as it would result in possible package breakage in third party packages mid stream in a stable release, and that's generally very frowned upon.

-- 
Doug Ledford
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From johnip at sgi.com Thu Oct 19 12:48:28 2006 From: johnip at sgi.com (John Partridge) Date: Thu, 19 Oct 2006 14:48:28 -0500 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> Message-ID: <4537D68C.4040409@sgi.com> Roland Dreier wrote: > chas> i would guess the read to the mmio region is flushing the > chas> writes to the config register but the read happens "too > chas> soon" after those writes. on a more mundance computer, the > chas> write/write/read probably wouldnt be batched together. > > config writes can't be posted though, so that doesn't make sense. > > - R. Roland, I think Chas is correct, here's why. A trace comparison better explains what is going on. First the FAILING case :- 23406: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23414: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23422: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23430: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23438: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23446: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23454: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23462: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2 23470: Split compl. Lower A = 00 Req = (0,0,0) Tag = 0 Comp = (0,2,0) WAIT = 1 (Error completion) 23476: Split compl. 
Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE) We see here that a Config Write to Reg 01 (PCI_COMMAND) is issued across the bus. We then see the Memory Read to 698 that goes across the bus before the completion of the Config Write to Reg 01. Now the trace of the WORKING (fixed) case :- 23406: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23414: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23422: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23430: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23438: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23446: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE) 23452: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2 23460: Split compl. Lower A = 00 D = 00000000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ) 23466: Memory Write A = 00280680 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2 23474: Memory Write A = 00280684 Req = (0,0,0) Tag = 1 BC = 4 WAIT = 2 23482: Memory Write A = 00280688 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2 23490: Memory Write A = 0028068c Req = (0,0,0) Tag = 1 BC = 4 WAIT = 2 23498: Memory Write A = 00280690 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2 23506: Memory Write A = 00280694 Req = (0,0,0) Tag = 1 BC = 4 WAIT = 2 23514: Memory Write A = 00280698 Req = (0,0,0) Tag = 0 BC = 4 WAIT = 2 23522: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 2 WAIT = 2 23530: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 2 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ) 23536: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2 23544: Split compl. 
Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
23550: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23558: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)
23564: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2
23572: Split compl. Lower A = 00 D = 01008000 Req = (0,0,0) Tag = 0 Comp = (0,2,0) BC = 4 WAIT = 1 (Normal completion of READ)

Here we see completion of the Config Write to Reg 01 (PCI_COMMAND) before the PIO Memory Read to 698 via the BAR.

The mmiowb() in my earlier patch was not actually flushing anything; rather, it introduced enough of a delay for the Config Write to complete before the Memory Read to 698 in mthca_cmd_post_hcr(). This is why it appeared to fix the MCA.

This is not, though, as I originally thought, a platform ordering problem. In the failing case, the driver did the correct thing and "restored" the PCI Config regs in the correct order, and the Config Writes went across the PCI-X bus in the correct order, but the last Config Write did not make it to the HCA on the other side of the card's bridge (PPB) before the Memory Read to 698 via the BAR. Therefore the read failed because the HCA's BAR was not yet available (because the Config Write to PCI_COMMAND had not completed yet).

The area of code that issues the Memory Read (mthca_cmd_post() and in turn mthca_cmd_post_hcr()) is protected by a mutex_lock, but mthca_reset() is not. So, once we issue the last Config Write of PCI_COMMAND and leave mthca_reset(), there is nothing to prevent mthca_cmd_post_hcr() from issuing the PIO Memory Read of 698 (via the BAR).

So, to make sure that the Config Write to Reg 01 completes before we exit mthca_reset(), I suggest we do a Config Read of PCI_COMMAND before we leave mthca_reset(), something like this ?
:- --- openib-1.1-buildable-ORIG/drivers/infiniband/hw/mthca/mthca_reset.c 2006-09-20 07:19:24.000000000 -0500 +++ openib-1.1/drivers/infiniband/hw/mthca/mthca_reset.c 2006-10-19 13:55:04.292275707 -0500 @@ -281,6 +281,17 @@ goto out; } + /* + * Perform a "flush" of the pci_write_config_dword() for PCI_COMMAND. + * The PCI_COMMAND to the HCA must complete before we exit mthca_reset() + * or any PIO Memory Reads via the BAR will fail at this point. + */ + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { + err = -ENODEV; + mthca_err(mdev, "Couldn't access HCA memory after restoring, " + "aborting.\n"); + } + out: if (bridge) pci_dev_put(bridge); I have tested this and it does indeed stop the MCA. With the traces and the my above text does this clear up the confusion ? John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From mst at mellanox.co.il Thu Oct 19 12:54:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 21:54:00 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161276587.2917.590.camel@fc6.xsintricity.com> References: <1161276587.2917.590.camel@fc6.xsintricity.com> Message-ID: <20061019195400.GA2674@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > On Thu, 2006-10-19 at 07:09 +0200, Michael S. Tsirkin wrote: > > > > Yeah, I'll look over the diff today. > > > > How does it look? > > Not too far in yet, but the srp_topspin patch in the > kernel_patches/fixes directory appears to have munged whitespace. This patch + srp_2_use_multiple_initiator_ports.patch together are upstream, as commit 01cb9bcbd34b7ba768a7f05375faf43becdb8a60. You can take from there if you like. -- MST From troy at scl.ameslab.gov Thu Oct 19 12:54:42 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Thu, 19 Oct 2006 14:54:42 -0500 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? 
In-Reply-To: References: Message-ID: <17A82EE2-E2D3-4AED-A3FD-8E85F31361A1@scl.ameslab.gov> >>> I'm not sure the standard OpenIB NetPIPE runs can reproduce this >>> type of workload. However, we have developed a working PVFS2- >>> NetPIPE module which can reproduce this problem on occassion, if >>> there is interest in further testing this on your end, I can make >>> it available. > Yes. Please send it to me. I'd like to test it. Is it a user space > appl.? > I want to see if we could reach the limit of mappings mentioned above. The netpipe code is available with mercurial by: hg clone http://source.scl.ameslab.gov/hg/netpipe3-pvfs-dev Once you have pvfs2-1.5.1 installed, you should be able to do 'make pvfs' in the netpipe3-pvfs-dev directory and build NPpvfs. The command line arguments I used to reproduce this were: ./NPpvfs -d $PVFS_FILE_PATH -l 32768 -u 268435456 -n 100 -o $NETPIPE_OUTPUT_FILE This is the dmesg log: PU0001 000e0091:ehca_hcall_7arg_7ret HCAD_ERROR opcode=160 ret=fffffffffffffff7 arg1=1000000003000004 arg2=5 arg3=4000f830000 arg4=10000 arg5=e0000000000000 arg6=eb6b6920 arg7=0 out1=0 out2=0 out3=0 out4=0 out5=0 out6=0 out7=0 PU0001 00090454:ehca_reg_mr HCAD_ERROR hipz_alloc_mr failed, h_ret=fffffffffffffff7 hca_hndl=1000000003000004 PU0001 00090478:ehca_reg_mr <<< ret=ffffffea shca=c0000000e796b000 e_mr=c0000000ce865e80 iova_start=000004000f830000 size=10000 acl=7 e_pd=c0000000eb6b6920 pginfo=c0000000dfcb3a70 num_pages=10 num_4k=10 PU0001 00090176:ehca_reg_user_mr <<< rc=ffffffffffffffea pd=c0000000eb6b6920 region=c0000000ce861dd0 mr_access_flags=7 udata=c0000000dfcb3ba0 From mst at mellanox.co.il Thu Oct 19 12:57:19 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Thu, 19 Oct 2006 21:57:19 +0200
Subject: [openib-general] [PATCH repost] IB/srp: destroy/recreate qp/cq at reconnect
Message-ID: <20061019195719.GB2674@mellanox.co.il>

From: Ishai Rabinovitz

This makes SRP more robust in the presence of hardware errors and is closer to the behaviour suggested by the IB spec, reducing the chance of stale packets.

Signed-off-by: Ishai Rabinovitz
Signed-off-by: Michael S. Tsirkin
---

Hello, Roland! What do you think about this? Please consider for 2.6.19.

For some reason (could be a firmware problem) I got a CQ overrun in SRP. Because of that there was a QP FATAL. Since we do not destroy the QP in srp_reconnect_target, the QP FATAL persists after the reconnect. In order to be able to recover from such a situation, I suggest we destroy the CQ and the QP on every reconnect.

This also corrects a minor spec non-compliance: when srp_reconnect_target is called, srp destroys the CM ID and resets the QP, so the new connection will be retried with the same QPN, which could theoretically lead to stale packets (for strict spec compliance I think the QPN should not be reused until all stale packets are flushed out of the network).
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-08-31 12:23:52.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-08-31 12:30:48.000000000 +0300 @@ -495,10 +495,10 @@ static int srp_reconnect_target(struct srp_target_port *target) { struct ib_cm_id *new_cm_id; - struct ib_qp_attr qp_attr; struct srp_request *req, *tmp; - struct ib_wc wc; int ret; + struct ib_cq *old_cq; + struct ib_qp *old_qp; spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_LIVE) { @@ -522,17 +522,17 @@ ib_destroy_cm_id(target->cm_id); target->cm_id = new_cm_id; - qp_attr.qp_state = IB_QPS_RESET; - ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); - if (ret) - goto err; - - ret = srp_init_qp(target, target->qp); - if (ret) + old_qp = target->qp; + old_cq = target->cq; + ret = srp_create_target_ib(target); + if (ret) { + target->qp = old_qp; + target->cq = old_cq; goto err; + } - while (ib_poll_cq(target->cq, 1, &wc) > 0) - ; /* nothing */ + ib_destroy_qp(old_qp); + ib_destroy_cq(old_cq); spin_lock_irq(target->scsi_host->host_lock); list_for_each_entry_safe(req, tmp, &target->req_queue, list) -- MST _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- MST From dledford at redhat.com Thu Oct 19 13:05:45 2006 From: dledford at redhat.com (Doug Ledford) Date: Thu, 19 Oct 2006 16:05:45 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061019195400.GA2674@mellanox.co.il> References: <1161276587.2917.590.camel@fc6.xsintricity.com> <20061019195400.GA2674@mellanox.co.il> Message-ID: <1161288347.2917.602.camel@fc6.xsintricity.com> On Thu, 2006-10-19 at 21:54 +0200, Michael S. Tsirkin wrote: > Quoting r. 
Doug Ledford : > > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > > > On Thu, 2006-10-19 at 07:09 +0200, Michael S. Tsirkin wrote: > > > > > > Yeah, I'll look over the diff today. > > > > > > How does it look? > > > > Not too far in yet, but the srp_topspin patch in the > > kernel_patches/fixes directory appears to have munged whitespace. > > This patch + srp_2_use_multiple_initiator_ports.patch together are > upstream, as commit 01cb9bcbd34b7ba768a7f05375faf43becdb8a60. > You can take from there if you like. Nah, I just rediffed it. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From vishal at endace.com Thu Oct 19 13:14:52 2006 From: vishal at endace.com (vishal) Date: Fri, 20 Oct 2006 09:14:52 +1300 Subject: [openib-general] Problems running OFED 1.0 on SUSE 10.1 Enterprise x86_64 Message-ID: <1161288892.9609.15.camel@julia.et.endace.com> Hi, On executing the command 'rpm -ivh *.rpm ' the following error message came up:- error: Failed dependencies: kernel(kernel) = 07562a5eb4f39f26 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(drivers) = 2b3023c350dc4c0d is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(mm) = 5f65a47b6df522f2 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(drivers_base) = d16ee6013971e1f9 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(fs) = b9eb952096a047b7 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(net) = a74767cbe37d3a43 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(net_core) = 98e342e4018a6ece is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(net_ipv4) = c16c730207007344 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(drivers_scsi) = 
6b24d4082de97cd1 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(security) = bbcf32817875f2a9 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 kernel(net_sched) = 8b63c168007fc895 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64 I have tried the patch provided on https://bugzilla.novell.com/show_bug.cgi?id=199474 , but it didn't help. Has anyone come across this ? Thanks! Vishal From mst at mellanox.co.il Thu Oct 19 13:21:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 19 Oct 2006 22:21:45 +0200 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: References: Message-ID: <20061019202144.GC2674@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() > > Roland Dreier wrote on 10/18/2006 01:55:13 PM: > > I would like to understand why there's a throughput difference with > > scaling turned off, since the NAPI code doesn't change the interrupt > > handling all that much, and should lower the CPU usage if anything. > > That's I am trying to understand now. > Yes, the send side rate dropped significant, cpu usage lower as well. I think its a TCP configuration issue in your setup. With NAPI, we seem to be getting stable high results as reported previously by Eli. Hope to complete testing and report next week. Shirley, can you please post test setup and results? Some ideas: Please note that you need to apply the NAPI patch on both send and recv side in stream benchmark, otherwise one side will be a bottleneck. Please also note that due to factors such as TCP window limits, TX on a single socket is often stalled. To really stress a connection and see benefit from NAPI you should be running multiple socket streams in parallel: either just run multiple instances of netperf/netserver, or use iperf with -P flag. You also should look at the effect of increasing the send/recv socket buffer size. 
Finally, tuning RX/TX ring size should also be done differently: you might be over-running your queues, so make them bigger for NAPI.

-- MST

From greg.lindahl at qlogic.com Thu Oct 19 13:26:37 2006
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Thu, 19 Oct 2006 13:26:37 -0700
Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ...
In-Reply-To: <1161272275.2917.560.camel@fc6.xsintricity.com>
References: <1161272275.2917.560.camel@fc6.xsintricity.com>
Message-ID: <20061019202636.GB3304@greglaptop.rchland.ibm.com>

On Thu, Oct 19, 2006 at 11:37:55AM -0400, Doug Ledford wrote:

> and ISTR that it
> isn't even required by the MPI spec since that leaves behavior of an MPI
> app undefined after a fork() call and hence any application written to
> depend on undefined behavior is broken by design,

Doug,

There are many things which MPI programs do that MPI vendors like to support even though they're undefined in the standard. This is one of the minor ones. The situation is similar to F77 extensions: there are a bunch of them you must support to have a commercially viable compiler.

-- greg

From sashak at voltaire.com Thu Oct 19 13:35:20 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 19 Oct 2006 22:35:20 +0200
Subject: [openib-general] [PATCH 0/5] opensm: lid matrices dumping and loading from file
Message-ID: <11612901253393-git-send-email-sashak@voltaire.com>

This is an extension to the existing OpenSM 'file' routing engine. It adds the ability to dump switch lid matrices (aka min hops tables) to a file and later load them as is (currently).

The usage is similar to loading unicast forwarding tables from a dump file (introduced by the 'file' routing engine), but the lid matrix file name should be specified with the -M or --lid_matrix_file option. For example:

  $ opensm -R file -M ./opensm-lid-matrix.dump

The dump file is named 'opensm-lid-matrix.dump' and will be generated in the standard OpenSM dump directory (/var/log by default) when the OSM_LOG_ROUTING logging flag is set.
When the 'file' routing engine is activated but the dump file is not specified or cannot be opened, the default lid matrix algorithm will be used.

In addition to the above, patch 5 adds a switch forwarding tables dumper which generates a file compatible with dump_lfts.sh output. This file can be used as input for forwarding tables loading by the 'file' routing engine.

Either or both of the options -U and -M can be specified together with '-R file'.

Enjoy!
Sasha

From sashak at voltaire.com Thu Oct 19 13:35:21 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 19 Oct 2006 22:35:21 +0200
Subject: [openib-general] [PATCH 1/5] opensm: build_lid_matrices() routing engine method
In-Reply-To: <11612901253393-git-send-email-sashak@voltaire.com>
References: <11612901253393-git-send-email-sashak@voltaire.com>
Message-ID: <11612901362947-git-send-email-sashak@voltaire.com>

This adds a new method named build_lid_matrices() to the OpenSM routing engine structure. When defined, this method will be used by ucast_mgr_process() to prepare the switch min hop tables (aka lid matrices). In case of failure, the default lid matrix creation algorithm will be used.

Signed-off-by: Sasha Khapyorsky
---
 osm/include/opensm/osm_opensm.h | 4 +
 osm/opensm/osm_ucast_mgr.c | 142 ++++++++++++++++++++++-----------------
 2 files changed, 84 insertions(+), 62 deletions(-)

diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h index 5557dbd..80e4ad7 100644 --- a/osm/include/opensm/osm_opensm.h +++ b/osm/include/opensm/osm_opensm.h @@ -104,6 +104,7 @@ BEGIN_C_DECLS struct osm_routing_engine { const char *name; void *context; + int (*build_lid_matrices)(void *context); int (*ucast_build_fwd_tables)(void *context); int (*ucast_fdb_assign)(void *context); void (*delete)(void *context); @@ -117,6 +118,9 @@ struct osm_routing_engine { * The routing engine context. Will be passed as parameter * to the callback functions. * +* build_lid_matrices +* The callback for lid matrices generation.
+* * ucast_build_fwd_tables * The callback for unicast forwarding table generation. * diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 39d6899..2c5f1d1 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -1078,25 +1078,18 @@ __osm_ucast_mgr_process_neighbors( /********************************************************************** **********************************************************************/ -osm_signal_t -osm_ucast_mgr_process( +static void +ucast_mgr_build_lid_matrices( IN osm_ucast_mgr_t* const p_mgr ) { uint32_t i; uint32_t iteration_max; - struct osm_routing_engine *p_routing_eng; - osm_signal_t signal; cl_qmap_t *p_sw_guid_tbl; - OSM_LOG_ENTER( p_mgr->p_log, osm_ucast_mgr_process ); - p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; - p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine; - - CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock ); osm_log(p_mgr->p_log, OSM_LOG_VERBOSE, - "osm_ucast_mgr_process: " + "ucast_mgr_build_lid_matrices: " "Starting switches Min Hop Table Assignment\n"); /* @@ -1126,7 +1119,7 @@ osm_ucast_mgr_process( Note that there may not be any switches in the subnet if we are in simple p2p configuration. 
*/ - iteration_max = cl_qmap_count( &p_mgr->p_subn->sw_guid_tbl ); + iteration_max = cl_qmap_count( p_sw_guid_tbl ); /* If there are switches in the subnet, iterate until the lid @@ -1152,78 +1145,103 @@ osm_ucast_mgr_process( __osm_ucast_mgr_process_neighbors, p_mgr ); } osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "osm_ucast_mgr_process: " - "Min-hop propagated in %d steps\n", - i - ); + "ucast_mgr_build_lid_matrices: " + "Min-hop propagated in %d steps\n", i ); + } +} - if (p_routing_eng->ucast_build_fwd_tables && - p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context) == 0) - { - cl_qmap_apply_func( p_sw_guid_tbl, - __osm_ucast_mgr_set_table_cb, p_mgr ); - } /* fallback on the regular path in case of failures */ - else - { - /* - This is the place where we can load pre-defined routes - into the switches fwd_tbl structures. +/********************************************************************** + **********************************************************************/ +osm_signal_t +osm_ucast_mgr_process( + IN osm_ucast_mgr_t* const p_mgr ) +{ + struct osm_routing_engine *p_routing_eng; + osm_signal_t signal = OSM_SIGNAL_DONE; + cl_qmap_t *p_sw_guid_tbl; - Later code will use these values if not configured for - reassignment. - */ - if (p_routing_eng->ucast_fdb_assign) - { - if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "osm_ucast_mgr_process: " - "Invoking \'%s\' function ucast_fdb_assign\n", - p_routing_eng->name ); - } + OSM_LOG_ENTER( p_mgr->p_log, osm_ucast_mgr_process ); - p_routing_eng->ucast_fdb_assign(p_routing_eng->context); + p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl; + p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine; - } - else + CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock ); + + /* + If there are no switches in the subnet, we are done. 
+ */ + if (cl_qmap_count( p_sw_guid_tbl ) == 0) + goto Exit; + + if (!p_routing_eng->build_lid_matrices || + p_routing_eng->build_lid_matrices(p_routing_eng->context) != 0) + ucast_mgr_build_lid_matrices(p_mgr); + + if (p_routing_eng->ucast_build_fwd_tables && + p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context) == 0) + { + cl_qmap_apply_func( p_sw_guid_tbl, + __osm_ucast_mgr_set_table_cb, p_mgr ); + } /* fallback on the regular path in case of failures */ + else + { + /* + This is the place where we can load pre-defined routes + into the switches fwd_tbl structures. + + Later code will use these values if not configured for + reassignment. + */ + if (p_routing_eng->ucast_fdb_assign) + { + if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_mgr->p_log, OSM_LOG_DEBUG, "osm_ucast_mgr_process: " - "UI pfn was not invoked\n" ); + "Invoking \'%s\' function ucast_fdb_assign\n", + p_routing_eng->name ); } - osm_log( p_mgr->p_log, OSM_LOG_INFO, - "osm_ucast_mgr_process: " - "Min Hop Tables configured on all switches\n" ); + p_routing_eng->ucast_fdb_assign(p_routing_eng->context); - /* - Now that the lid matrices have been built, we can - build and download the switch forwarding tables. - */ - - cl_qmap_apply_func( p_sw_guid_tbl, - __osm_ucast_mgr_process_tbl, p_mgr ); + } + else + { + osm_log( p_mgr->p_log, OSM_LOG_DEBUG, + "osm_ucast_mgr_process: " + "UI pfn was not invoked\n" ); } - /* dump fdb into file: */ - if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) - __osm_ucast_mgr_dump_tables( p_mgr ); + osm_log( p_mgr->p_log, OSM_LOG_INFO, + "osm_ucast_mgr_process: " + "Min Hop Tables configured on all switches\n" ); /* - For now don't bother checking if the switch forwarding tables - actually needed updating. The current code will always update - them, and thus leave transactions pending on the wire. - Therefore, return OSM_SIGNAL_DONE_PENDING. 
+ Now that the lid matrices have been built, we can + build and download the switch forwarding tables. */ - signal = OSM_SIGNAL_DONE_PENDING; + + cl_qmap_apply_func( p_sw_guid_tbl, + __osm_ucast_mgr_process_tbl, p_mgr ); } - else - signal = OSM_SIGNAL_DONE; + + /* dump fdb into file: */ + if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) + __osm_ucast_mgr_dump_tables( p_mgr ); + + /* + For now don't bother checking if the switch forwarding tables + actually needed updating. The current code will always update + them, and thus leave transactions pending on the wire. + Therefore, return OSM_SIGNAL_DONE_PENDING. + */ + signal = OSM_SIGNAL_DONE_PENDING; osm_log(p_mgr->p_log, OSM_LOG_VERBOSE, "osm_ucast_mgr_process: " "LFT Tables configured on all switches\n"); + Exit: CL_PLOCK_RELEASE( p_mgr->p_lock ); OSM_LOG_EXIT( p_mgr->p_log ); return( signal ); -- 1.4.3.g7768 From sashak at voltaire.com Thu Oct 19 13:35:23 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 19 Oct 2006 22:35:23 +0200 Subject: [openib-general] [PATCH 3/5] opensm: lid matrix dump In-Reply-To: <11612901253393-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> Message-ID: <1161290156640-git-send-email-sashak@voltaire.com> This adds dumping switches lid matrices to the file 'opensm-lid-matrix.dump'. Like other routing related dumps this code will be activated when OSM_LOG_ROUTING logging flag is set. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 32 ++++++++++++++++++++++++++++++++ 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index b4880ad..8f4bfba 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -350,8 +350,40 @@ __osm_ucast_mgr_dump_ucast_routes( /********************************************************************** **********************************************************************/ +static void +ucast_mgr_dump_lid_matrix(cl_map_item_t *p_map_item, void *cxt) +{ + osm_switch_t* p_sw = (osm_switch_t *)p_map_item; + osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; + FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file; + osm_node_t *p_node = osm_switch_get_node_ptr(p_sw); + unsigned max_lid = osm_switch_get_max_lid_ho(p_sw); + unsigned max_port = osm_switch_get_num_ports(p_sw); + uint16_t lid; + uint8_t port; + + fprintf(file, "Switch: guid 0x%016" PRIx64 "\n", + cl_ntoh64(osm_node_get_node_guid(p_node))); + for (lid = 1; lid <= max_lid; lid++) { + osm_port_t *p_port; + fprintf(file, "0x%04x:", lid); + for (port = 0 ; port < max_port ; port++) + fprintf(file, " %02x", + osm_switch_get_hop_count(p_sw, lid, port)); + p_port = cl_ptr_vector_get(&p_mgr->p_subn->port_lid_tbl, lid); + if (p_port) + fprintf(file, " # portguid 0x%" PRIx64, + cl_ntoh64(osm_port_get_guid(p_port))); + fprintf(file, "\n"); + } +} + +/********************************************************************** + **********************************************************************/ static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr) { + ucast_mgr_dump_to_file(p_mgr, "opensm-lid-matrix.dump", + ucast_mgr_dump_lid_matrix); if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) ucast_mgr_dump(p_mgr, NULL, __osm_ucast_mgr_dump_path_distribution);; ucast_mgr_dump_to_file(p_mgr, "osm.fdbs", __osm_ucast_mgr_dump_ucast_routes); -- 1.4.3.g7768 
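For readers who want to post-process this dump outside OpenSM, the following is a minimal sketch of parsing one hop-count line in the format written by ucast_mgr_dump_lid_matrix() in the patch above: "0x%04x:" for the LID, then one two-digit hex hop count per port, then an optional "# portguid" comment. The function name parse_lid_matrix_line() and its return convention are hypothetical and are not part of OpenSM; the loader in patch 4/5 may well parse the file differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not an OpenSM function).
 *
 * Parses one line such as
 *     "0x0003: 00 01 ff # portguid 0x0002c90200000001"
 * storing the LID in *lid and the per-port hop counts in hops[].
 *
 * Returns the number of hop counts parsed, or -1 if the line is not a
 * hop-count line (for example a "Switch: guid ..." header), if a value
 * is out of range, or if there are more entries than max_hops.
 */
int parse_lid_matrix_line(const char *line, uint16_t *lid,
                          uint8_t *hops, size_t max_hops)
{
    unsigned long val;
    char *end;
    const char *p;
    int n = 0;

    assert(line != NULL && lid != NULL && hops != NULL);

    /* Base 16 lets strtoul() accept the optional "0x" prefix. */
    val = strtoul(line, &end, 16);
    if (end == line || *end != ':' || val > 0xffff)
        return -1;
    *lid = (uint16_t)val;

    p = end + 1;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        /* End of line or start of the trailing comment. */
        if (*p == '\0' || *p == '\n' || *p == '#')
            break;
        val = strtoul(p, &end, 16);
        if (end == p || val > 0xff)
            return -1;
        if ((size_t)n >= max_hops)
            return -1;
        hops[n++] = (uint8_t)val;
        p = end;
    }
    return n;
}
```

Because "Switch: guid ..." header lines are rejected with -1, a caller could use that result to detect the start of the next switch block and handle the header separately.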
From sashak at voltaire.com Thu Oct 19 13:35:22 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 19 Oct 2006 22:35:22 +0200 Subject: [openib-general] [PATCH 2/5] opensm: ucast_mgr dumper unification In-Reply-To: <11612901253393-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> Message-ID: <116129014671-git-send-email-sashak@voltaire.com> This unifies ucsat_mgr dumper. Main goal is to provide infrastructure for different dump file generation using the same routines. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 104 +++++++++++++++++++++++--------------------- 1 files changed, 55 insertions(+), 49 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 2c5f1d1..b4880ad 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -129,10 +129,52 @@ osm_ucast_mgr_init( /********************************************************************** **********************************************************************/ +struct ucast_mgr_dump_context { + osm_ucast_mgr_t *p_mgr; + FILE *file; +}; + +static void +ucast_mgr_dump(osm_ucast_mgr_t *p_mgr, FILE *file, + void (*func)(cl_map_item_t *, void *)) +{ + struct ucast_mgr_dump_context dump_context; + + dump_context.p_mgr = p_mgr; + dump_context.file = file; + + cl_qmap_apply_func(&p_mgr->p_subn->sw_guid_tbl, func, &dump_context ); +} + +static void +ucast_mgr_dump_to_file(osm_ucast_mgr_t *p_mgr, const char *file_name, + void (*func)(cl_map_item_t *, void *)) +{ + char path[1024]; + FILE *file; + + snprintf(path, sizeof(path), "%s/%s", + p_mgr->p_subn->opt.dump_files_dir, file_name); + + file = fopen(path, "w"); + if (!file) { + osm_log(p_mgr->p_log, OSM_LOG_ERROR, + "__osm_ucast_mgr_dump_tables: ERR 3A12: " + "Failed to open fdb file (%s)\n", path ); + return; + } + + ucast_mgr_dump(p_mgr, file, func); + + fclose(file); +} + +/********************************************************************** + 
**********************************************************************/ static void __osm_ucast_mgr_dump_path_distribution( - IN const osm_ucast_mgr_t* const p_mgr, - IN const osm_switch_t* const p_sw ) + IN cl_map_item_t *p_map_item, + IN void *cxt) { osm_node_t *p_node; osm_node_t *p_remote_node; @@ -141,6 +183,8 @@ __osm_ucast_mgr_dump_path_distribution( uint32_t num_paths; ib_net64_t remote_guid_ho; char line[OSM_REPORT_LINE_SIZE]; + osm_switch_t* p_sw = (osm_switch_t *)p_map_item; + osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution ); @@ -200,9 +244,8 @@ __osm_ucast_mgr_dump_path_distribution( **********************************************************************/ static void __osm_ucast_mgr_dump_ucast_routes( - IN const osm_ucast_mgr_t* const p_mgr, - IN const osm_switch_t* const p_sw, - IN FILE *p_fdbFile ) + IN cl_map_item_t *p_map_item, + IN void *cxt) { const osm_node_t* p_node; uint8_t port_num; @@ -214,6 +257,9 @@ __osm_ucast_mgr_dump_ucast_routes( char line[OSM_REPORT_LINE_SIZE]; uint32_t line_num = 0; boolean_t ui_ucast_fdb_assign_func_defined; + osm_switch_t* p_sw = (osm_switch_t *)p_map_item; + osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; + FILE *p_fdbFile = ((struct ucast_mgr_dump_context *)cxt)->file; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes ); @@ -304,51 +350,11 @@ __osm_ucast_mgr_dump_ucast_routes( /********************************************************************** **********************************************************************/ -struct ucast_mgr_dump_context { - osm_ucast_mgr_t *p_mgr; - FILE *file; -}; - -static void -__osm_ucast_mgr_dump_table( - IN cl_map_item_t* const p_map_item, - IN void* context ) +static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr) { - osm_switch_t* const p_sw = (osm_switch_t*)p_map_item; - struct ucast_mgr_dump_context *cxt = context; - - if( 
osm_log_is_active( cxt->p_mgr->p_log, OSM_LOG_DEBUG ) ) - __osm_ucast_mgr_dump_path_distribution( cxt->p_mgr, p_sw ); - __osm_ucast_mgr_dump_ucast_routes( cxt->p_mgr, p_sw, cxt->file ); -} - -static void __osm_ucast_mgr_dump_tables( - IN osm_ucast_mgr_t *p_mgr ) -{ - char file_name[1024]; - struct ucast_mgr_dump_context dump_context; - FILE *file; - - strncpy(file_name, p_mgr->p_subn->opt.dump_files_dir, sizeof(file_name) - 1); - strncat(file_name, "/osm.fdbs", sizeof(file_name) - strlen(file_name) - 1); - - file = fopen(file_name, "w"); - if (!file) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "__osm_ucast_mgr_dump_tables: ERR 3A12: " - "Failed to open fdb file (%s)\n", - file_name ); - return; - } - - dump_context.p_mgr = p_mgr; - dump_context.file = file; - - cl_qmap_apply_func( &p_mgr->p_subn->sw_guid_tbl, - __osm_ucast_mgr_dump_table, &dump_context ); - - fclose(file); + if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) + ucast_mgr_dump(p_mgr, NULL, __osm_ucast_mgr_dump_path_distribution);; + ucast_mgr_dump_to_file(p_mgr, "osm.fdbs", __osm_ucast_mgr_dump_ucast_routes); } /**********************************************************************
-- 1.4.3.g7768

From sashak at voltaire.com Thu Oct 19 13:35:25 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 19 Oct 2006 22:35:25 +0200
Subject: [openib-general] [PATCH 5/5] opensm: dump_lfts.sh compatible dumper for OpenSM
In-Reply-To: <11612901253393-git-send-email-sashak@voltaire.com>
References: <11612901253393-git-send-email-sashak@voltaire.com>
Message-ID: <11612901763767-git-send-email-sashak@voltaire.com>

This is a bonus: a switch forwarding tables dump compatible with the output produced by dump_lfts.sh, which can be used as input for the unicast forwarding tables loader (with -R 'file' -U ). The dump file name is 'opensm-lfts.dump' and it will be generated if the OSM_LOG_ROUTING logging flag is set.
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 44 insertions(+), 0 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 8f4bfba..7e1e7cc 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -380,10 +380,54 @@ ucast_mgr_dump_lid_matrix(cl_map_item_t /********************************************************************** **********************************************************************/ +static void +ucast_mgr_dump_lfts(cl_map_item_t *p_map_item, void *cxt) +{ + osm_switch_t* p_sw = (osm_switch_t *)p_map_item; + osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; + FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file; + osm_node_t *p_node = osm_switch_get_node_ptr(p_sw); + unsigned max_lid = osm_switch_get_max_lid_ho(p_sw); + unsigned max_port = osm_switch_get_num_ports(p_sw); + uint16_t lid; + uint8_t port; + + fprintf(file, "Unicast lids [0x0-0x%x] of switch Lid %u guid 0x%016" + PRIx64 " (\'%s\'):\n", + max_lid, osm_node_get_base_lid(p_node, 0), + cl_ntoh64(osm_node_get_node_guid(p_node)), + p_node->node_desc.description); + for (lid = 0; lid <= max_lid; lid++) { + osm_port_t *p_port; + port = osm_switch_get_port_by_lid(p_sw, lid); + + if (port >= max_port) + continue; + + fprintf(file, "0x%04x %03u # ", lid, port); + + p_port = cl_ptr_vector_get(&p_mgr->p_subn->port_lid_tbl, lid); + if (p_port) { + p_node = osm_port_get_parent_node(p_port); + fprintf(file, "%s portguid 0x%016" PRIx64 ": \'%s\'", + ib_get_node_type_str(osm_node_get_type(p_node)), + cl_ntoh64(osm_port_get_guid(p_port)), + p_node->node_desc.description); + } + else + fprintf(file, "unknown node and type"); + fprintf(file, "\n"); + } + fprintf(file, "%u lids dumped\n", max_lid); +} + +/********************************************************************** + **********************************************************************/
 static void __osm_ucast_mgr_dump_tables(osm_ucast_mgr_t *p_mgr)
 {
 	ucast_mgr_dump_to_file(p_mgr, "opensm-lid-matrix.dump", ucast_mgr_dump_lid_matrix);
+	ucast_mgr_dump_to_file(p_mgr, "opensm-lfts.dump", ucast_mgr_dump_lfts);
 	if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) )
 		ucast_mgr_dump(p_mgr, NULL, __osm_ucast_mgr_dump_path_distribution);;
 	ucast_mgr_dump_to_file(p_mgr, "osm.fdbs", __osm_ucast_mgr_dump_ucast_routes);
-- 
1.4.3.g7768

From sashak at voltaire.com Thu Oct 19 13:35:24 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 19 Oct 2006 22:35:24 +0200
Subject: [openib-general] [PATCH 4/5] opensm: lid matrix file loader
In-Reply-To: <11612901253393-git-send-email-sashak@voltaire.com>
References: <11612901253393-git-send-email-sashak@voltaire.com>
Message-ID: <1161290166631-git-send-email-sashak@voltaire.com>

This adds a parser and loader for lid matrix dump files. It is part of
the 'file' routing engine; the file name should be specified on the
OpenSM command line with the -M or --lid_matrix_file option (there is
no default value). Example of valid usage:

  opensm -R file -M ./opensm-lid-matrix.dump

The file format is compatible with the one generated by the OpenSM
ucast dumper (the dump is created when the OSM_LOG_ROUTING logging flag
is set). When the specified file does not exist, or if the parser or
loader fails, the default lid matrix calculation algorithm will be
used.
Signed-off-by: Sasha Khapyorsky
---
 osm/include/opensm/osm_subnet.h |    5 +
 osm/opensm/main.c               |   13 +++-
 osm/opensm/osm_subnet.c         |   10 +++
 osm/opensm/osm_ucast_file.c     |  166 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 193 insertions(+), 1 deletions(-)

diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h
index 36a4b55..abd3c56 100644
--- a/osm/include/opensm/osm_subnet.h
+++ b/osm/include/opensm/osm_subnet.h
@@ -276,6 +276,7 @@ typedef struct _osm_subn_opt
   boolean_t sweep_on_trap;
   osm_testability_modes_t testability_mode;
   char * routing_engine_name;
+  char * lid_matrix_dump_file;
   char * ucast_dump_file;
   char * updn_guid_file;
   boolean_t exit_on_fatal;
@@ -431,6 +432,10 @@ typedef struct _osm_subn_opt
 *	Name of used routing engine
 *	(other than default Min Hop Algorithm)
 *
+* lid_matrix_dump_file
+*	Name of the lid matrix dump file from where switch
+*	lid matrices (min hops tables) will be loaded
+*
 * ucast_dump_file
 *	Name of the unicast routing dump file from where switch
 *	forwarding tables will be loaded
diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index 90f3dbd..729702a 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -177,6 +177,11 @@ show_usage(void)
           "--routing_engine \n"
           " This option chooses routing engine instead of Min Hop\n"
           " algorithm (default). Supported engines: updn, file\n\n");
+  printf( "-M\n"
+          "--lid_matrix_file \n"
+          " This option specifies name of the lid matrix dump file\n"
+          " from where switch lid matrices (min hops tables) will be\n"
+          " loaded.\n\n");
   printf( "-U\n"
           "--ucast_file \n"
           " This option specifies name of the unicast dump file\n"
@@ -531,7 +536,7 @@ #endif
   boolean_t cache_options = FALSE;
   char *ignore_guids_file_name = NULL;
   uint32_t val;
-  const char * const short_option = "i:f:ed:g:l:L:s:t:a:R:U:P:NQvVhorcyx";
+  const char * const short_option = "i:f:ed:g:l:L:s:t:a:R:M:U:P:NQvVhorcyx";
 
   /*
     In the array below, the 2nd parameter specified the number
@@ -565,6 +570,7 @@ #endif
       { "priority", 1, NULL, 'p'},
       { "smkey", 1, NULL, 'k'},
       { "routing_engine",1, NULL, 'R'},
+      { "lid_matrix_file",1, NULL, 'M'},
       { "ucast_file" ,1, NULL, 'U'},
       { "add_guid_file", 1, NULL, 'a'},
       { "cache-options", 0, NULL, 'c'},
@@ -795,6 +801,11 @@ #endif
       printf(" Activate \'%s\' routing engine\n", optarg);
       break;
 
+    case 'M':
+      opt.lid_matrix_dump_file = optarg;
+      printf(" Lid matrix dump file is \'%s\'\n", optarg);
+      break;
+
     case 'U':
       opt.ucast_dump_file = optarg;
       printf(" Ucast dump file is \'%s\'\n", optarg);
diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c
index 2b25fc7..9c9acee 100644
--- a/osm/opensm/osm_subnet.c
+++ b/osm/opensm/osm_subnet.c
@@ -492,6 +492,7 @@ osm_subn_set_default_opt(
   p_opt->sweep_on_trap = TRUE;
   p_opt->testability_mode = OSM_TEST_MODE_NONE;
   p_opt->routing_engine_name = NULL;
+  p_opt->lid_matrix_dump_file = NULL;
   p_opt->ucast_dump_file = NULL;
   p_opt->updn_guid_file = NULL;
   p_opt->exit_on_fatal = TRUE;
@@ -950,6 +951,10 @@ osm_subn_parse_conf_file(
         "dump_files_dir",
         p_key, p_val, &p_opts->dump_files_dir);
 
+      __osm_subn_opts_unpack_charp(
+        "lid_matrix_dump_file",
+        p_key, p_val, &p_opts->lid_matrix_dump_file);
+
       __osm_subn_opts_unpack_charp(
         "ucast_dump_file",
         p_key, p_val, &p_opts->ucast_dump_file);
@@ -1123,6 +1128,11 @@ osm_subn_write_conf_file(
              "# Routing engine\n"
              "routing_engine %s\n\n",
              p_opts->routing_engine_name);
+  if (p_opts->lid_matrix_dump_file)
+    fprintf( opts_file,
+             "# Lid matrix dump file name\n"
+             "lid_matrix_dump_file %s\n\n",
+             p_opts->lid_matrix_dump_file);
   if (p_opts->ucast_dump_file)
     fprintf( opts_file,
              "# Ucast dump file name\n"
diff --git a/osm/opensm/osm_ucast_file.c b/osm/opensm/osm_ucast_file.c
index 446c243..a9ea2c0 100644
--- a/osm/opensm/osm_ucast_file.c
+++ b/osm/opensm/osm_ucast_file.c
@@ -108,6 +108,28 @@ static void add_path(osm_opensm_t * p_os
 		(osm_switch_get_node_ptr(p_sw))));
 }
 
+static void add_lid_hops(osm_opensm_t *p_osm, osm_switch_t *p_sw,
+			 uint16_t lid, ib_net64_t guid,
+			 uint8_t hops[], unsigned len)
+{
+	uint16_t new_lid;
+	unsigned i;
+
+	new_lid = guid ? remap_lid(p_osm, lid, guid) : lid;
+	if (len > osm_switch_get_num_ports(p_sw))
+		len = osm_switch_get_num_ports(p_sw);
+
+	for (i = 0 ; i < len ; i++)
+		osm_switch_set_hops(p_sw, lid, i, hops[i]);
+}
+
+static void clean_sw_lid_matrix(cl_map_item_t* const p_map_item, void *context)
+{
+	osm_switch_t * const p_sw = (osm_switch_t *)p_map_item;
+
+	osm_lid_matrix_clear(&p_sw->lmx);
+}
+
 static void clean_sw_fwd_table(cl_map_item_t* const p_map_item, void *context)
 {
 	osm_switch_t * const p_sw = (osm_switch_t *)p_map_item;
@@ -254,9 +276,153 @@ static int do_ucast_file_load(void *cont
 	return 0;
 }
 
+static int do_lid_matrix_file_load(void *context)
+{
+	char line[1024];
+	uint8_t hops[256];
+	char *file_name;
+	FILE *file;
+	ib_net64_t guid;
+	osm_opensm_t *p_osm = context;
+	osm_switch_t *p_sw;
+	unsigned lineno;
+	uint16_t lid;
+
+	file_name = p_osm->subn.opt.lid_matrix_dump_file;
+	if (!file_name) {
+		osm_log(&p_osm->log, OSM_LOG_ERROR|OSM_LOG_SYS,
+			"do_lid_matrix_file_load: ERR 6304: "
+			"lid matrix file name is not defined; "
+			"using default lid matrix generation algorithm\n");
+		return -1;
+	}
+
+	file = fopen(file_name, "r");
+	if (!file) {
+		osm_log(&p_osm->log, OSM_LOG_ERROR|OSM_LOG_SYS,
+			"do_lid_matrix_file_load: ERR 6305: "
+			"cannot open lid matrix file \'%s\'; "
+			"using default lid matrix generation algorithm\n",
+			file_name);
+		return -1;
+	}
+
+	cl_qmap_apply_func(&p_osm->subn.sw_guid_tbl, clean_sw_lid_matrix, NULL);
+
+	lineno = 0;
+	p_sw = NULL;
+
+	while (fgets(line, sizeof(line) - 1, file) != NULL) {
+		char *p, *q;
+		lineno++;
+
+		p = line;
+		while (isspace(*p))
+			p++;
+
+		if (*p == '#')
+			continue;
+
+		if (!strncmp(p, "Switch", 6)) {
+			q = strstr(p, " guid 0x");
+			if (!q) {
+				osm_log(&p_osm->log, OSM_LOG_ERROR,
+					"PARSE ERROR: %s:%u: "
+					"cannot parse switch definition\n",
+					file_name, lineno);
+				return -1;
+			}
+			p = q + 8;
+			guid = strtoull(p, &q, 16);
+			if (q == p || !isspace(*q)) {
+				osm_log(&p_osm->log, OSM_LOG_ERROR,
+					"PARSE ERROR: %s:%u: "
+					"cannot parse switch guid: \'%s\'\n",
+					file_name, lineno, p);
+				return -1;
+			}
+			guid = cl_hton64(guid);
+
+			p_sw = (osm_switch_t *)cl_qmap_get(&p_osm->subn.sw_guid_tbl,
+							    guid);
+			if (!p_sw ||
+			    p_sw == (osm_switch_t *)cl_qmap_end(&p_osm->subn.sw_guid_tbl)) {
+				p_sw = NULL;
+				osm_log(&p_osm->log, OSM_LOG_VERBOSE,
+					"do_lid_matrix_file_load: "
+					"cannot find switch %016" PRIx64 "\n",
+					cl_ntoh64(guid));
+				continue;
+			}
+		} else if (p_sw && !strncmp(p, "0x", 2)) {
+			unsigned long num;
+			unsigned len = 0;
+
+			memset(hops, 0xff, sizeof(hops));
+
+			p += 2;
+			num = strtoul(p, &q, 16);
+			if (num > 0xffff || q == p ||
+			    (*q != ':' && !isspace(*q))) {
+				osm_log(&p_osm->log, OSM_LOG_ERROR,
+					"PARSE ERROR: %s:%u: "
+					"cannot parse lid: \'%s\'\n",
+					file_name, lineno, p);
+				return -1;
+			}
+			lid = num;
+			p = q;
+			while (isspace(*p) || *p == ':')
+				p++;
+			while (len < 256 && *p && *p != '#') {
+				num = strtoul(p, &q, 16);
+				if (num > 0xff || q == p) {
+					osm_log(&p_osm->log, OSM_LOG_ERROR,
+						"PARSE ERROR: %s:%u: "
+						"cannot parse hops number: \'%s\'\n",
+						file_name, lineno, p);
+					return -1;
+				}
+				hops[len++] = num;
+				p = q;
+				while (isspace(*p))
+					p++;
+			}
+			/* additionally try to extract guid */
+			q = strstr(p, " portguid 0x");
+			if (!q) {
+				osm_log(&p_osm->log, OSM_LOG_VERBOSE,
+					"PARSE WARNING: %s:%u: "
+					"cannot find port guid "
+					"(maybe broken dump): \'%s\'\n",
+					file_name, lineno, p);
+				guid = 0;
+			}
+			else {
+				p = q + 12;
+				guid = strtoull(p, &q, 16);
+				if (q == p || !isspace(*q)) {
+					osm_log(&p_osm->log, OSM_LOG_VERBOSE,
+						"PARSE WARNING: %s:%u: "
+						"cannot parse port guid "
+						"(maybe broken dump): \'%s\'\n",
+						file_name, lineno, p);
+					guid = 0;
+				}
+			}
+			guid = cl_hton64(guid);
+			add_lid_hops(p_osm, p_sw, lid, guid, hops, len);
+		}
+	}
+
+	fclose(file);
+	return 0;
+}
+
 int osm_ucast_file_setup(osm_opensm_t * p_osm)
 {
 	p_osm->routing_engine.context = (void *)p_osm;
+	p_osm->routing_engine.build_lid_matrices = do_lid_matrix_file_load;
 	p_osm->routing_engine.ucast_build_fwd_tables = do_ucast_file_load;
 	return 0;
 }
-- 
1.4.3.g7768

From rsalmon at tulane.edu Thu Oct 19 13:54:23 2006
From: rsalmon at tulane.edu (Rene Salmon)
Date: Thu, 19 Oct 2006 15:54:23 -0500
Subject: [openib-general] Problems running OFED 1.0 on SUSE 10.1 Enterprise x86_64
In-Reply-To: <1161288892.9609.15.camel@julia.et.endace.com>
Message-ID:

Hi,

I ran into the same problem. What I did to get past this was:

mv /usr/lib/rpm/find-provides.ksyms /usr/lib/rpm/find-provides.ksyms.org
mv /usr/lib/rpm/find-requires.ksyms /usr/lib/rpm/find-requires.ksyms.org
mv /usr/lib/rpm/find-supplements.ksyms /usr/lib/rpm/find-supplements.ksyms.org

When the RPMs for OFED get built on SUSE 10.1, the above *.ksyms
scripts get executed, and that is why you get the messages about
missing ksyms. To get OFED to install on my SUSE 10.1 box, all I did
was rename the scripts above and then recompile the OFED RPMs;
everything works fine now.

Hope it helps.
Rene

On 10/19/06 3:14 PM, "vishal" wrote:

> Hi,
>
> On executing the command 'rpm -ivh *.rpm ' the following error
> message came up:-
>
> error: Failed dependencies:
> kernel(kernel) = 07562a5eb4f39f26 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(drivers) = 2b3023c350dc4c0d is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(mm) = 5f65a47b6df522f2 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(drivers_base) = d16ee6013971e1f9 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(fs) = b9eb952096a047b7 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(net) = a74767cbe37d3a43 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(net_core) = 98e342e4018a6ece is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(net_ipv4) = c16c730207007344 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(drivers_scsi) = 6b24d4082de97cd1 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(security) = bbcf32817875f2a9 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
> kernel(net_sched) = 8b63c168007fc895 is needed by
> kernel-ib-1.0-2.6.16.21_0.8_smp.x86_64
>
>
> I have tried the patch provided on
> https://bugzilla.novell.com/show_bug.cgi?id=199474 , but it didn't help.
> Has anyone come across this ?
>
> Thanks!
> > Vishal > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > -- Rene Salmon Tulane University Center for Computational Science http://www.ccs.tulane.edu rsalmon at tulane.edu Tel 504-862-8393 Fax 504-862-8392 From sashak at voltaire.com Thu Oct 19 14:26:39 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 19 Oct 2006 23:26:39 +0200 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types Message-ID: <20061019212639.GA24600@sashak.voltaire.com> This makes local functions static and moves definitions of locally used types to .c file. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_opensm.h | 1 - osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- osm/opensm/osm_ucast_updn.c | 81 +++++++- 3 files changed, 70 insertions(+), 361 deletions(-) diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h index cb216a4..5557dbd 100644 --- a/osm/include/opensm/osm_opensm.h +++ b/osm/include/opensm/osm_opensm.h @@ -62,7 +62,6 @@ #include #include #include #include -#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { diff --git a/osm/include/opensm/osm_ucast_updn.h b/osm/include/opensm/osm_ucast_updn.h index 4609e1b..c2a4376 100644 --- a/osm/include/opensm/osm_ucast_updn.h +++ b/osm/include/opensm/osm_ucast_updn.h @@ -71,363 +71,14 @@ BEGIN_C_DECLS /* ENUM TypeDefs */ /* /////////////////////////// */ -/* -* DESCRIPTION -* This enum respresent available directions of arcs in the graph -* SYNOPSIS -*/ -typedef enum _updn_switch_dir -{ - UP = 0, - DOWN -} updn_switch_dir_t; - -/* - * TYPE DEFINITIONS - * UP - * Current switch direction in propogating the subnet is up - * DOWN - * Current switch direction in propogating the subnet is down - * - */ - -/* -* DESCRIPTION 
-* This enum respresent available states in the UPDN algorithm -* SYNOPSIS -*/ -typedef enum _updn_state -{ - UPDN_INIT = 0, - UPDN_RANK, - UPDN_MIN_HOP_CALC, -} updn_state_t; - -/* - * TYPE DEFINITIONS - * UPDN_INIT - loading the package but still not performing anything - * UPDN_RANK - post ranking algorithm - * UPDN_MIN_HOP_CALC - post min hop table calculation - */ - /* ////////////////////////////////// */ /* Struct TypeDefs */ /* ///////////////////////////////// */ -/****s* UPDN: Rank element/updn_rank_t -* NAME -* updn_rank_t -* -* DESCRIPTION -* This object represents a rank type element in a list -* -* The updn_rank_t object should be treated as opaque and should -* be manipulated only through the provided functions. -* -* SYNOPSIS -*/ - -typedef struct _updn_rank -{ - cl_map_item_t map_item; - uint8_t rank; -} updn_rank_t; - -/* -* FIELDS -* map_item -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! -* -* rank -* Rank value of this node -* -*/ - -/****s* UPDN: Histogram element/updn_hist_t -* NAME -* updn_hist_t -* -* DESCRIPTION -* This object represents a histogram type element in a list -* -* The updn_hist_t object should be treated as opaque and should -* be manipulated only through the provided functions. -* -* SYNOPSIS -*/ - -typedef struct _updn_hist -{ - cl_map_item_t map_item; - uint32_t bar_value; -} updn_hist_t; - -/* -* FIELDS -* map_item -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! -* -* bar_value -* The number of occurences of the same hop value -* -*/ - -typedef struct _updn_next_step -{ - updn_switch_dir_t state; - osm_switch_t *p_sw; -} updn_next_step_t; - -/*****s* updn: updn/updn_input_t -* NAME updn_t -* -* -* DESCRIPTION -* updn input fields structure. 
-* -* SYNOPSIS -*/ - -typedef struct _updn_input -{ - uint32_t num_guids; - uint64_t * guid_list; -} updn_input_t; - -/* -* FIELDS -* num_guids -* number of guids given at the UI -* -* guid_list -* guids specified as an array (converted from a list given in the UI) -* -* -* SEE ALSO -* -*********/ - -/*****s* updn: updn/updn_t -* NAME updn_t -* -* -* DESCRIPTION -* updn structure. -* -* SYNOPSIS -*/ - -typedef struct _updn -{ - updn_state_t state; - boolean_t auto_detect_root_nodes; - cl_qmap_t guid_rank_tbl; - updn_input_t updn_ucast_reg_inputs; - cl_list_t * p_root_nodes; -} updn_t; - -/* -* FIELDS -* state -* state of the updn algorithm which basically should pass through Init -* - Ranking - UpDn algorithm -* -* guid_rank_tbl -* guid 2 rank mapping vector , indexed by guid in network order -* -* -* SEE ALSO -* -*********/ - /* ////////////////////////////// */ /* Function */ /* ////////////////////////////// */ -/***f** OpenSM: Updn/updn_construct -* NAME -* updn_construct -* -* DESCRIPTION -* Allocation of updn_t struct -* -* SYNOPSIS -*/ - -updn_t* -updn_construct(void); - -/* -* PARAMETERS -* -* -* RETURN VALUE -* Return a pointer to an updn struct. Null if fails to do so. -* -* NOTES -* First step of the creation of updn_t -*/ - -/****s* OpenSM: Updn/updn_destroy -* NAME -* updn_destroy -* -* DESCRIPTION -* release of updn_t struct -* -* SYNOPSIS -*/ - -void -updn_destroy( - IN updn_t* const p_updn ); - -/* -* PARAMETERS -* p_updn -* A pointer to the updn_t struct that is goining to be released -* -* RETURN VALUE -* -* NOTES -* Final step of the releasing of updn_t -* -* SEE ALSO -* updn_construct -*********/ - -/****f* OpenSM: Updn/updn_init -* NAME -* updn_init -* -* DESCRIPTION -* Initialization of an updn_t struct -* -* SYNOPSIS -*/ -cl_status_t -updn_init( - IN updn_t* const p_updn ); - -/* -* PARAMETERS -* p_updn -* A pointer to the updn_t struct that is goining to be initilized -* -* RETURN VALUE -* The status of the function. 
-* -* NOTES -* -* SEE ALSO -* updn_construct -********/ - -/****** OpenSM: Updn/updn_subn_rank -* NAME -* updn_subn_rank -* -* DESCRIPTION -* This function ranks the subnet for credit loop free algorithm -* -* SYNOPSIS -*/ -int -updn_subn_rank( - IN uint64_t root_guid , - IN uint8_t base_rank, - IN updn_t* p_updn ); - -/* -* PARAMETERS -* p_subn -* [in] Pointer to a Subnet object to construct. -* -* base_rank -* [in] The base ranking value (lowest value) -* -* p_updn -* [in] Pointer to updn structure which includes state & lid2rank table -* -* RETURN VALUE -* This function returns 0 when rankning has succeded , otherwise 1. -******/ - -/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table -* NAME -* osm_subn_set_up_down_min_hop_table -* -* DESCRIPTION -* This function set min hop table of all switches by BFS through each -* port guid at the subnet using ranking done before. -* -* SYNOPSIS -*/ - -int -osm_subn_set_up_down_min_hop_table( - IN updn_t* p_updn ); - -/* -* PARAMETERS -* p_updn -* [in] Pointer to updn structure which includes state & lid2rank table -* -* RETURN VALUE -* This function returns 0 when rankning has succeded , otherwise 1. -******/ - -/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table -* NAME -* osm_subn_calc_up_down_min_hop_table -* -* DESCRIPTION -* This function perform ranking and setting of all switches' min hop table -* by UP DOWN algorithm -* -* SYNOPSIS -*/ - -int -osm_subn_calc_up_down_min_hop_table( - IN uint32_t num_guids, - IN uint64_t* guid_list, - IN updn_t* p_updn ); - -/* -* PARAMETERS -* -* guid_list -* [in] Guid list from which to start ranking . -* -* p_updn -* [in] Pointer to updn structure which includes state & lid2rank table -* RETURN VALUE -* This function returns 0 when rankning has succeded , otherwise 1. 
-******/ - -/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop -* NAME -* osm_updn_find_root_nodes_by_min_hop -* -* DESCRIPTION -* This function perform auto identification of root nodes for UPDN ranking phase -* -* SYNOPSIS -*/ -int -osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); - -/* -* PARAMETERS -* p_root_nodes_list -* -* [out] Pointer to the root nodes list found in the subnet -* -* RETURN VALUE -* This function returns 0 when auto identification had succeeded -******/ - END_C_DECLS #endif /* _OSM_UCAST_UPDN_H_ */ diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index 86ac3ad..0121e6e 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -55,8 +55,62 @@ #include #include #include #include -#include -#include + +/* //////////////////////////// */ +/* Local types */ +/* /////////////////////////// */ + +/* direction */ +typedef enum _updn_switch_dir +{ + UP = 0, + DOWN +} updn_switch_dir_t; + +/* This enum respresent available states in the UPDN algorithm */ +typedef enum _updn_state +{ + UPDN_INIT = 0, + UPDN_RANK, + UPDN_MIN_HOP_CALC, +} updn_state_t; + +/* Rank value of this node */ +typedef struct _updn_rank +{ + cl_map_item_t map_item; + uint8_t rank; +} updn_rank_t; + +/* Histogram element - the number of occurences of the same hop value */ +typedef struct _updn_hist +{ + cl_map_item_t map_item; + uint32_t bar_value; +} updn_hist_t; + +typedef struct _updn_next_step +{ + updn_switch_dir_t state; + osm_switch_t *p_sw; +} updn_next_step_t; + +/* guids list */ +typedef struct _updn_input +{ + uint32_t num_guids; + uint64_t * guid_list; +} updn_input_t; + +/* updn structure */ +typedef struct _updn +{ + updn_state_t state; + boolean_t auto_detect_root_nodes; + cl_qmap_t guid_rank_tbl; + updn_input_t updn_ucast_reg_inputs; + cl_list_t * p_root_nodes; +} updn_t; /* ///////////////////////////////// */ @@ -65,6 +119,11 @@ #include /* This var is predefined and initialized */ extern osm_opensm_t 
osm; +/* ///////////////////////////////// */ +/* Statics */ +/* ///////////////////////////////// */ +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); + /********************************************************************** **********************************************************************/ /* This function returns direction based on rank and guid info of current & @@ -471,7 +530,7 @@ __updn_bfs_by_node( /********************************************************************** **********************************************************************/ -void +static void updn_destroy( IN updn_t* const p_updn ) { @@ -508,7 +567,7 @@ updn_destroy( /********************************************************************** **********************************************************************/ -updn_t* +static updn_t* updn_construct(void) { updn_t* p_updn; @@ -523,7 +582,7 @@ updn_construct(void) /********************************************************************** **********************************************************************/ -cl_status_t +static cl_status_t updn_init( IN updn_t* const p_updn ) { @@ -635,7 +694,7 @@ updn_init( **********************************************************************/ /* NOTE : PLS check if we need to decide that the first */ /* rank is a SWITCH for BFS purpose */ -int +static int updn_subn_rank( IN uint64_t root_guid, IN uint8_t base_rank, @@ -795,7 +854,7 @@ updn_subn_rank( /********************************************************************** **********************************************************************/ -int +static int osm_subn_set_up_down_min_hop_table( IN updn_t* p_updn ) { @@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( /********************************************************************** **********************************************************************/ -int +static int osm_subn_calc_up_down_min_hop_table( IN uint32_t num_guids, IN uint64_t * guid_list, @@ -935,7 +994,7 @@ 
osm_subn_calc_up_down_min_hop_table( /********************************************************************** **********************************************************************/ /* UPDN callback function */ -int __osm_updn_call( +static int __osm_updn_call( void *ctx ) { OSM_LOG_ENTER(&(osm.log), __osm_updn_call); @@ -969,7 +1028,7 @@ int __osm_updn_call( /********************************************************************** **********************************************************************/ /* UPDN convert cl_list to guid array in updn struct */ -void __osm_updn_convert_list2array( +static void __osm_updn_convert_list2array( IN updn_t * p_updn ) { uint32_t i = 0, max_num = 0; @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( /********************************************************************** **********************************************************************/ /* Find Root nodes automatically by Min Hop Table info */ -int +static int osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ) { -- 1.4.3.g7768 From xma at us.ibm.com Thu Oct 19 14:37:40 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 19 Oct 2006 14:37:40 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: <20061019202144.GC2674@mellanox.co.il> Message-ID: Thanks Michael for all these tips. I have tried several suggestions as you proposed here. I couldn't see performance any better. The TCP_RR is dropped to 472 trans/s from about 18,000 trans/s , and TCP_STREAM BW is dropped to 1/3 as before ( ehca + scaling code) with same TCP configuration, send queue size=recve queue size = 1K. Thanks Shirley Ma IBM Linux Technology Center -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From MEDER at de.ibm.com Thu Oct 19 14:49:48 2006
From: MEDER at de.ibm.com (Marcus Eder)
Date: Thu, 19 Oct 2006 23:49:48 +0200
Subject: [openib-general] [openfabrics-ewg] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7?
In-Reply-To:
Message-ID:

Hi Scott,

attached please find the test results of OFED 1.1 pre1 in conjunction
with IBM eHCA on POWER5. Regarding performance I will provide a link to
a published performance whitepaper shortly.

Mit freundlichen Gruessen / Kind Regards

Marcus Eder

InfiniBand Development
Mail: IBM Deutschland Entwicklung GmbH, Labor Boeblingen,
D3627/7103-19 (009), Schoenaicher Str. 220, D-71032 Boeblingen, Germany
Phone: ++49 (0) 7031-16-3202, FAX: -2042
Internet: meder at de.ibm.com

"Scott Weitzenkamp (sweitzen)"
Sent by: openfabrics-ewg-bounces at openib.org
19.10.2006 02:11

To
openfabrics-ewg at openib.org
cc
openib-general at openib.org
Subject
[openfabrics-ewg] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7?

What testing did these companies do with rc7? I'd kinda like to see
performance data for the QLogic and IBM HCAs...

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFED_SniffTest_report_IBMeHCA_1.1_pre1.xls
Type: application/vnd.ms-excel
Size: 2046976 bytes
Desc: not available
URL:

From rdreier at cisco.com Thu Oct 19 15:30:07 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 19 Oct 2006 15:30:07 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()
In-Reply-To: <20061019202144.GC2674@mellanox.co.il> (Michael S.
 Tsirkin's message of "Thu, 19 Oct 2006 22:21:45 +0200")
References: <20061019202144.GC2674@mellanox.co.il>
Message-ID:

OK, as promised I redid the request notify patches according to
Michael's suggestion to add a new flag. I think I like this a lot
better -- I'll send out the new patches as replies to this email for
comments.

 - R.

From rdreier at cisco.com Thu Oct 19 15:33:50 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 19 Oct 2006 15:33:50 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe missed event" hint from ib_req_notify_cq()
In-Reply-To: (Roland Dreier's message of "Thu, 19 Oct 2006 15:30:07 -0700")
References: <20061019202144.GC2674@mellanox.co.il>
Message-ID:

The semantics defined by the InfiniBand specification say that
completion events are only generated when a completion is added to a
completion queue (CQ) after completion notification is requested. In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification). Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in. It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.

Signed-off-by: Roland Dreier
---
 drivers/infiniband/hw/amso1100/c2.h       |    2 +
 drivers/infiniband/hw/amso1100/c2_cq.c    |   16 +++++++++---
 drivers/infiniband/hw/ehca/ehca_iverbs.h  |    2 +
 drivers/infiniband/hw/ehca/ehca_reqs.c    |   14 ++++++++--
 drivers/infiniband/hw/ehca/ipz_pt_fn.h    |    8 ++++++
 drivers/infiniband/hw/ipath/ipath_cq.c    |   15 ++++++++---
 drivers/infiniband/hw/ipath/ipath_verbs.h |    2 +
 drivers/infiniband/hw/mthca/mthca_cq.c    |   12 +++++----
 drivers/infiniband/hw/mthca/mthca_dev.h   |    4 +--
 include/rdma/ib_verbs.h                   |   40 ++++++++++++++++++++++-------
 10 files changed, 85 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h
index 1b17dcd..a74a3b8 100644
--- a/drivers/infiniband/hw/amso1100/c2.h
+++ b/drivers/infiniband/hw/amso1100/c2.h
@@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2
 extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index);
 extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index);
 extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
-extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
 
 /* CM */
 extern int
c2_llp_connect(struct iw_cm_id *cm_id, diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index 9d7bcc5..ab8f801 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -217,17 +217,19 @@ int c2_poll_cq(struct ib_cq *ibcq, int n return npolled; } -int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags) { struct c2_mq_shared __iomem *shared; struct c2_cq *cq; + unsigned long flags; + int ret = 0; cq = to_c2cq(ibcq); shared = cq->mq.peer; - if (notify == IB_CQ_NEXT_COMP) + if ((notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_NEXT_COMP) writeb(C2_CQ_NOTIFICATION_TYPE_NEXT, &shared->notification_type); - else if (notify == IB_CQ_SOLICITED) + else if ((notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED) writeb(C2_CQ_NOTIFICATION_TYPE_NEXT_SE, &shared->notification_type); else return -EINVAL; @@ -241,7 +243,13 @@ int c2_arm_cq(struct ib_cq *ibcq, enum i */ readb(&shared->armed); - return 0; + if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) { + spin_lock_irqsave(&cq->lock, flags); + ret = !c2_mq_empty(&cq->mq); + spin_unlock_irqrestore(&cq->lock, flags); + } + + return ret; } static void c2_free_cq_buf(struct c2_dev *c2dev, struct c2_mq *mq) diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h index 319c39d..9e6f1aa 100644 --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h @@ -135,7 +135,7 @@ int ehca_poll_cq(struct ib_cq *cq, int n int ehca_peek_cq(struct ib_cq *cq, int wc_cnt); -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify); +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags notify_flags); struct ib_qp *ehca_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index b46bda1..f55742c 
100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -634,11 +634,13 @@ poll_cq_exit0: return ret; } -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags notify_flags) { struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); + unsigned long spl_flags; + int ret = 0; - switch (cq_notify) { + switch (notify_flags & IB_CQ_SOLICITED_MASK) { case IB_CQ_SOLICITED: hipz_set_cqx_n0(my_cq, 1); break; @@ -649,5 +651,11 @@ int ehca_req_notify_cq(struct ib_cq *cq, return -EINVAL; } - return 0; + if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) { + spin_lock_irqsave(&my_cq->spinlock, spl_flags); + ret = ipz_qeit_is_valid(&my_cq->ipz_queue); + spin_unlock_irqrestore(&my_cq->spinlock, spl_flags); + } + + return ret; } diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h index 2f13509..5601755 100644 --- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h +++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h @@ -140,6 +140,14 @@ static inline void *ipz_qeit_get_inc_val return cqe; } +static inline int ipz_qeit_is_valid(struct ipz_queue *queue) +{ + struct ehca_cqe *cqe = ipz_qeit_get(queue); + u32 cqe_flags = cqe->cqe_flags; + + return cqe_flags >> 7 == (queue->toggle_state & 1); +} + /* * returns and resets Queue Entry iterator * returns address (kv) of first Queue Entry diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c index 87462e0..9582145 100644 --- a/drivers/infiniband/hw/ipath/ipath_cq.c +++ b/drivers/infiniband/hw/ipath/ipath_cq.c @@ -306,17 +306,18 @@ int ipath_destroy_cq(struct ib_cq *ibcq) /** * ipath_req_notify_cq - change the notification type for a completion queue * @ibcq: the completion queue - * @notify: the type of notification to request + * @notify_flags: the type of notification to request * * Returns 0 for success. * * This may be called from interrupt context. 
Also called by * ib_req_notify_cq() in the generic verbs code. */ -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags) { struct ipath_cq *cq = to_icq(ibcq); unsigned long flags; + int ret = 0; spin_lock_irqsave(&cq->lock, flags); /* @@ -324,9 +325,15 @@ int ipath_req_notify_cq(struct ib_cq *ib * any other transitions (see C11-31 and C11-32 in ch. 11.4.2.2). */ if (cq->notify != IB_CQ_NEXT_COMP) - cq->notify = notify; + cq->notify = notify_flags & IB_CQ_SOLICITED_MASK; + + if ((notify_flags & IB_CQ_REPORT_MISSED_EVENTS) && + cq->queue->head != cq->queue->tail) + ret = 1; + spin_unlock_irqrestore(&cq->lock, flags); - return 0; + + return ret; } /** diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 8039f6e..b0ea548 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -716,7 +716,7 @@ struct ib_cq *ipath_create_cq(struct ib_ int ipath_destroy_cq(struct ib_cq *ibcq); -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags); int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata); diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index 149b369..086d2ff 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -723,11 +723,12 @@ repoll: return err == 0 || err == -EAGAIN ? npolled : err; } -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify) +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags) { __be32 doorbell[2]; - doorbell[0] = cpu_to_be32((notify == IB_CQ_SOLICITED ? + doorbell[0] = cpu_to_be32(((flags & IB_CQ_SOLICITED_MASK) == + IB_CQ_SOLICITED ? 
MTHCA_TAVOR_CQ_DB_REQ_NOT_SOL : MTHCA_TAVOR_CQ_DB_REQ_NOT) | to_mcq(cq)->cqn); @@ -740,7 +741,7 @@ int mthca_tavor_arm_cq(struct ib_cq *cq, return 0; } -int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags) { struct mthca_cq *cq = to_mcq(ibcq); __be32 doorbell[2]; @@ -752,7 +753,8 @@ int mthca_arbel_arm_cq(struct ib_cq *ibc doorbell[0] = ci; doorbell[1] = cpu_to_be32((cq->cqn << 8) | (2 << 5) | (sn << 3) | - (notify == IB_CQ_SOLICITED ? 1 : 2)); + ((flags & IB_CQ_SOLICITED_MASK) == + IB_CQ_SOLICITED ? 1 : 2)); mthca_write_db_rec(doorbell, cq->arm_db); @@ -763,7 +765,7 @@ int mthca_arbel_arm_cq(struct ib_cq *ibc wmb(); doorbell[0] = cpu_to_be32((sn << 28) | - (notify == IB_CQ_SOLICITED ? + ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ? MTHCA_ARBEL_CQ_DB_REQ_NOT_SOL : MTHCA_ARBEL_CQ_DB_REQ_NOT) | cq->cqn); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index fe5cecf..5da2807 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -493,8 +493,8 @@ void mthca_unmap_eq_icm(struct mthca_dev int mthca_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify); -int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify); +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags); +int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags); int mthca_init_cq(struct mthca_dev *dev, int nent, struct mthca_ucontext *ctx, u32 pdn, struct mthca_cq *cq); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 8eacc35..42f59f4 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -428,9 +428,11 @@ struct ib_wc { u8 port_num; /* valid only for DR SMPs on switches */ }; -enum ib_cq_notify { - IB_CQ_SOLICITED, - IB_CQ_NEXT_COMP +enum ib_cq_notify_flags { + IB_CQ_SOLICITED = 1 << 0, +
IB_CQ_NEXT_COMP = 1 << 1, + IB_CQ_SOLICITED_MASK = IB_CQ_SOLICITED | IB_CQ_NEXT_COMP, + IB_CQ_REPORT_MISSED_EVENTS = 1 << 2, }; enum ib_srq_attr_mask { @@ -941,7 +943,7 @@ struct ib_device { struct ib_wc *wc); int (*peek_cq)(struct ib_cq *cq, int wc_cnt); int (*req_notify_cq)(struct ib_cq *cq, - enum ib_cq_notify cq_notify); + enum ib_cq_notify_flags flags); int (*req_ncomp_notif)(struct ib_cq *cq, int wc_cnt); struct ib_mr * (*get_dma_mr)(struct ib_pd *pd, @@ -1366,14 +1368,34 @@ int ib_peek_cq(struct ib_cq *cq, int wc_ /** * ib_req_notify_cq - Request completion notification on a CQ. * @cq: The CQ to generate an event for. - * @cq_notify: If set to %IB_CQ_SOLICITED, completion notification will - * occur on the next solicited event. If set to %IB_CQ_NEXT_COMP, - * notification will occur on the next completion. + * @flags: + * Must contain exactly one of %IB_CQ_SOLICITED or %IB_CQ_NEXT_COMP + * to request an event on the next solicited event or next work + * completion of any type, respectively. %IB_CQ_REPORT_MISSED_EVENTS + * may also be |ed in to request a hint about missed events, as + * described below. + * + * Return Value: + * < 0 means an error occurred while requesting notification + * == 0 means notification was requested successfully, and if + * IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events + * were missed and it is safe to wait for another event. In + * this case it is guaranteed that any work completions added + * to the CQ since the last CQ poll will trigger a completion + * notification event. + * > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed + * in. It means that the consumer must poll the CQ again to + * make sure it is empty to avoid missing an event because of a + * race between requesting notification and an entry being + * added to the CQ.
This return value means it is possible + * (but not guaranteed) that a work completion has been added + * to the CQ since the last poll without triggering a + * completion notification event. */ static inline int ib_req_notify_cq(struct ib_cq *cq, - enum ib_cq_notify cq_notify) + enum ib_cq_notify_flags flags) { - return cq->device->req_notify_cq(cq, cq_notify); + return cq->device->req_notify_cq(cq, flags); } /** -- 1.4.1 From rjwalsh at pathscale.com Thu Oct 19 16:22:34 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 19 Oct 2006 16:22:34 -0700 Subject: [openib-general] [openfabrics-ewg] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7? In-Reply-To: References: Message-ID: <453808BA.3080507@pathscale.com> QLogic have tested OFED 1.1pre1 and are happy with the results. We have tested UD, UC, RC, IPoIB, SDP and uDAPL. Regards, Robert. From sweitzen at cisco.com Thu Oct 19 16:49:30 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 19 Oct 2006 16:49:30 -0700 Subject: [openib-general] [openfabrics-ewg] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7? Message-ID: Do you have any performance data? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Robert Walsh > Sent: Thursday, October 19, 2006 4:23 PM > To: openfabrics-ewg at openib.org; openib-general at openib.org > Subject: Re: [openib-general] [openfabrics-ewg] test results > from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7? > > QLogic have tested OFED 1.1pre1 and are happy with the > results. We have > tested UD, UC, RC, IPoIB, SDP and uDAPL. > > Regards, > Robert.
> > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sweitzen at cisco.com Thu Oct 19 17:15:16 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 19 Oct 2006 17:15:16 -0700 Subject: [openib-general] OFED-1.1-pre1 is ready Message-ID: Cisco is happy with OFED 1.1 pre1, we only did light testing because no C changes were made. The following bugs have been tested and closed. 273 OFED 1.1 rc7 does not work with Cisco FC Gateway 278 OFED 1.1: two copies of openib.spec in openib-1.1.tgz 268 OFED openibd script references IBG2 267 OFED 1.1 MVAPICH not working on SLES10 due to 127.0.0.2 /etc/hosts entry 271 misleading error message when stopping openibd if SDP in use 277 OFED 1.1 rc7: uninitialized value during IPoIB failover in ipoib_ha.pl 274 OFED 1.1: RENICE_IB_MAD=yes hangs dual-HCA system with dual-port HCAs 249 OFED 1.1: Open MPI 1.1.1 won't compile with Intel C 9.[01] on SLES 10 Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren > Sent: Tuesday, October 17, 2006 12:09 PM > To: Open Fabrics > Cc: openib > Subject: [openib-general] OFED-1.1-pre1 is ready > > Hi All, > > OFED 1.1-pre1 is available: > URL: > https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1. > 1-pre1.tgz > > According to the 1.1 release schedule I published yesterday > and got all > partners approval (Qlogic have not answered so I assumed its OK with > them too). > > Each company has 3 days for basic "dead or alive tests" and > making sure > that no blocker issues are still open. > > If everything goes well we will do the release at the end of this > Thursday. 
> > Components owners: Please remember to update the release notes till > Wednesday. > Documents should be the only component that will be changed from this > pre-release to the official release. > > Tziporet & Vlad > > > ============================================================== > ========== > ======== > Release details: > > BUILD_ID: > OFED-1.1-pre1 > > openib-1.1 (REV=9854) > # User space > https://openib.org/svn/gen2/branches/1.1/src/userspace > Git: > ref: refs/heads/ofed_1_1 > commit 936b9fc0bd1411b52826213a5d89e2ceb4f52a78 > > # MPI > mpi_osu-0.9.7-mlx2.2.0.tgz > openmpi-1.1.1-1.src.rpm > mpitests-2.0-0.src.rpm > > > Fixed bugs: > BUG 273: OFED 1.1 rc7 does not work with Cisco FC Gateway > BUG 274: OFED 1.1: RENICE_IB_MAD=yes hangs dual-HCA system with > dual-port HCAs > BUG 277: OFED 1.1 rc7: uninitialized value during IPoIB failover in > ipoib_ha.pl > BUG 278: OFED 1.1: two copies of openib.spec in openib-1.1.tgz > > Other changes from OFED-1.1-rc7: > - Fix in ibdiagnet to support SM on a switch > - Activate scaling code of ehca as default in the install > - Documentation update > - Dapl: removed SCM from the configuration file dat.conf. > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Thu Oct 19 17:47:27 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 20 Oct 2006 02:47:27 +0200 Subject: [openib-general] [PATCH] opensm: remove obsolete p_report_buf Message-ID: <20061020004727.GH24676@sashak.voltaire.com> This removes the now-obsolete shared sm->p_report_buf buffer and cleans up the related code.
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_base.h | 5 -- osm/include/opensm/osm_sm.h | 2 - osm/include/opensm/osm_state_mgr.h | 8 --- osm/include/opensm/osm_ucast_mgr.h | 5 -- osm/opensm/osm_mcast_mgr.c | 11 ++-- osm/opensm/osm_sm.c | 15 +----- osm/opensm/osm_state_mgr.c | 104 ++++++++++------------------------- osm/opensm/osm_ucast_mgr.c | 70 +++++++----------------- 8 files changed, 57 insertions(+), 163 deletions(-) diff --git a/osm/include/opensm/osm_base.h b/osm/include/opensm/osm_base.h index 57dd4fd..20e2cc3 100644 --- a/osm/include/opensm/osm_base.h +++ b/osm/include/opensm/osm_base.h @@ -714,11 +714,6 @@ typedef enum _osm_state_mgr_mode * **********/ -#define OSM_REPORT_BUF_SIZE 0x10000 -#define OSM_REPORT_LINE_SIZE 0x256 -#define OSM_REPORT_BUF_THRESHOLD (OSM_REPORT_BUF_SIZE / OSM_REPORT_LINE_SIZE) - - /****d* OpenSM: Base/osm_sm_signal_t * NAME * osm_sm_signal_t diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h index bc812f3..05b87ac 100644 --- a/osm/include/opensm/osm_sm.h +++ b/osm/include/opensm/osm_sm.h @@ -178,8 +178,6 @@ typedef struct _osm_sm osm_vla_rcv_ctrl_t vla_rcv_ctrl; osm_pkey_rcv_t pkey_rcv; osm_pkey_rcv_ctrl_t pkey_rcv_ctrl; - char* p_report_buf; - } osm_sm_t; /* * FIELDS diff --git a/osm/include/opensm/osm_state_mgr.h b/osm/include/opensm/osm_state_mgr.h index ad4afa0..7aaab58 100644 --- a/osm/include/opensm/osm_state_mgr.h +++ b/osm/include/opensm/osm_state_mgr.h @@ -121,7 +121,6 @@ typedef struct _osm_state_mgr cl_qlist_t idle_time_list; cl_plock_t *p_lock; cl_event_t *p_subnet_up_event; - char *p_report_buf; osm_sm_state_t state; osm_state_mgr_mode_t state_step_mode; osm_signal_t next_stage_signal; @@ -170,9 +169,6 @@ typedef struct _osm_state_mgr * p_subnet_up_event * Pointer to the event to set if/when the subnet comes up. * -* p_report_buf -* Pointer to the large log buffer used for user reports. -* * state * State of the SM. 
* @@ -380,7 +376,6 @@ osm_state_mgr_init( IN const osm_sm_mad_ctrl_t* const p_mad_ctrl, IN cl_plock_t* const p_lock, IN cl_event_t* const p_subnet_up_event, - IN char* const p_report_buf, IN osm_log_t* const p_log ); /* * PARAMETERS @@ -420,9 +415,6 @@ osm_state_mgr_init( * p_subnet_up_event * [in] Pointer to the event to set if/when the subnet comes up. * -* p_report_buf -* [in] Pointer to the large log buffer used for user reports. -* * p_log * [in] Pointer to the log object. * diff --git a/osm/include/opensm/osm_ucast_mgr.h b/osm/include/opensm/osm_ucast_mgr.h index 0fbfc66..1c10abb 100644 --- a/osm/include/opensm/osm_ucast_mgr.h +++ b/osm/include/opensm/osm_ucast_mgr.h @@ -105,7 +105,6 @@ typedef struct _osm_ucast_mgr osm_req_t *p_req; osm_log_t *p_log; cl_plock_t *p_lock; - char *p_report_buf; } osm_ucast_mgr_t; /* * FIELDS @@ -204,7 +203,6 @@ osm_ucast_mgr_init( IN osm_ucast_mgr_t* const p_mgr, IN osm_req_t* const p_req, IN osm_subn_t* const p_subn, - IN char* const p_report_buf, IN osm_log_t* const p_log, IN cl_plock_t* const p_lock ); /* @@ -218,9 +216,6 @@ osm_ucast_mgr_init( * p_subn * [in] Pointer to the Subnet object for this subnet. * -* p_report_buf -* [in] Pointer to the large log buffer used for user reporting. -* * p_log * [in] Pointer to the log object. 
* diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index 5a01578..82ef7c3 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -1382,14 +1382,13 @@ static void mcast_mgr_dump_sw_routes( IN const osm_mcast_mgr_t* const p_mgr, IN const osm_switch_t* const p_sw, - IN FILE *p_mcfdbFile ) + IN FILE *file ) { osm_mcast_tbl_t* p_tbl; int16_t mlid_ho = 0; int16_t mlid_start_ho; uint8_t position = 0; int16_t block_num = 0; - char line[OSM_REPORT_LINE_SIZE]; boolean_t print_lid; const osm_node_t* p_node; uint16_t i, j; @@ -1404,7 +1403,7 @@ mcast_mgr_dump_sw_routes( p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); - fprintf( p_mcfdbFile, "\nSwitch 0x%016" PRIx64 "\n" + fprintf( file, "\nSwitch 0x%016" PRIx64 "\n" "LID : Out Port(s)\n", cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); while ( block_num <= p_tbl->max_block_in_use ) @@ -1415,7 +1414,7 @@ mcast_mgr_dump_sw_routes( mlid_ho = mlid_start_ho + i; position = 0; print_lid = FALSE; - sprintf( line, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); + fprintf( file, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); while ( position <= p_tbl->max_position ) { mask_entry = cl_ntoh16((*p_tbl->p_mask_tbl)[mlid_ho][position]); @@ -1428,13 +1427,13 @@ mcast_mgr_dump_sw_routes( for (j = 0 ; j < 16 ; j++) { if ( (1 << j) & mask_entry ) - sprintf( line, "%s 0x%03X ", line, j+(position*16) ); + fprintf( file, " 0x%03X ", j+(position*16) ); } position++; } if (print_lid) { - fprintf( p_mcfdbFile, "%s\n", line ); + fprintf( file, "\n" ); } } block_num++; diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c index fef3cac..fb4f759 100644 --- a/osm/opensm/osm_sm.c +++ b/osm/opensm/osm_sm.c @@ -256,9 +256,6 @@ osm_sm_destroy( cl_event_destroy( &p_sm->signal ); cl_event_destroy( &p_sm->subnet_up_event ); - if( p_sm->p_report_buf != NULL ) - free( p_sm->p_report_buf ); - osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ OSM_LOG_EXIT( p_sm->p_log ); } @@ -291,15 +288,6 @@ 
osm_sm_init( p_sm->p_disp = p_disp; p_sm->p_lock = p_lock; - p_sm->p_report_buf = malloc( OSM_REPORT_BUF_SIZE ); - if( p_sm->p_report_buf == NULL ) - { - osm_log( p_sm->p_log, OSM_LOG_ERROR, - "osm_sm_init: ERR 2E09: " - "Can't allocate report buffer\n" ); - status = IB_INSUFFICIENT_MEMORY; - goto Exit; - } status = cl_event_init( &p_sm->signal, FALSE ); if( status != CL_SUCCESS ) goto Exit; @@ -385,7 +373,6 @@ osm_sm_init( status = osm_ucast_mgr_init( &p_sm->ucast_mgr, &p_sm->req, p_sm->p_subn, - p_sm->p_report_buf, p_sm->p_log, p_sm->p_lock ); if( status != IB_SUCCESS ) goto Exit; @@ -409,7 +396,7 @@ osm_sm_init( &p_sm->mad_ctrl, p_sm->p_lock, &p_sm->subnet_up_event, - p_sm->p_report_buf, p_sm->p_log ); + p_sm->p_log ); if( status != IB_SUCCESS ) goto Exit; diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index d43e9fc..9c159df 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -118,7 +118,6 @@ osm_state_mgr_init( IN const osm_sm_mad_ctrl_t * const p_mad_ctrl, IN cl_plock_t * const p_lock, IN cl_event_t * const p_subnet_up_event, - IN char *const p_report_buf, IN osm_log_t * const p_log ) { cl_status_t status; @@ -136,7 +135,6 @@ osm_state_mgr_init( CL_ASSERT( p_sm_state_mgr ); CL_ASSERT( p_mad_ctrl ); CL_ASSERT( p_lock ); - CL_ASSERT( p_report_buf ); osm_state_mgr_construct( p_mgr ); @@ -154,7 +152,6 @@ osm_state_mgr_init( p_mgr->state = OSM_SM_STATE_IDLE; p_mgr->p_lock = p_lock; p_mgr->p_subnet_up_event = p_subnet_up_event; - p_mgr->p_report_buf = p_report_buf; p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; p_mgr->next_stage_signal = OSM_SIGNAL_NONE; @@ -1247,16 +1244,19 @@ __osm_state_mgr_report( uint8_t port_num; uint8_t start_port; uint32_t num_ports; - char line[OSM_REPORT_LINE_SIZE]; uint8_t node_type; - uint32_t line_num = 0; + + if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) + return; OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report ); - if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE 
) ) - { - goto Exit; - } + fprintf( stdout, + "\n===================================================" + "====================================================" + "\nVendor : Ty " + ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " + " : Neighbor Port (Port #)\n" ); p_tbl = &p_mgr->p_subn->port_guid_tbl; @@ -1286,91 +1286,56 @@ __osm_state_mgr_report( num_ports = osm_port_get_num_physp( p_port ); for( port_num = start_port; port_num < num_ports; port_num++ ) { - if( line_num == 0 ) - { - strcpy( p_mgr->p_report_buf, - "\n===================================================" - "====================================================" ); - strcat( p_mgr->p_report_buf, - "\nVendor : Ty " - ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " - " : Neighbor Port (Port #)\n" ); - line_num++; - } - p_physp = osm_port_get_phys_ptr( p_port, port_num ); if( ( p_physp == NULL ) || ( !osm_physp_is_valid( p_physp ) ) ) continue; - sprintf( line, "%s : %s : %02X :", + fprintf( stdout, "%s : %s : %02X :", osm_get_manufacturer_str( cl_ntoh64 ( osm_node_get_node_guid ( p_node ) ) ), osm_get_node_type_str_fixed_width( node_type ), port_num ); - strcat( p_mgr->p_report_buf, line ); - p_pi = osm_physp_get_port_info_ptr( p_physp ); /* * Port state is not defined for switch port 0 */ if( port_num == 0 ) - strcat( p_mgr->p_report_buf, " :" ); + fprintf( stdout, " :" ); else - { - sprintf( line, " %s :", + fprintf( stdout, " %s :", osm_get_port_state_str_fixed_width ( ib_port_info_get_port_state( p_pi ) ) ); - strcat( p_mgr->p_report_buf, line ); - } /* * LID values are only meaningful in select cases. 
*/ - if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) - { - if( ( ( node_type == IB_NODE_TYPE_SWITCH ) && ( port_num == 0 ) ) - || ( node_type != IB_NODE_TYPE_SWITCH ) ) - { - sprintf( line, " %04X : %01X :", - cl_ntoh16( p_pi->base_lid ), - ib_port_info_get_lmc( p_pi ) ); - - strcat( p_mgr->p_report_buf, line ); - } - else - strcat( p_mgr->p_report_buf, " : :" ); - } + if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN + && ( ( node_type == IB_NODE_TYPE_SWITCH && port_num == 0 ) + || node_type != IB_NODE_TYPE_SWITCH ) ) + fprintf( stdout, " %04X : %01X :", + cl_ntoh16( p_pi->base_lid ), + ib_port_info_get_lmc( p_pi ) ); else - strcat( p_mgr->p_report_buf, " : :" ); + fprintf( stdout, " : :" ); if( port_num != 0 ) - { - sprintf( line, " %s : %s : %s ", + fprintf( stdout, " %s : %s : %s ", osm_get_mtu_str( ib_port_info_get_neighbor_mtu( p_pi ) ), osm_get_lwa_str( p_pi->link_width_active ), osm_get_lsa_str( ib_port_info_get_link_speed_active ( p_pi ) ) ); - } else - { - sprintf( line, " %s : %s : %s ", " ", " ", " " ); - } - strcat( p_mgr->p_report_buf, line ); + fprintf( stdout, " : : " ); if( osm_physp_get_port_guid( p_physp ) == p_mgr->p_subn->sm_port_guid ) - { - sprintf( line, "* %016" PRIx64 " *", + fprintf( stdout, "* %016" PRIx64 " *", cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); - } else - { - sprintf( line, ": %016" PRIx64 " :", + fprintf( stdout, ": %016" PRIx64 " :", cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); - } - strcat( p_mgr->p_report_buf, line ); if( port_num && ( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) ) @@ -1378,36 +1343,27 @@ __osm_state_mgr_report( p_remote_physp = osm_physp_get_remote( p_physp ); if( p_remote_physp && osm_physp_is_valid( p_remote_physp ) ) { - sprintf( line, " %016" PRIx64 " (%02X)", + fprintf( stdout, " %016" PRIx64 " (%02X)", cl_ntoh64( osm_physp_get_port_guid ( p_remote_physp ) ), osm_physp_get_port_num( p_remote_physp ) ); - strcat( p_mgr->p_report_buf, line ); } else - strcat( 
p_mgr->p_report_buf, " UNKNOWN" ); + fprintf( stdout, " UNKNOWN" ); } - strcat( p_mgr->p_report_buf, "\n" ); - - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) - { - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); - line_num = 0; - } + fprintf( stdout, "\n" ); } - strcat( p_mgr->p_report_buf, + + fprintf( stdout, "------------------------------------------------------" "------------------------------------------------\n" ); p_port = ( osm_port_t * ) cl_qmap_next( &p_port->map_item ); } - CL_PLOCK_RELEASE( p_mgr->p_lock ); - - if( line_num != 0 ) - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); + fflush(stdout); - Exit: + CL_PLOCK_RELEASE( p_mgr->p_lock ); OSM_LOG_EXIT( p_mgr->p_log ); } diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 39d6899..da9e9f2 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -103,7 +103,6 @@ osm_ucast_mgr_init( IN osm_ucast_mgr_t* const p_mgr, IN osm_req_t* const p_req, IN osm_subn_t* const p_subn, - IN char* const p_report_buf, IN osm_log_t* const p_log, IN cl_plock_t* const p_lock ) { @@ -121,7 +120,6 @@ osm_ucast_mgr_init( p_mgr->p_subn = p_subn; p_mgr->p_lock = p_lock; p_mgr->p_req = p_req; - p_mgr->p_report_buf = p_report_buf; OSM_LOG_EXIT( p_mgr->p_log ); return( status ); @@ -140,14 +138,13 @@ __osm_ucast_mgr_dump_path_distribution( uint8_t num_ports; uint32_t num_paths; ib_net64_t remote_guid_ho; - char line[OSM_REPORT_LINE_SIZE]; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution ); p_node = osm_switch_get_node_ptr( p_sw ); num_ports = osm_switch_get_num_ports( p_sw ); - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_path_distribution: " + fprintf( stdout, "__osm_ucast_mgr_dump_path_distribution: " "Switch 0x%" PRIx64 "\n" "Port : Path Count Through Port", cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); @@ -155,11 +152,10 @@ __osm_ucast_mgr_dump_path_distribution( for( i = 0; i < num_ports; i++ ) { num_paths = 
osm_switch_path_count_get( p_sw , i ); - sprintf( line, "\n %03u : %u", i, num_paths ); - strcat( p_mgr->p_report_buf, line ); + fprintf( stdout, "\n %03u : %u", i, num_paths ); if( i == 0 ) { - strcat( p_mgr->p_report_buf, " (switch management port)" ); + fprintf( stdout, " (switch management port)" ); continue; } @@ -172,26 +168,23 @@ __osm_ucast_mgr_dump_path_distribution( switch( osm_node_get_remote_type( p_node, i ) ) { case IB_NODE_TYPE_SWITCH: - strcat( p_mgr->p_report_buf, " (link to switch" ); + fprintf( stdout, " (link to switch" ); break; case IB_NODE_TYPE_ROUTER: - strcat( p_mgr->p_report_buf, " (link to router" ); + fprintf( stdout, " (link to router" ); break; case IB_NODE_TYPE_CA: - strcat( p_mgr->p_report_buf, " (link to CA" ); + fprintf( stdout, " (link to CA" ); break; default: - strcat( p_mgr->p_report_buf, " (link to unknown node type" ); + fprintf( stdout, " (link to unknown node type" ); break; } - sprintf( line, " 0x%" PRIx64 ")", remote_guid_ho ); - strcat( p_mgr->p_report_buf, line ); + fprintf( stdout, " 0x%" PRIx64 ")", remote_guid_ho ); } - strcat( p_mgr->p_report_buf, "\n" ); - - osm_log_raw( p_mgr->p_log, OSM_LOG_ROUTING, p_mgr->p_report_buf ); + fprintf( stdout, "\n" ); OSM_LOG_EXIT( p_mgr->p_log ); } @@ -202,7 +195,7 @@ static void __osm_ucast_mgr_dump_ucast_routes( IN const osm_ucast_mgr_t* const p_mgr, IN const osm_switch_t* const p_sw, - IN FILE *p_fdbFile ) + IN FILE *file ) { const osm_node_t* p_node; uint8_t port_num; @@ -211,8 +204,6 @@ __osm_ucast_mgr_dump_ucast_routes( uint8_t best_port; uint16_t max_lid_ho; uint16_t lid_ho; - char line[OSM_REPORT_LINE_SIZE]; - uint32_t line_num = 0; boolean_t ui_ucast_fdb_assign_func_defined; OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes ); @@ -221,16 +212,13 @@ __osm_ucast_mgr_dump_ucast_routes( max_lid_ho = osm_switch_get_max_lid_ho( p_sw ); + fprintf( file, "__osm_ucast_mgr_dump_ucast_routes: " + "Switch 0x%016" PRIx64 "\n" + "LID : Port : Hops : Optimal\n", + cl_ntoh64( 
osm_node_get_node_guid( p_node ) ) ); for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) { - if( line_num == 0 ) - { - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_ucast_routes: " - "Switch 0x%016" PRIx64 "\n" - "LID : Port : Hops : Optimal\n", - cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); - line_num++; - } + fprintf(file, "0x%04X : ", lid_ho); port_num = osm_switch_get_port_by_lid( p_sw, lid_ho ); if( port_num == OSM_NO_PATH ) @@ -241,9 +229,7 @@ __osm_ucast_mgr_dump_ucast_routes( will reassign and compress the LID range. The subnet should work fine either way. */ - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); - strcat( p_mgr->p_report_buf, line ); - line_num++; + fprintf( file, "UNREACHABLE\n" ); continue; } /* @@ -255,19 +241,15 @@ __osm_ucast_mgr_dump_ucast_routes( num_hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); if( num_hops == OSM_NO_PATH ) { - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); - strcat( p_mgr->p_report_buf, line ); - line_num++; + fprintf( file, "UNREACHABLE\n" ); continue; } best_hops = osm_switch_get_least_hops( p_sw, lid_ho ); - sprintf( line, "0x%04X : %03u : %02u : ", - lid_ho, port_num, num_hops ); - strcat( p_mgr->p_report_buf, line ); + fprintf( file, "%03u : %02u : ", port_num, num_hops ); if( best_hops == num_hops ) - strcat( p_mgr->p_report_buf, "yes" ); + fprintf( file, "yes" ); else { if (p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign) @@ -282,23 +264,13 @@ __osm_ucast_mgr_dump_ucast_routes( p_sw, lid_ho, TRUE, NULL, NULL, NULL, NULL, /* No LMC Optimization */ ui_ucast_fdb_assign_func_defined ); - sprintf( line, "No %u hop path possible via port %u!", + fprintf( file, "No %u hop path possible via port %u!", best_hops, best_port ); - strcat( p_mgr->p_report_buf, line ); } - strcat( p_mgr->p_report_buf, "\n" ); - - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) - { - fprintf(p_fdbFile,"%s",p_mgr->p_report_buf ); - line_num = 0; - } + fprintf( file, "\n" ); } - if( line_num != 0 ) - 
fprintf(p_fdbFile,"%s\n",p_mgr->p_report_buf ); - OSM_LOG_EXIT( p_mgr->p_log ); } -- 1.4.3.g7768 From vishal at endace.com Thu Oct 19 18:20:00 2006 From: vishal at endace.com (vishal) Date: Fri, 20 Oct 2006 14:20:00 +1300 Subject: [openib-general] openib-1.1.tgz Message-ID: <1161307200.5074.10.camel@julia.et.endace.com> Hi, Where can I find the file openib-1.1.tgz? I couldn't find it on www.openib.org. Thanks! Vishal From xma at us.ibm.com Thu Oct 19 19:34:14 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 19 Oct 2006 19:34:14 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: Message-ID: Roland, I have applied this patch and updated patch 2/2. You will send out an updated patch 2/2, I think. I made some extra modifications in the ipoib code (which add more repolls). I do see around 10% or more performance improvement now with this change on both scaling and non-scaling code. I will run oprofile tomorrow to see the difference. I think with these extra repolls, the CPU utilization would be much higher. Thanks Shirley Ma IBM Linux Technology Center -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Thu Oct 19 19:37:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 19 Oct 2006 19:37:24 -0700 Subject: [openib-general] [PATCH/RFC 2/2] IPoIB: Convert to NAPI In-Reply-To: (Roland Dreier's message of "Thu, 19 Oct 2006 15:33:50 -0700") References: <20061019202144.GC2674@mellanox.co.il> Message-ID: Convert the IP-over-InfiniBand network device driver over to using NAPI to handle all completions (both receive and send).
Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib.h | 1 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 78 +++++++++++++++++++++++------ drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 + 3 files changed, 65 insertions(+), 16 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 0b8a79d..025cef2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -239,6 +239,7 @@ extern struct workqueue_struct *ipoib_wo /* functions */ +int ipoib_poll(struct net_device *dev, int *budget); void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 8bf5e9e..7c0fe8f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -286,26 +286,58 @@ static void ipoib_ib_handle_tx_wc(struct wc->status, wr_id, wc->vendor_err); } -static void ipoib_ib_handle_wc(struct net_device *dev, struct ib_wc *wc) +int ipoib_poll(struct net_device *dev, int *budget) { - if (wc->wr_id & IPOIB_OP_RECV) - ipoib_ib_handle_rx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); + struct ipoib_dev_priv *priv = netdev_priv(dev); + int max = min(*budget, dev->quota); + int done; + int t; + int empty; + int n, i; + +repoll: + done = 0; + empty = 0; + + while (max) { + t = min(IPOIB_NUM_WC, max); + n = ib_poll_cq(priv->cq, t, priv->ibwc); + + for (i = 0; i < n; ++i) { + if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) { + ++done; + --max; + ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); + } else + ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); + } + + if (n != t) { + empty = 1; + break; + } + } + + dev->quota -= done; + *budget -= done; + + if (empty) { + netif_rx_complete(dev); + if (unlikely(ib_req_notify_cq(priv->cq, + IB_CQ_NEXT_COMP | + IB_CQ_REPORT_MISSED_EVENTS)) && + netif_rx_reschedule(dev, 0)) + goto repoll; + + 
return 0; + } + + return 1; } void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) { - struct net_device *dev = (struct net_device *) dev_ptr; - struct ipoib_dev_priv *priv = netdev_priv(dev); - int n, i; - - ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); - do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); - for (i = 0; i < n; ++i) - ipoib_ib_handle_wc(dev, priv->ibwc + i); - } while (n == IPOIB_NUM_WC); + netif_rx_schedule(dev_ptr); } static inline int post_send(struct ipoib_dev_priv *priv, @@ -510,9 +542,10 @@ int ipoib_ib_dev_stop(struct net_device struct ib_qp_attr qp_attr; unsigned long begin; struct ipoib_tx_buf *tx_req; - int i; + int i, n; clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); + netif_poll_disable(dev); /* * Move our QP to the error state and then reinitialize in @@ -559,6 +592,16 @@ int ipoib_ib_dev_stop(struct net_device goto timeout; } + do { + n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc); + for (i = 0; i < n; ++i) { + if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) + ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); + else + ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); + } + } while (n == IPOIB_NUM_WC); + msleep(1); } @@ -587,6 +630,9 @@ timeout: msleep(1); } + netif_poll_enable(dev); + ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP); + return 0; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1eaf00e..d90aafb 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -896,6 +896,8 @@ static void ipoib_setup(struct net_devic dev->hard_header = ipoib_hard_header; dev->set_multicast_list = ipoib_set_mcast_list; dev->neigh_setup = ipoib_neigh_setup_dev; + dev->poll = ipoib_poll; + dev->weight = 100; dev->watchdog_timeo = HZ; -- 1.4.1 From rdreier at cisco.com Thu Oct 19 19:39:25 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 19 Oct 2006 19:39:25 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from 
ib_req_notify_cq() In-Reply-To: (Shirley Ma's message of "Thu, 19 Oct 2006 19:34:14 -0700") References: Message-ID: > I have applied this patch and updated patch 2/2. You will send out an > updated patch 2/2, I think. Sorry, messed that up. I just sent out the patch. > I did some extra modification in ipoib code, (which has more extra > repolls). I do see around 10% or more performance improvement now with this > change on both scaling and none scaling code. I will run oprofile tomorrow > to see the difference. I think with these extra repolls, the cpu > utilization would be much higher. You mean you add more calls to ib_poll_cq()? Where do you add them? Why does it help? - R. From xma at us.ibm.com Thu Oct 19 20:37:15 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 19 Oct 2006 20:37:15 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: Message-ID: Roland Dreier wrote on 10/19/2006 07:39:25 PM: > > I have applied this patch and updated patch 2/2. You will send out an > > updated patch 2/2, I think. > > Sorry, messed that up. I just sent out the patch. No problem, I did same change. > You mean you add more calls to ib_poll_cq()? Where do you add them? > Why does it help? > > - R. I run out of ideas why losing 2/3 of the throughput and got 476 trans/s. So I assumed there was always a missed event, then ipoib would stay in its napi poll within its scheduled time. That's why it helps. This is really a hack, doesn't address the problem. It sacrificed cpu utilization and gained the performance back. I need to understand how ehca reports missing event, there might be some delay there? Thanks Shirley Ma IBM Linux Technology Center -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Thu Oct 19 20:48:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 19 Oct 2006 20:48:44 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <4537D68C.4040409@sgi.com> (John Partridge's message of "Thu, 19 Oct 2006 14:48:28 -0500") References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> Message-ID: > 23454: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 > 23462: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2 > 23470: Split compl. Lower A = 00 Req = (0,0,0) Tag = 0 Comp = (0,2,0) WAIT = 1 (Error completion) > 23476: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE) > > We see here that a Config Write to Reg 01 (PCI_COMMAND) is issued across the > bus. We then see the Memory Read to 698 that goes across the bus before the > completion of the Config Write to Reg 01. OK, this is the crux of my confusion. I always thought (and the PCI spec seems to say this too) that config writes are non-posted, which means that the Config Write cycle in your trace should block everything until it is completed. Is that not true? Or could this be a bug in the SAL for this platform or something like that? - R. From rdreier at cisco.com Thu Oct 19 21:10:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 19 Oct 2006 21:10:35 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: (Shirley Ma's message of "Thu, 19 Oct 2006 20:37:15 -0700") References: Message-ID: > I run out of ideas why losing 2/3 of the throughput and got 476 trans/s. So > I assumed there was always a missed event, then ipoib would stay in its > napi poll within its scheduled time. That's why it helps. This is really a > hack, doesn't address the problem. It sacrificed cpu utilization and gained > the performance back. 
I need to understand how ehca reports missing event, > there might be some delay there? It's entirely possible that my implementation of the missing event hint in ehca is wrong. I just guessed based on how poll CQ is implemented -- if the consumer requests a hint about missing events, then I lock the CQ and check if its empty after requesting notification. I looked over my code again, and I don't see anything obviously wrong, but it's quite possible I made a mistake that I just can't see right now (like reversing a truth value somewhere). Someone who knows how ehca works might be able to spot the error. - R. From xma at us.ibm.com Thu Oct 19 21:44:20 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 19 Oct 2006 21:44:20 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: Message-ID: Roland Dreier wrote on 10/19/2006 09:10:35 PM: > It's entirely possible that my implementation of the missing event > hint in ehca is wrong. I just guessed based on how poll CQ is > implemented -- if the consumer requests a hint about missing events, > then I lock the CQ and check if its empty after requesting > notification. > > I looked over my code again, and I don't see anything obviously wrong, > but it's quite possible I made a mistake that I just can't see right > now (like reversing a truth value somewhere). Someone who knows how > ehca works might be able to spot the error. > > - R. The oprofile data (with your napi + this hack patch) looks good, it reduced cpu utilization significantly. (I was wrong about cpu utilization.) I will talk with ehca team regarding this missing event hint patch on ehca. thanks Shirley Ma IBM Linux Technology Center -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From HNGUYEN at de.ibm.com Thu Oct 19 23:02:11 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 20 Oct 2006 08:02:11 +0200
Subject: [openib-general] [openfabrics-ewg] test results from Mellanox, Voltaire, QLogic, and IBM for OFED 1.1 rc7?
In-Reply-To:
Message-ID:

Hi,
> attached please find the test results of OFED 1.1 pre1 in conjunction with IBM eHCA on POWER5.
One comment: the build+installation tests succeeded with pre1 only after we
applied patches for openib.spec, libehca/configure.in and libehca/config.h.in
on our test systems. As Vlad explained in another thread, the last two would
not be required in the release version since the ofed config would handle
that. Let's see how it goes.
Thanks!
Nam

From xma at us.ibm.com Fri Oct 20 01:30:03 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 20 Oct 2006 01:30:03 -0700
Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq()
In-Reply-To:
Message-ID:

After retesting several times, this hack patch only fixed the non-scaling
code. I thought I had tested both scaling and non-scaling, but it seems I made
a mistake: I might have configured and tested the non-scaling configuration
twice in the previous run.

thanks
Shirley Ma
IBM Linux Technology Center
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From halr at voltaire.com Fri Oct 20 02:29:11 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Oct 2006 05:29:11 -0400
Subject: [openib-general] [PATCH] osm: reviewing osmtest - osmt_multicast.c
In-Reply-To: <45379937.4040004@dev.mellanox.co.il>
References: <45379937.4040004@dev.mellanox.co.il>
Message-ID: <1161336425.25985.113319.camel@hal.voltaire.com>

On Thu, 2006-10-19 at 11:26, Yevgeny Kliteynik wrote:
> In case osmtest failed to remove some MC group,
> dumping all the MC groups that still remain,
> and their members.
>
> --
> Yevgeny
>
> Signed-off-by: Yevgeny Kliteynik

Thanks. Applied.
-- Hal From halr at voltaire.com Fri Oct 20 02:45:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Oct 2006 05:45:03 -0400 Subject: [openib-general] [PATCH][TRIVIAL] osmtest/osmt_multicast.c: Eliminate superfluous clearing of some variables Message-ID: <1161337478.25985.113880.camel@hal.voltaire.com> osmtest/osmt_multicast.c: Eliminate superfluous clearing of some variables Also, some other cosmetic changes Signed-off-by: Hal Rosenstock Index: osmtest/osmt_multicast.c =================================================================== --- osmtest/osmt_multicast.c (revision 9918) +++ osmtest/osmt_multicast.c (working copy) @@ -84,8 +84,7 @@ __osmt_print_all_multicast_records( req.pfn_query_cb = osmtest_query_res_cb; req.p_query_input = &user; - /* UnTrusted - get the multicast groups */ - req.sm_key = 0; + /* UnTrusted (SMKey of 0) - get the multicast groups */ status = osmv_query_sa(p_osmt->h_bind, &req); if (status != IB_SUCCESS || context.result.status != IB_SUCCESS) @@ -187,7 +186,6 @@ osmt_query_mcast( IN osmtest_t * const p context.p_osmt = p_osmt; user.attr_id = IB_MAD_ATTR_MCMEMBER_RECORD; user.attr_offset = ib_get_attr_offset( sizeof( ib_member_rec_t ) ); - user.comp_mask = 0; req.query_type = OSMV_QUERY_USER_DEFINED; req.timeout_ms = p_osmt->opt.transaction_timeout; @@ -196,7 +194,6 @@ osmt_query_mcast( IN osmtest_t * const p req.query_context = &context; req.pfn_query_cb = osmtest_query_res_cb; req.p_query_input = &user; - req.sm_key = 0; status = osmv_query_sa( p_osmt->h_bind, &req ); @@ -325,7 +322,7 @@ osmt_send_mcast_request( IN osmtest_t * memset( &req, 0, sizeof( req ) ); memset( &user, 0, sizeof( user ) ); memset( &context, 0, sizeof( context ) ); - memset( p_res, 0, sizeof(ib_sa_mad_t ) ); + memset( p_res, 0, sizeof( ib_sa_mad_t ) ); context.p_osmt = p_osmt; @@ -371,7 +368,6 @@ osmt_send_mcast_request( IN osmtest_t * req.query_context = &context; req.pfn_query_cb = osmtest_query_res_cb; req.p_query_input = &user; - req.sm_key = 0; 
status = osmv_query_sa( p_osmt->h_bind, &req ); @@ -417,7 +413,6 @@ osmt_send_mcast_request( IN osmtest_t * OSM_LOG_EXIT( &p_osmt->log ); return ( status ); - } /********************************************************************** @@ -3251,7 +3246,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons if (p_osmt->opt.mmode > 2) { /* Check invalid Join with max mlid which is more than the - Mellanox switches support 0xC000+0x1000 = 0xd000 */ + Mellanox switches support 0xC000+0x1000 = 0xd000 */ osm_log( &p_osmt->log, OSM_LOG_INFO, "osmt_run_mcast_flow: " "Checking Creation of Maximum avaliable Groups (MulticastFDBCap)...\n" @@ -3260,6 +3255,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons while (tmp_mlid > 0 && !ReachedMlidLimit) { uint16_t cur_mlid = 0; + /* Request Set */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); /* Good Flow - mgid is 0 while giving all required fields for From tziporet at mellanox.co.il Fri Oct 20 03:25:03 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Fri, 20 Oct 2006 12:25:03 +0200 Subject: [openib-general] OFED 1.1 - Official Release Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> I am happy to announce that OFED 1.1 Official Release is now available. The release can be found under: https://openib.org/svn/gen2/branches/1.1/ofed/releases/   And later today it will be on the OpenFabrics download page:  http://www.openfabrics.org/downloads.html.   This release was done in a joint effort of the following companies: * Cisco * SilverStorm * Voltaire * QLogic * Intel * IBM * Mellanox Technologies   I wish to thank all who contributed to the success of this release.   Tziporet ===============================================================================   Release summary: ================ The OFED software package is composed of several software modules intended for use on a computer cluster constructed as an InfiniBand network.   
OFED package contains the following components:
===============================================
  o   OpenFabrics core and ULPs:
        - HCA drivers (mthca, ipath, ehca)
        - core
        - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host and uDAPL
  o   OpenFabrics utilities:
        - OpenSM: InfiniBand Subnet Manager
        - Diagnostic tools
        - Performance tests
  o   MPI:
        - OSU MPI stack supporting the InfiniBand interface
        - Open MPI stack supporting the InfiniBand interface
        - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, Presta)
  o   Sources of all software modules (under conditions mentioned in the modules'
      LICENSE files)
  o   Documentation

Notes:
1. SDP is in beta quality.
2. ehca driver is in technology preview state.
3. All other OFED components are of production quality.

Supported Platforms and Operating Systems
=========================================
CPU architectures:
        * x86_64
        * x86
        * ia64
        * ppc64

Linux Operating Systems:
        - RedHat EL4 up3: 2.6.9-34.ELsmp
        - RedHat EL4 up4: 2.6.9-42.ELsmp
        - SLES9 SP3: 2.6.5-7.244-smp
        - SLES10: 2.6.16.21-0.8-smp
        - kernel.org: 2.6.17.x and 2.6.18.x

HCAs Supported
==============
Mellanox HCAs:
        - InfiniHost
        - InfiniHost III Ex (both modes: with memory and MemFree)
        - InfiniHost III Lx
        Both SDR and DDR mode of the InfiniHost III family are supported.
        For official FW versions please see:
        http://www.mellanox.com/support/firmware_table.php

QLogic HCAs:
        - QHT6040 (PathScale InfiniPath HT-460)
        - QHT6140 (PathScale InfiniPath HT-465)
        - QLE6140 (PathScale InfiniPath PE-880)

IBM HCAs:
        - GX Dual-port 4x IB HCA
        - GX Dual-port 12x IB HCA

Switches Supported
==================
This release was tested with switches and gateways provided by the following
companies:
        - Cisco
        - Voltaire
        - SilverStorm
        - Flextronics

Third Party Packages
====================
The following third party packages have been tested with OFED 1.1:
1. Intel MPI, Version 2.0.1 - refresh, and Version 3.0
2. HP MPI

OFED Sources:
=============
Source repositories:
Kernel: git://www.mellanox.co.il/~git/infiniband ofed_1_1
User: https://openib.org/svn/gen2/branches/1.1/src/userspace

Main changes from OFED 1.0:
===========================
- Kernel code based on 2.6.18
- High Availability in IPoIB and SRP (beta)
- RDS was removed from the OFED package (to be added in future releases)
- IBM low level driver (ehca) was added
- MPI:
        - OSU MVAPICH: Message coalescing
        - Open MPI: Version was updated to v1.1.1
        - MPI tests: Updated to latest versions from LLNL, Intel and OSU
- Management: Added utilities and enhanced tools
- Full support for ppc64 libraries (32 and 64 bits)

See the attached release notes for details.

Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED_release_notes.txt
URL:

From tziporet at mellanox.co.il Fri Oct 20 03:36:17 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Fri, 20 Oct 2006 12:36:17 +0200
Subject: [openib-general] changes between OFED 1.1-pre1 and OFED 1.1
Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACF03@mtlexch01.mtl.com>

The following changes were made between OFED 1.1-pre1 and the OFED 1.1
official release:
- libehca.so is not a part of libehca rpm: BUG 282.
- libmthca and libipathverbs 32 bit are not installed on x86_64: BUG 283.
- fixed lib/lib64 issue on ppc64 SLES9.0
- ibdiagnet: Failed to discover sm entity running on switch: BUG 284
- Documentation update

The first 3 fixes were made because they were critical to IBM, and the other
fix only affects the ibdiagnet utility.

Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380

From mst at mellanox.co.il Fri Oct 20 06:30:15 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 20 Oct 2006 15:30:15 +0200
Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ...
In-Reply-To:
References:
Message-ID: <20061020133015.GA32177@mellanox.co.il>

Quoting r. Shirley Ma :
> Subject: Re: [openfabrics-ewg] [openib-general] RHEL5 and OFED ...
>
> Roland Dreier wrote on 10/19/2006 09:19:14 AM:
>
> > Shirley> How can RHEL5 pick up this particular patch? Applications
> > Shirley> with fork() depend on this patch.
> >
> > It can't really, since it breaks the libibverbs ABI and therefore has
> > to be part of a major release.
>
> Then we need to wait for the new release or find an alternative way which I doubt.

It is actually possible: You can use malloc hooks to set the madvise flag for
all allocated memory. This will include memory allocated by libibverbs and its
plugins and memory you are going to register.
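[Editor's sketch, not part of the original message: one way the idea above
could look in userspace. It assumes the "madvise flag" meant here is
MADV_DONTFORK (available on Linux 2.6.16 and later), and it uses an explicit
wrapper allocator instead of glibc malloc hooks, so it only covers buffers the
application allocates itself. The names alloc_dontfork() and free_dontfork()
are hypothetical and are not libibverbs APIs.]

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages; returns 0 on bad input. */
static size_t round_up_to_page(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || len == 0)
        return 0;
    return (len + (size_t)page - 1) / (size_t)page * (size_t)page;
}

/*
 * Hypothetical helper: allocate a page-aligned buffer and mark its pages
 * MADV_DONTFORK, so a later fork() does not set up copy-on-write for
 * memory that is about to be registered with the HCA.
 */
static void *alloc_dontfork(size_t len)
{
    size_t rounded = round_up_to_page(len);
    void *buf = NULL;

    if (rounded == 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)sysconf(_SC_PAGESIZE), rounded) != 0)
        return NULL;
    if (madvise(buf, rounded, MADV_DONTFORK) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/*
 * Hypothetical helper: undo the flag before handing the pages back to the
 * allocator, so that reused heap pages are visible to children again.
 */
static void free_dontfork(void *buf, size_t len)
{
    size_t rounded = round_up_to_page(len);

    if (buf == NULL)
        return;
    if (rounded != 0)
        madvise(buf, rounded, MADV_DOFORK);
    free(buf);
}
```

A caller would pair the two helpers around memory registration, for example
buf = alloc_dontfork(4096), register and use buf, then free_dontfork(buf, 4096).
Pages marked this way are simply absent in a forked child, so the child must
not touch them.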
-- MST From rdreier at cisco.com Fri Oct 20 07:35:51 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 20 Oct 2006 07:35:51 -0700 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061020133015.GA32177@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 20 Oct 2006 15:30:15 +0200") References: <20061020133015.GA32177@mellanox.co.il> Message-ID: Michael> It is actually possible: You can use malloc hooks to set Michael> the madvise flag for all allocated memory. This will Michael> include memory allocated by libibverbs and its plugins Michael> and memory you are going to register. ...and also all the memory that the child process you fork() might want to use. But if you're really desperate you could try to do it I guess. If all you care about is system()/fork()+exec(), then be aware that RHEL5 has a new enough kernel so that will work without doing anything. From mst at mellanox.co.il Fri Oct 20 07:42:02 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 20 Oct 2006 16:42:02 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: References: Message-ID: <20061020144202.GB32177@mellanox.co.il> Quoting r. Roland Dreier : > If all you care about is system()/fork()+exec(), then be aware that > RHEL5 has a new enough kernel so that will work without doing anything. As was pointed out, its really system/fork+wait. -- MST From dledford at redhat.com Fri Oct 20 07:42:24 2006 From: dledford at redhat.com (Doug Ledford) Date: Fri, 20 Oct 2006 10:42:24 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061019050907.GA1547@mellanox.co.il> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> <20061019050907.GA1547@mellanox.co.il> Message-ID: <1161355344.2917.609.camel@fc6.xsintricity.com> On Thu, 2006-10-19 at 07:09 +0200, Michael S. Tsirkin wrote: > Quoting r. 
Doug Ledford : > > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > > > On Wed, 2006-10-18 at 09:29 +0200, Michael S. Tsirkin wrote: > > > Quoting r. Doug Ledford : > > > > > >From our dicussion, it seems we should be able to just push the > > > > > small number of missing bits into RHEL5 directly. That would be > > > > > nicer of course. > > > > > > > > It depends. If there's lots of individual changes, it might be easier > > > > to push the OFED 1.1 change. But, that depends on when the final OFED > > > > 1.1 comes out and how much it varies from the existing RPMs. > > > > > > OFED is in deep freeze, so you can already look at it to estimate the amount of > > > changes against 2.6.18. > > > Could you look at the diff please so that I know whether it's worth it > > > to invest in building the minimal patch set for pushing into RHEL5, > > > or whether you'll push OFED 1.1 into RHEL kernel as is? > > > > Yeah, I'll look over the diff today. > > How does it look? OK. The total diffstat is large enough that it's going to be a very hard sell to include. I'll try, but it'll be hard. There is a test kernel available at my site below. I haven't even gotten around to testing it myself yet, I fired off the build last night and it was done when I came in this morning. Anyone interested is free to play with it. Of note is that both ipath and ipoib modules required minor tweaks to work with the inode-diet patch set which could effect their operation (I suspect the patch is correct, but haven't tested it). -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mst at mellanox.co.il Fri Oct 20 08:01:30 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 20 Oct 2006 17:01:30 +0200 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161355344.2917.609.camel@fc6.xsintricity.com> References: <1161355344.2917.609.camel@fc6.xsintricity.com> Message-ID: <20061020150130.GC32177@mellanox.co.il> Quoting r. Doug Ledford : > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > On Thu, 2006-10-19 at 07:09 +0200, Michael S. Tsirkin wrote: > > Quoting r. Doug Ledford : > > > Subject: Re: [openfabrics-ewg] RHEL5 and OFED ... > > > > > > On Wed, 2006-10-18 at 09:29 +0200, Michael S. Tsirkin wrote: > > > > Quoting r. Doug Ledford : > > > > > > >From our dicussion, it seems we should be able to just push the > > > > > > small number of missing bits into RHEL5 directly. That would be > > > > > > nicer of course. > > > > > > > > > > It depends. If there's lots of individual changes, it might be easier > > > > > to push the OFED 1.1 change. But, that depends on when the final OFED > > > > > 1.1 comes out and how much it varies from the existing RPMs. > > > > > > > > OFED is in deep freeze, so you can already look at it to estimate the > > > > amount of changes against 2.6.18. Could you look at the diff please so > > > > that I know whether it's worth it to invest in building the minimal > > > > patch set for pushing into RHEL5, or whether you'll push OFED 1.1 into > > > > RHEL kernel as is? > > > > > > Yeah, I'll look over the diff today. > > > > How does it look? > > OK. The total diffstat is large enough that it's going to be a very > hard sell to include. I'll try, but it'll be hard. Actually, the only large patch by far is the ipath-fixes.patch: Some of these patches are already in 2.6.18. 
wc -l *.patch
    39 cm_add_mra_timeout_limit.patch
    72 cma_established1.patch
    18 cma_increase_max_cm_retries.patch
    24 cma_list_init.patch
    20 cma_mem_leak.patch
    38 cma_race_fix.patch
    33 cma_tavor_quirk.patch
    18 cm_cleanup_timewait.patch
    21 ib_sa_names.patch
 11273 ipath-fixes.patch
   194 ipath-limit-packets-sent-without-ack.patch
   169 ipath-memcpy_cachebypass.patch
    13 ipath-x86_64.patch
    44 ipoib_issue3.patch
    47 ipoib_mcast_join_mask.patch
    28 ipoib_mcast_restart.patch
    74 ipoib_selector_updated.patch
    66 ishai_srp_attributes.patch
    57 ishai_srp_remove_reconnect.patch
    71 ishai_srp_wa_post_send.patch
    20 lockdep_header.patch
    18 mthca_av_statrate.patch
   314 mthca_catas_reset.patch
    18 mthca_mad_traps.patch
    18 mthca_query_port.patch
    38 mthca_query_qp_portnum.patch
    17 mthca_query_qp_statrate_bits.patch
    29 mthca_use_uar2.patch
    96 robert-ipath-diagpkt-init-fixup.patch
    39 sdp_credits_by_seq.patch
    52 sdp_post_credits.patch
   157 sean_cma_establish.patch
   104 sean_cma_hotplug.patch
    17 sean_cma_typo_fix.patch
    68 sean_cm_drep_on_not_found.patch
   101 sean_cm_randomize_psn.patch
   199 sean_cm_unload_crash.patch
    83 srp_1_recreate_at_reconnect.patch
   112 srp_2_use_multiple_initiator_ports.patch
   121 svnehca_0015_1.patch
    12 svnehca_0015_2.patch

and despite its name, ipath-fixes.patch actually adds new features -
such as mmap of kernel buffers into userspace - so it is not purely a bugfix.
So, how about pushing all the rest for starters?

--
MST

From HNGUYEN at de.ibm.com Fri Oct 20 08:06:06 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 20 Oct 2006 17:06:06 +0200
Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
In-Reply-To: <6C2C79E72C305246B504CBA17B5500C934789A@mtlexch01.mtl.com>
Message-ID:

Hi Vladimir!
> No, the updated configure.in and config.h.in are not in OFED-1.1.
> In any case, I believe that most of the checks you have added to
> configure scripts are provided by OFED installation scripts.
> So, in case OFED installation fails, ehca configure would fail as well. Created a bugzilla as libehca could not be found by libibverbs, http://openib.org/bugzilla/show_bug.cgi?id=285. I guess I was not precise enough in that configure does not only check libsysfs, but also generates corresponding define HAVE_SYSFS_LIBSYSFS_H in config.h to enable/export openib_driver_init() from libehca. So we really need configure.in and config.h.in to enable libehca for user space. Without them ehca configure still goes through. Regards Nam From dledford at redhat.com Fri Oct 20 08:38:51 2006 From: dledford at redhat.com (Doug Ledford) Date: Fri, 20 Oct 2006 11:38:51 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <20061020150130.GC32177@mellanox.co.il> References: <1161355344.2917.609.camel@fc6.xsintricity.com> <20061020150130.GC32177@mellanox.co.il> Message-ID: <1161358731.2917.615.camel@fc6.xsintricity.com> On Fri, 2006-10-20 at 17:01 +0200, Michael S. Tsirkin wrote: > > OK. The total diffstat is large enough that it's going to be a very > > hard sell to include. I'll try, but it'll be hard. > > Actually, the only large patch by far is the ipath-fixes.patch: > > Some of these patches are already in 2.6.18. > > wc -l *.patch This doesn't equate to the update I had to do. Keep in mind that the 2.6.18.1 kernel didn't have either the SDP or ehca modules, and as such our previous kernel grabbed those two from the OFED 1.0 code. So, the diffstat just to bring them both up to OFED 1.1 status so that the remaining smaller patches would apply was large. In addition, the previous kernel was missing the rdma_ucma module entirely, so that's another largish diff prior to applying these patches. So, my diffstat is much larger than this. 
> 39 cm_add_mra_timeout_limit.patch > 72 cma_established1.patch > 18 cma_increase_max_cm_retries.patch > 24 cma_list_init.patch > 20 cma_mem_leak.patch > 38 cma_race_fix.patch > 33 cma_tavor_quirk.patch > 18 cm_cleanup_timewait.patch > 21 ib_sa_names.patch > 11273 ipath-fixes.patch > 194 ipath-limit-packets-sent-without-ack.patch > 169 ipath-memcpy_cachebypass.patch > 13 ipath-x86_64.patch > 44 ipoib_issue3.patch > 47 ipoib_mcast_join_mask.patch > 28 ipoib_mcast_restart.patch > 74 ipoib_selector_updated.patch > 66 ishai_srp_attributes.patch > 57 ishai_srp_remove_reconnect.patch > 71 ishai_srp_wa_post_send.patch > 20 lockdep_header.patch > 18 mthca_av_statrate.patch > 314 mthca_catas_reset.patch > 18 mthca_mad_traps.patch > 18 mthca_query_port.patch > 38 mthca_query_qp_portnum.patch > 17 mthca_query_qp_statrate_bits.patch > 29 mthca_use_uar2.patch > 96 robert-ipath-diagpkt-init-fixup.patch > 39 sdp_credits_by_seq.patch > 52 sdp_post_credits.patch > 157 sean_cma_establish.patch > 104 sean_cma_hotplug.patch > 17 sean_cma_typo_fix.patch > 68 sean_cm_drep_on_not_found.patch > 101 sean_cm_randomize_psn.patch > 199 sean_cm_unload_crash.patch > 83 srp_1_recreate_at_reconnect.patch > 112 srp_2_use_multiple_initiator_ports.patch > 121 svnehca_0015_1.patch > 12 svnehca_0015_2.patch > > and despite the name it is actually adding a new features - > such as mmap of kernel buffers into user space - not purely a bug fix. > So, how about pushing all the rest for starters? > > -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From xma at us.ibm.com Fri Oct 20 10:02:54 2006 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 20 Oct 2006 10:02:54 -0700 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: <20061019202144.GC2674@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 10/19/2006 01:21:45 PM: > Please also note that due to factors such as TCP window limits, TX on a single > socket is often stalled. To really stress a connection and see benefit from > NAPI you should be running multiple socket streams in parallel: > either just run multiple instances of netperf/netserver, or use iperf with -P flag. I used to get 7600Mb/s IPoIB one socket duplex throughput with my other IPoIB patches on 2.6.5 kernel under certain configuration. Which makes me believe we could gain close to link throughput with one UD QP. Now I couldn't get it anymore on the new kernel. I was struggling with TCP window limits on the new kernel. Do you have any hint? thanks Shirley Ma IBM Linux Technology Center -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at openib.org Fri Oct 20 10:33:04 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Fri, 20 Oct 2006 10:33:04 -0700 (PDT) Subject: [openib-general] [Bug 159] OFED1.0: Missing interfaces Message-ID: <20061020173304.057122283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=159 ------- Comment #4 from venkatesh.babu at 3leafnetworks.com 2006-10-20 10:33 ------- Created an attachment (id=65) --> (http://openib.org/bugzilla/attachment.cgi?id=65&action=view) Implementation of new interface This is the implementation of the ib_sa_serv_notice_hdlr() which can be used to register the remote event notifications like Port UP/DOWN. 
This interface is crucial for implementing the APM functionality, in which
these notifications from the passive RC QP connection side can be used by the
active RC QP connection side to either cause the failover to the alternate
path or reload the new alternate path.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From jgunthorpe at obsidianresearch.com Fri Oct 20 11:24:45 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Fri, 20 Oct 2006 12:24:45 -0600
Subject: [openib-general] Race in mthca_cmd_post()
In-Reply-To:
References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com>
Message-ID: <20061020182445.GA4054@obsidianresearch.com>

On Thu, Oct 19, 2006 at 08:48:44PM -0700, Roland Dreier wrote:

> OK, this is the crux of my confusion. I always thought (and the PCI
> spec seems to say this too) that config writes are non-posted, which
> means that the Config Write cycle in your trace should block
> everything until it is completed. Is that not true? Or could this be
> a bug in the SAL for this platform or something like that?

The posted/non-posted write stuff in the spec really only means that a
split completion is generated for that transaction on the bus. There
is no bus-level requirement that the bus halt while an outstanding
split is pending.

In fact, the PCI-X ordering rules in this case actually would allow
your config write and memory read to be re-ordered by the bridge
(table 8-3). ``Split requests are permitted to be blocked by or pass
other split requests.''

Most implementations block the CPU on a non-posted write, which provides
the necessary serialization, but Altix clearly didn't. To properly
correct this you need to have a barrier that synchronizes with the
split completion from the device.
I'd even go so far as to say that not doing this for config_write is unusual/useless behavior and the barrier might be better part of the config_write primitive... Jason From rjwalsh at pathscale.com Fri Oct 20 11:44:31 2006 From: rjwalsh at pathscale.com (Robert Walsh) Date: Fri, 20 Oct 2006 11:44:31 -0700 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <1161355344.2917.609.camel@fc6.xsintricity.com> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> <20061019050907.GA1547@mellanox.co.il> <1161355344.2917.609.camel@fc6.xsintricity.com> Message-ID: <4539190F.1000209@pathscale.com> > Of note is that both ipath and ipoib modules required minor tweaks to > work with the inode-diet patch set which could effect their operation (I > suspect the patch is correct, but haven't tested it). Can you send a patch for this, please? From venkatesh.babu at 3leafnetworks.com Fri Oct 20 13:29:08 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Fri, 20 Oct 2006 13:29:08 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: References: Message-ID: <45393194.40707@3leafnetworks.com> I have added couple of patches to the OFED stack as described in bug#160, bug#172, and bug#159 and with this successfully tested the APM functionality, except one issue. Configuration: 2 Nodes CPU: AMD Opteron(tm) Processor 252 Dual processor CA type: MT25208 Firmware version: 5.1.4 OS: CentOS release 4.2 IB: OFED 1.0 2 Flextronics 24 port switchs Node1 Port1 connected to Switch1 Node1 Port2 connected to Switch2 Node2 Port1 connected to switch1 Node 2 Port 2 connected to Switch2 Node1 : Active side of the RC QP Node 2 : Passive side of the RC QP Test1: Failover simulation on Node1 1. Simulate the port1 failure, RC QP migrates the path to port2 2. Simulate the port1 UP to rearm the alternate path from port1 3. 
Simulate the port2 failure, RC QP migrates the path to port1 4. Simulate the port2 UP to rearm the alternate path from port2 Test2: Real failover by manually pulling the cable 1. Simulate the failover/failback by pulling cable of Node1 port1 2. Simulate the failover/failback by pulling cable of Node1 port2 3. Simulate the failover/failback by pulling cable of Node2 port1 4. Simulate the failover/failback by pulling cable of Node2 port2 ISSUE: If I pull both cables then there are no paths to the destination, so the RC QP connection is supposed to tear down. But it is not working. 1. Create a RC QP and load both primary and alternate path (I was setting rnr_retry_count = 6, retry_count = 6, packet_life_time field of struct ib_sa_path_rec to 15 and also tried with 12) 2. Send some traffic over RC QP 3. Disconnect the cable belonging to the primary path 4. It smoothly fails over to the alternate path, which becomes the primary path. No effect on the traffic on that RC QP 5. Remove the second cable belonging to the new primary path. 6. Obviously traffic stops since there are no paths to the destination. But for the outstanding WRs in the RC QP I don't get any callback from the verbs layer describing whether they succeeded or failed due to some error like IB_WC_RETRY_EXC_ERR. When I query the RC QP properties it still shows that it is in IB_QPS_RTS state. Without APM functionality it behaves correctly - 1. Create a RC QP and load only primary path (I was setting rnr_retry_count = 6, retry_count = 6, packet_life_time field of struct ib_sa_path_rec to 15 and also tried with 12) 2. Send some traffic over RC QP 3. Disconnect the cable belonging to the primary path 4. Obviously traffic stops since there are no paths to the destination. For the outstanding WRs in the RC QP I do get a callback from the verbs layer reporting that the first WR failed due to error IB_WC_RETRY_EXC_ERR, and for all other WRs I get IB_WC_WR_FLUSH_ERR. I then close this RC QP.
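The expected non-APM behaviour described above (one IB_WC_RETRY_EXC_ERR for the first failed WR, then IB_WC_WR_FLUSH_ERR for the rest, after which the QP is closed) can be sketched roughly as follows. This is only an illustrative user-space model: the enum values and the handle_completion() helper are hypothetical stand-ins, not the real verbs definitions or the code used in the test.

```c
/*
 * Hypothetical stand-ins for the work-completion status codes named in the
 * message above. The real IB_WC_* values are defined by the verbs layer;
 * these exist only so the sketch compiles on its own.
 */
enum wc_status {
    WC_SUCCESS,
    WC_RETRY_EXC_ERR,  /* models IB_WC_RETRY_EXC_ERR */
    WC_WR_FLUSH_ERR    /* models IB_WC_WR_FLUSH_ERR */
};

/* What the consumer decides to do with the connection (hypothetical). */
enum qp_action {
    QP_KEEP,
    QP_TEAR_DOWN
};

/* Decide what to do with the RC QP after seeing one completion status. */
static enum qp_action handle_completion(enum wc_status status)
{
    switch (status) {
    case WC_SUCCESS:
        return QP_KEEP;
    case WC_RETRY_EXC_ERR:  /* first failed WR: retries were exhausted */
    case WC_WR_FLUSH_ERR:   /* remaining WRs: flushed after the failure */
        return QP_TEAR_DOWN;
    }
    return QP_TEAR_DOWN;    /* unknown status: treat as fatal */
}
```

In the APM case reported above, no completion arrives at all, so a handler of this shape would never be invoked and the connection would never be torn down.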
VBabu Date: Mon, 16 Oct 2006 14:03:50 -0700 From: "Sean Hefty" Subject: Re: [openib-general] APM support in openib stack To: somenath at veritas.com Cc: openib-general at openib.org Message-ID: <4533F3B6.1030509 at ichips.intel.com> Content-Type: text/plain; charset=iso-8859-1; format=flowed somenath wrote: >>>> Doesn't ib_cm_init_qp_attr() set this for you? >> >> >> >> No, it doesn't. it returns me >> attr_mask= 0x12d181 >> port=0x0 alt_port=0x0 > > Okay - there was a fix to the cm.c file (svn rev 8267) that added setting the alternate port number when initializing the QP attributes. Apparently that fix did not make it into the release that you're using. - Sean From chetm at us.ibm.com Fri Oct 20 13:47:05 2006 From: chetm at us.ibm.com (Chet Mehta) Date: Fri, 20 Oct 2006 15:47:05 -0500 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: Message-ID: Tziporet, I understand that OFED1.1-rc7 was released without the requested change below despite the fact that the request came in before the deadline. As things stand, userspace is broken for ehca in OFED 1.1. Was the request rejected because ehca is considered a technology preview for this OFED release and if so are there well understood (documented) rules/limitations on how their submissions are handled? I'm asking to (a) learn from this experience (b) understand what we could/should have done differently (c) help avoid issues of this type in the future for us or other contributors. I would appreciate hearing back from you. Thx! :Chet. Hoang-Nam Nguyen/Germany/IBM at IBMDE 10/19/2006 09:41 AM To Tziporet Koren cc "Michael S. Tsirkin" , openfabrics-ewg at openib.org, openib-general at openib.org, Vladimir Sokolovsky , Chet Mehta/Austin/IBM at IBMUS Subject Re: [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs Hi, > The release is closed. 
We only updating the documents now (will be > closed in the coming few hours only). > Since ehca is in technology preview state these issues are not blockers. > Please document all issues in ehca release_notes (or send me parts you > want to include). Even though ehca is in technology preview state, there are certainly some ones who would like to use it. In current ofed the build process is incomplete for ehca user space. We did send the patches yesterday as discussed w/ Vladimir and Michael and considered to be on the category "2. Small updates for the install". It's very important for us that one can get ehca compiled and built with ofed. Since those patches do not touch any other components and we were one day before todays deadline, we're very surprised that you reject this code now. What should be the next steps? Regards Nam -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri Oct 20 14:14:43 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 20 Oct 2006 14:14:43 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <20061020182445.GA4054@obsidianresearch.com> (Jason Gunthorpe's message of "Fri, 20 Oct 2006 12:24:45 -0600") References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> Message-ID: > The posted/non-posted write stuff in the spec really only means that a > split completion is generated for that transaction on the bus. There > is no bus-level requirement that the bus halt while an outstanding > split is pending. In fact, the PCI-X ordering rules in this case > actually would allow your config read and memory read to be re-ordered > by a the bridge (table 8-3). ``Split requests are permitted to be > blocked by or pass other split requests.'' > > Most implementations block the CPU on a non-posted write which > provides the necessary serialization, but Altix clearly didn't.. 
Thanks, I think I get it now. It does seem like this behavior skirts right inside the boundary of what the PCI spec allows. What was confusing me was the section: Non-posted transactions reach their ultimate destination before completing at the originating device. The master cannot proceed with any other work until the transaction has completed at the ultimate destination (if a dependency exists). I just saw "the master cannot proceed" but the first few times I read this, I didn't see the "(if a dependency exists)." Since no dependency exists between the config write and the subsequent memory read, the master _can_ proceed in this case. So I do agree that this patch looks correct and needed, although the Altix behavior is somewhat unusual. Thanks, Roland From jgunthorpe at obsidianresearch.com Fri Oct 20 14:49:21 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 20 Oct 2006 15:49:21 -0600 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> Message-ID: <20061020214921.GB4054@obsidianresearch.com> On Fri, Oct 20, 2006 at 02:14:43PM -0700, Roland Dreier wrote: > I just saw "the master cannot proceed" but the first few times I read > this, I didn't see the "(if a dependency exists)." Since no Yes, the language in PCI-X is actually a bit clearer (pg 80): ``As in conventional PCI, if a requester requires one non-posted transaction to complete before another, it must not initiate the second transaction until the first one completes.'' Where completion should be understood to mean that the split completion is observed by the requester. I just took a quick look at asm-ia64/io.h and there are __ia64_mf_a barriers after all non-posted IO operations (inb/outb). Config write and config read transactions have identical rules to IO transactions at the PCI bus level.
I'm going to go out on a limb here and say that if Linux code assumes strong ordering of IO operations then it makes sense to also assume strong ordering on config writes. So, instead of patching mthca with this barrier it should go in the Altix config access mechanism.. From io.h: /* * For the in/out routines, we need to do "mf.a" _after_ doing the I/O access to ensure * that the access has completed before executing other I/O accesses. Since we're doing * the accesses through an uncachable (UC) translation, the CPU will execute them in * program order. However, we still need to tell the compiler not to shuffle them around * during optimization, which is why we use "volatile" pointers. */ Does that make sense? Jason From dledford at redhat.com Fri Oct 20 15:17:16 2006 From: dledford at redhat.com (Doug Ledford) Date: Fri, 20 Oct 2006 18:17:16 -0400 Subject: [openib-general] [openfabrics-ewg] RHEL5 and OFED ... In-Reply-To: <4539190F.1000209@pathscale.com> References: <1161155330.2917.511.camel@fc6.xsintricity.com> <20061018072904.GA26507@mellanox.co.il> <1161177058.2917.513.camel@fc6.xsintricity.com> <20061019050907.GA1547@mellanox.co.il> <1161355344.2917.609.camel@fc6.xsintricity.com> <4539190F.1000209@pathscale.com> Message-ID: <1161382637.2917.620.camel@fc6.xsintricity.com> On Fri, 2006-10-20 at 11:44 -0700, Robert Walsh wrote: > > Of note is that both ipath and ipoib modules required minor tweaks to > > work with the inode-diet patch set which could effect their operation (I > > suspect the patch is correct, but haven't tested it). > > Can you send a patch for this, please? The attached patch addresses an update to our iscsi lib that required a one line change, and the update for the inode diet patch. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed...
Name: openib-update.patch Type: text/x-patch Size: 3251 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mshefty at ichips.intel.com Fri Oct 20 15:22:54 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 20 Oct 2006 15:22:54 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> References: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> Message-ID: <45394C3E.4020404@ichips.intel.com> Based on the feedback, I've made the following changes to this patch: * Remove the multicast APIs from ib_sa.h. Multicast definitions are relocated from ib_sa to the ib_multicast module. (This provides cleaner encapsulation of the multicast services.) * Added support for non-equal MTU, rate, and packet lifetime selectors. * Added documentation around status values returned in the callback. * Added client register/unregister routines. Please let me know if I missed anything, or if there's disagreement with these changes (like moving the multicast definitions). I should have updated patches next week. - Sean From johnip at sgi.com Fri Oct 20 15:24:06 2006 From: johnip at sgi.com (John Partridge) Date: Fri, 20 Oct 2006 17:24:06 -0500 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> Message-ID: <45394C86.2030708@sgi.com> Roland, I checked with our chief hardware engineer and here's what he had to say :- Yes, config writes are non-posted, but, no, it should not block everything. 
From the 1.0 version of the PCI-X spec, sec 8.4.4 Transaction Ordering and Passing Rules for Bridges, p 153 "Split Requests are permitted to be blocked by or to pass other Split Requests. (These Split Transactions in PCI-X have the same requirements as Delayed Transactions in conventional PCI.)" (Appendix E of the version 2.3 PCI spec says the same thing about Delayed Transactions) And on p151 of the PCI-X spec: "If an initiator requires two Split Transactions to complete in order, the initiator must not issue the second request until the first Split Transaction completes." So in the failing example, the bridge chip is issuing the 2nd Split Transaction (the read) before the 1st (the config write) completes. From my hardware bias, it seems like the driver should be in charge of determining whether or not there's an ordering dependence and take the appropriate action. Otherwise, the hardware has no way of knowing whether or not there's an order requirement and will have to serialize everything. That's a negative impact on performance and kind of renders the spec-allowed option of passing or not passing meaningless. Roland Dreier wrote: > > The posted/non-posted write stuff in the spec really only means that a > > split completion is generated for that transaction on the bus. There > > is no bus-level requirement that the bus halt while an outstanding > > split is pending. In fact, the PCI-X ordering rules in this case > > actually would allow your config read and memory read to be re-ordered > > by a the bridge (table 8-3). ``Split requests are permitted to be > > blocked by or pass other split requests.'' > > > > Most implementations block the CPU on a non-posted write which > > provides the necessary serialization, but Altix clearly didn't.. > > Thanks, I think I get it now. It does seem like this behavior skirts > right inside the boundary of what the PCI spec allows. 
What was > confusing me was the section: > > Non-posted transactions reach their ultimate destination before > completing at the originating device. The master cannot proceed > with any other work until the transaction has completed at the > ultimate destination (if a dependency exists). > > I just saw "the master cannot proceed" but the first few times I read > this, I didn't see the "(if a dependency exists)." Since no > dependency exists between the config write and the subsequent memory > read, the master _can_ proceed in this case. So I do agree that this > patch looks correct and needed, although the Altix behavior is > somewhat unusual. So, do you need anything else from me, like an attached copy of the patch? Thanks John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From mshefty at ichips.intel.com Fri Oct 20 15:45:46 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 20 Oct 2006 15:45:46 -0700 Subject: [openib-general] [RFC] Notice/InformInfo event reporting In-Reply-To: <4533DD81.9030502@ichips.intel.com> References: <4533DD81.9030502@ichips.intel.com> Message-ID: <4539519A.5000109@ichips.intel.com> Based on Todd's feedback, I'm using the following approach: Add a new module, ib_notice, that tracks InformInfo registration requests. Users register to receive Notices for a given trap number. Clients are automatically re-registered if needed by the ib_notice module, but do receive an error notification when re-registration occurs. The ib_notice module uses the ib_sa module for subscriptions, but registers with the MAD layer to receive notices. InformInfo and Notice definitions are encapsulated in the ib_notice module.
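The per-trap registration model outlined above can be illustrated with a small user-space sketch. Everything in it (notice_register(), notice_unregister(), notice_dispatch(), the fixed-size table, the count_notice() handler) is a hypothetical illustration of "register for a trap number, get called back when a matching Notice arrives"; it is not the proposed ib_notice API, and it does not model SA subscriptions, the MAD layer, or automatic re-registration.

```c
#include <stddef.h>

#define MAX_NOTICE_CLIENTS 8

/* Callback invoked when a notice with a matching trap number arrives. */
typedef void (*notice_handler)(int trap_number, void *context);

/* One registration slot (hypothetical layout). */
struct notice_client {
    int in_use;
    int trap_number;
    notice_handler handler;
    void *context;
};

static struct notice_client notice_clients[MAX_NOTICE_CLIENTS];

/* Register a handler for one trap number; returns a slot id, or -1 if full. */
static int notice_register(int trap_number, notice_handler handler,
                           void *context)
{
    int i;

    if (handler == NULL)
        return -1;
    for (i = 0; i < MAX_NOTICE_CLIENTS; i++) {
        if (!notice_clients[i].in_use) {
            notice_clients[i].in_use = 1;
            notice_clients[i].trap_number = trap_number;
            notice_clients[i].handler = handler;
            notice_clients[i].context = context;
            return i;
        }
    }
    return -1;
}

/* Drop a registration previously returned by notice_register(). */
static void notice_unregister(int id)
{
    if (id >= 0 && id < MAX_NOTICE_CLIENTS)
        notice_clients[id].in_use = 0;
}

/* Deliver a notice to every client registered for this trap number.
 * Returns the number of handlers that were called. */
static int notice_dispatch(int trap_number)
{
    int i, delivered = 0;

    for (i = 0; i < MAX_NOTICE_CLIENTS; i++) {
        if (notice_clients[i].in_use &&
            notice_clients[i].trap_number == trap_number) {
            notice_clients[i].handler(trap_number,
                                      notice_clients[i].context);
            delivered++;
        }
    }
    return delivered;
}

/* Example handler: counts deliveries in the int that context points to. */
static void count_notice(int trap_number, void *context)
{
    (void)trap_number;
    (*(int *)context)++;
}
```

A caller would register count_notice() with a pointer to a counter, feed trap numbers to notice_dispatch(), and see the counter advance only for the trap number it registered for.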
- Sean From dledford at redhat.com Fri Oct 20 22:53:01 2006 From: dledford at redhat.com (Doug Ledford) Date: Sat, 21 Oct 2006 01:53:01 -0400 Subject: [openib-general] OFED 1.1 - Official Release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> Message-ID: <1161409981.2917.656.camel@fc6.xsintricity.com> On Fri, 2006-10-20 at 12:25 +0200, Tziporet Koren wrote: > I am happy to announce that OFED 1.1 Official Release is now available. > > The release can be found under: > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > > And later today it will be on the OpenFabrics download page: http://www.openfabrics.org/downloads.html. > > This release was done in a joint effort of the following companies: > * Cisco > * SilverStorm > * Voltaire > * QLogic > * Intel > * IBM > * Mellanox Technologies > > I wish to thank all who contributed to the success of this release. For those interested, I've made RHEL5 packages of this release available on my web site in my signature (kernel, openib, openmpi, and mpitests are done, missing is mvapich and ibutils). There are a few differences relative to the official OFED 1.1 release, those are noted below. > Tziporet > =============================================================================== > > Release summary: > ================ > The OFED software package is composed of several software modules intended for use on a computer > cluster constructed as an InfiniBand network. 
> > OFED package contains the following components: > =============================================== > o OpenFabrics core and ULPs: > - HCA drivers (mthca, ipath, ehca) > - core > - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host and uDAPL > o OpenFabrics utilities: > - OpenSM: InfiniBand Subnet Manager > - Diagnostic tools > - Performance tests > o MPI: > - OSU MPI stack supporting the InfiniBand interface The OSU mvapich stack is not available at this time. > - Open MPI stack supporting the InfiniBand interface > - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, Presta) > o Sources of all software modules (under conditions mentioned in the modules' > LICENSE files) > o Documentation > > Notes: > 1. SDP is in beta quality. > 2. ehca driver is in technology preview state. > 3. All other OFED components are of production quality. > > Supported Platforms and Operating Systems > ========================================= > CPU architectures: > * x86_64 > * x86 > * ia64 > * ppc64 > > Linux Operating Systems: > - RedHat EL4 up3: 2.6.9-34.ELsmp > - RedHat EL4 up4: 2.6.9-42.ELsmp > - SLES9 SP3: 2.6.5-7.244-smp > - SLES10: 2.6.16.21-0.8-smp > - kernel.org: 2.6.17.x and 2.6.18.x Currently, only the RHEL5 beta update packages are available. RHEL4 packages will be available soon. > > HCAs Supported > ============== > Mellanox HCAs: > - InfiniHost > - InfiniHost III Ex (both modes: with memory and MemFree) > - InfiniHost III Lx > Both SDR and DDR mode of the InfiniHost III family are supported. 
> > For official FW versions please see: > http://www.mellanox.com/support/firmware_table.php > > Qlogic HCAs: > - QHT6040 (PathScale InfiniPath HT-460) > - QHT6140 (PathScale InfiniPath HT-465) > - QLE6140 (PathScale InfiniPath PE-880) > > IBM HCAs: > - GX Dual-port 4x IB HCA > - GX Dual-port 12x IB HCA > > > Switches Supported > This release was tested with switches and gateways provided by the following companies: > - Cisco > - Voltaire > - SilverStorm > - Flextronics > > Third Party Packages > ==================== > The following third party packages have been tested with OFED 1.1: > 1. Intel MPI, Version 2.0.1 - refresh, and Version 3.0 > 2. HP MPI Neither of these packages were tested with the Red Hat specific RPMs. > OFED Sources: > ============= > Source repositories: > Kernel: git://www.mellanox.co.il/~git/infiniband ofed_1_1 > User: https://openib.org/svn/gen2/branches/1.1/src/userspace Source rpms are available along side the binary rpms. > Main changed from OFED 1.0: > ============================ > - Kernel code based on 2.6.18 > - High Availability in IPoIB and SRP (beta) The ipoibha ability is not tested and known not to work until the changes to arping are completed. The SRP HA code is totally untested in the RHEL5 packages and is shipped as it is found in the OFED 1.1 release. 
> - RDS was removed for the OFED package (to be added in future releases) > - IBM low level driver (ehca) was added > - MPI: > - OSU MVAPICH: Message coalescing > - Open MPI: Version was updated to v1.1.1 > - MPI tests: Updated to latest versions from LLNL, Intel and OSU > - Management: Added utilities and enhanced tools > - Full support for ppc64 libraries (32 and 64 bits) > > > See the attached are the release notes for details Of special note are the following differences between the OFED 1.1 release and the RHEL5 specific rpms: - The ibutils package has not been rebuilt since the OFED 1.0 stack was introduced into RHEL5 (compile problems not yet sorted out) - The ipoibha facility, when ready, has it's own init script that enables the service after the network interfaces are brought up - The RHEL5 scripts make no attempt to create ifcfg-ib? network device configuration files during installation, the user is responsible for creating any network device configuration files they need - Most configuration files are stored in /etc/ofed for ease in keeping the files together and organized - The openmpi rpms are designed to coexist with the lam rpms and any other mpi rpms that might be made available on RHEL5 via use of the alternatives service (man alternatives for more information) - All of the various -devel packages support multilib installation where you may select amongst the available arch options on multilib arches (x86_64 supports x86_64/i386, ppc64 support ppc/ppc64, ia64 supports ia64/i386) - For mpi compilation with openmpi, you may use the system wide default compiler as linked to /usr/bin/mpicc, or you may use a specific mode compiler by using /usr/share/openmpi/bin{32,64}/mpicc as the program file - For users not familiar with the specifics of multilib installations on FC6 and later Red Hat distributions, rpm is designed to install both the x86_64 and i386 version of packages whenever multilib operation is desired, and in the event that both an x86_64 and 
i386 binary conflict between the two rpms, it will automatically install the best binary. This means that, for example, installing all of the i386 and x86_64 rpms (openmpi, openmpi-libs, and openmpi-devel) on an x86_64 machine will result in the complete devel environment for both i386 and x86_64, the complete set of libs for i386 and x86_64, and the x86_64 binaries in /usr/bin. Standard practice is to install both i386 and x86_64 versions of all packages that have a -devel subpackage. Those packages that don't have a -devel subpackage normally only have the preferred arch installed. For example, since libibverbs has a -devel package, the standard setup would include both the i386/x86_64 libibverbs and libibverbs-devel. However, as libibverbs-utils has no devel component, the default would be to only install the x86_64 version. Normally this would all be handled by anaconda during the install process, but should anyone wish to test, knowing which packages to install to duplicate a correct installation by anaconda will help to eliminate needless confusion. Preference is for x86_64 over i386 and ppc over ppc64 where both exist and only one is needed. It should be fine to install both the i386 and x86_64 versions of *all* of the packages, but it may result in a few spurious "file created as " messages. -- Doug Ledford GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From halr at voltaire.com Sat Oct 21 06:44:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Oct 2006 09:44:35 -0400 Subject: [openib-general] [PATCH 1/2] Diags/libibmad: Support optional PortExtendedCounters Message-ID: <1161438266.25985.181188.camel@hal.voltaire.com> Diags/libibmad: Support optional PortExtendedCounters Signed-off-by: Hal Rosenstock Index: include/infiniband/mad.h =================================================================== --- include/infiniband/mad.h (revision 9921) +++ include/infiniband/mad.h (working copy) @@ -502,6 +502,22 @@ enum MAD_FIELDS { IB_VEND2_OUI_F, IB_VEND2_DATA_F, + /* + * PortCountersExtended + */ + IB_PC_EXT_FIRST_F, + IB_PC_EXT_PORT_SELECT_F = IB_PC_EXT_FIRST_F, + IB_PC_EXT_COUNTER_SELECT_F, + IB_PC_EXT_XMT_BYTES_F, + IB_PC_EXT_RCV_BYTES_F, + IB_PC_EXT_XMT_PKTS_F, + IB_PC_EXT_RCV_PKTS_F, + IB_PC_EXT_XMT_UPKTS_F, + IB_PC_EXT_RCV_UPKTS_F, + IB_PC_EXT_XMT_MPKTS_F, + IB_PC_EXT_RCV_MPKTS_F, + IB_PC_EXT_LAST_F, + IB_FIELD_LAST_ /* must be last */ }; @@ -782,6 +798,10 @@ uint8_t *port_performance_query(void *rc uint timeout); uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, uint mask, uint timeout); +uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, + uint timeout); +uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, + uint mask, uint timeout); uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, uint timeout); uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, @@ -800,7 +820,7 @@ ib_mad_dump_fn mad_dump_node_type, mad_dump_sltovl, mad_dump_vlarbitration, mad_dump_nodedesc, mad_dump_nodeinfo, mad_dump_portinfo, mad_dump_switchinfo, - mad_dump_perfcounters; + mad_dump_perfcounters, mad_dump_perfcounters_ext; int _mad_dump(ib_mad_dump_fn *fn, char *name, void *val, 
int valsz); char * _mad_dump_field(ib_field_t *f, char *name, char *buf, int bufsz, Index: src/gs.c =================================================================== --- src/gs.c (revision 9827) +++ src/gs.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -67,6 +67,7 @@ pma_query(void *rcvbuf, ib_portid_t *des rpc.method = IB_MAD_METHOD_GET; rpc.attr.id = id; + /* Same for attribute IDs */ mad_set_field(rcvbuf, 0, IB_PC_PORT_SELECT_F, port); rpc.attr.mod = 0; rpc.timeout = timeout; @@ -92,9 +93,9 @@ port_performance_query(void *rcvbuf, ib_ return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_COUNTERS); } -uint8_t * -port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, uint mask, - uint timeout) +static uint8_t * +performance_reset(void *rcvbuf, ib_portid_t *dest, int port, uint mask, + uint timeout, uint id) { ib_rpc_t rpc = {0}; int lid = dest->lid; @@ -111,17 +112,17 @@ port_performance_reset(void *rcvbuf, ib_ rpc.mgtclass = IB_PERFORMANCE_CLASS; rpc.method = IB_MAD_METHOD_SET; - rpc.attr.id = IB_GSI_PORT_COUNTERS; + rpc.attr.id = id; memset(rcvbuf, 0, IB_MAD_SIZE); - + + /* Same for attribute IDs */ mad_set_field(rcvbuf, 0, IB_PC_PORT_SELECT_F, port); mad_set_field(rcvbuf, 0, IB_PC_COUNTER_SELECT_F, mask); rpc.attr.mod = 0; rpc.timeout = timeout; rpc.datasz = IB_PC_DATA_SZ; rpc.dataoffs = IB_PC_DATA_OFFS; - dest->qp = 1; if (!dest->qkey) dest->qkey = IB_DEFAULT_QP1_QKEY; @@ -130,6 +131,26 @@ port_performance_reset(void *rcvbuf, ib_ } uint8_t * +port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, uint mask, + uint timeout) +{ + return performance_reset(rcvbuf, dest, port, mask, timeout, IB_GSI_PORT_COUNTERS); +} + +uint8_t * +port_performance_ext_query(void *rcvbuf, ib_portid_t 
*dest, int port, uint timeout) +{ + return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_COUNTERS_EXT); +} + +uint8_t * +port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, uint mask, + uint timeout) +{ + return performance_reset(rcvbuf, dest, port, mask, timeout, IB_GSI_PORT_COUNTERS_EXT); +} + +uint8_t * port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, uint timeout) { return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_SAMPLES_CONTROL); Index: src/fields.c =================================================================== --- src/fields.c (revision 9827) +++ src/fields.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -340,6 +340,20 @@ ib_field_t ib_mad_f [] = { [IB_VEND2_OUI_F] {BE_OFFS(36*8, 24), "OUI", mad_dump_array}, [IB_VEND2_DATA_F] {40*8, (256-40)*8, "Vendor2Data", mad_dump_array}, + /* + * Extended port counters + */ + [IB_PC_EXT_PORT_SELECT_F] {BITSOFFS(8, 8), "PortSelect", mad_dump_uint}, + [IB_PC_EXT_COUNTER_SELECT_F] {BITSOFFS(16, 16), "CounterSelect", mad_dump_hex}, + [IB_PC_EXT_XMT_BYTES_F] {64, 64, "PortXmitData", mad_dump_uint}, + [IB_PC_EXT_RCV_BYTES_F] {128, 64, "PortRcvData", mad_dump_uint}, + [IB_PC_EXT_XMT_PKTS_F] {192, 64, "PortXmitPkts", mad_dump_uint}, + [IB_PC_EXT_RCV_PKTS_F] {256, 64, "PortRcvPkts", mad_dump_uint}, + [IB_PC_EXT_XMT_UPKTS_F] {320, 64, "PortUnicastXmitPkts", mad_dump_uint}, + [IB_PC_EXT_RCV_UPKTS_F] {384, 64, "PortUnicastRcvPkts", mad_dump_uint}, + [IB_PC_EXT_XMT_MPKTS_F] {448, 64, "PortMulticastXmitPkts", mad_dump_uint}, + [IB_PC_EXT_RCV_MPKTS_F] {512, 64, "PortMulticastPkts", mad_dump_uint}, + }; void Index: src/dump.c =================================================================== --- src/dump.c (revision 9827) +++ 
src/dump.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -771,6 +771,12 @@ mad_dump_perfcounters(char *buf, int buf _dump_fields(buf, bufsz, val, IB_PC_FIRST_F, IB_PC_LAST_F); } +void +mad_dump_perfcounters_ext(char *buf, int bufsz, void *val, int valsz) +{ + _dump_fields(buf, bufsz, val, IB_PC_EXT_FIRST_F, IB_PC_EXT_LAST_F); +} + /************************/ char * Index: libibmad.ver =================================================================== --- libibmad.ver (revision 9921) +++ libibmad.ver (working copy) @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=2:0:1 +LIBVERSION=3:0:2 Index: src/libibmad.map =================================================================== --- src/libibmad.map (revision 9827) +++ src/libibmad.map (working copy) @@ -1,4 +1,4 @@ -IBMAD_1.2 { +IBMAD_1.3 { global: _mad_dump; _mad_dump_field; @@ -21,6 +21,7 @@ IBMAD_1.2 { mad_dump_nodeinfo; mad_dump_opervls; mad_dump_perfcounters; + mad_dump_perfcounters_ext; mad_dump_physportstate; mad_dump_portcapmask; mad_dump_portinfo; @@ -43,6 +44,8 @@ IBMAD_1.2 { perf_classportinfo_query; port_performance_query; port_performance_reset; + port_performance_ext_query; + port_performance_ext_reset; port_samples_control_query; port_samples_result_query; mad_build_pkt; From halr at voltaire.com Sat Oct 21 06:44:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Oct 2006 09:44:58 -0400 Subject: [openib-general] [PATCH 2/2] Diags/perfquery: Add support for PortExtendedCounters Message-ID: <1161438267.25985.181189.camel@hal.voltaire.com> Diags/perfquery: Add support for PortExtendedCounters Note that this patch 
requires the associated libibmad patch Signed-off-by: Hal Rosenstock Index: src/perfquery.c =================================================================== --- src/perfquery.c (revision 9827) +++ src/perfquery.c (working copy) @@ -42,7 +42,7 @@ #include #include -#define __BUILD_VERSION_TAG__ 1.1 +#define __BUILD_VERSION_TAG__ 1.2 #include #include #include @@ -89,9 +89,12 @@ usage(void) fprintf(stderr, "\tExamples:\n"); fprintf(stderr, "\t\t%s\t\t# read local port's performance counters\n", basename); fprintf(stderr, "\t\t%s 32 1\t\t# read performance counters from lid 32, port 1\n", basename); + fprintf(stderr, "\t\t%s -e 32 1\t# read extended performance counters from lid 32, port 1\n", basename); fprintf(stderr, "\t\t%s -a 32\t\t# read performance counters from lid 32, all ports\n", basename); fprintf(stderr, "\t\t%s -r 32 1\t# read performance counters and reset\n", basename); + fprintf(stderr, "\t\t%s -e -r 32 1\t# read extended performance counters and reset\n", basename); fprintf(stderr, "\t\t%s -R 0x20 1\t# reset performance counters of port 1 only\n", basename); + fprintf(stderr, "\t\t%s -e -R 0x20 1\t# reset extended performance counters of port 1 only\n", basename); fprintf(stderr, "\t\t%s -R -a 32\t# reset performance counters of all ports\n", basename); fprintf(stderr, "\t\t%s -R 32 2 0x0fff\t# reset only error counters of port 2\n", basename); fprintf(stderr, "\t\t%s -R 32 2 0xf000\t# reset only non-error counters of port 2\n", basename); @@ -114,13 +117,15 @@ main(int argc, char **argv) int udebug = 0; char *ca = 0; int ca_port = 0; + int extended = 0; - static char const str_opts[] = "C:P:s:t:dGarRVhu"; + static char const str_opts[] = "C:P:s:t:dGearRVhu"; static const struct option long_opts[] = { { "C", 1, 0, 'C'}, { "P", 1, 0, 'P'}, { "debug", 0, 0, 'd'}, { "Guid", 0, 0, 'G'}, + { "extended", 0, 0, 'e'}, { "all_ports", 0, 0, 'a'}, { "reset_after_read", 0, 0, 'r'}, { "Reset_only", 0, 0, 'R'}, @@ -145,6 +150,9 @@ main(int argc, char **argv) 
case 'P': ca_port = strtoul(optarg, 0, 0); break; + case 'e': + extended = 1; + break; case 'a': all++; port = 0xff; @@ -205,18 +213,30 @@ main(int argc, char **argv) if (!perf_classportinfo_query(pc, &portid, port, timeout)) IBERROR("classportinfo query"); - if (!port_performance_query(pc, &portid, port, timeout)) - IBERROR("perfquery"); + if (extended != 1) { + if (!port_performance_query(pc, &portid, port, timeout)) + IBERROR("perfquery"); + + mad_dump_perfcounters(buf, sizeof buf, pc, sizeof pc); + } else { + if (!port_performance_ext_query(pc, &portid, port, timeout)) + IBERROR("perfextquery"); - mad_dump_perfcounters(buf, sizeof buf, pc, sizeof pc); + mad_dump_perfcounters_ext(buf, sizeof buf, pc, sizeof pc); + } printf("# Port counters: %s port %d\n%s", portid2str(&portid), port, buf); if (!reset) exit(0); do_reset: - if (!port_performance_reset(pc, &portid, port, mask, timeout)) - IBERROR("perf reset"); + if (extended != 1) { + if (!port_performance_reset(pc, &portid, port, mask, timeout)) + IBERROR("perf reset"); + } else { + if (!port_performance_ext_reset(pc, &portid, port, mask, timeout)) + IBERROR("perf ext reset"); + } exit(0); } Index: man/perfquery.8 =================================================================== --- man/perfquery.8 (revision 9827) +++ man/perfquery.8 (working copy) @@ -1,25 +1,30 @@ -.TH PERFQUERY 8 "October 9, 2006" "OpenIB" "OpenIB Diagnostics" +.TH PERFQUERY 8 "October 20, 2006" "OpenIB" "OpenIB Diagnostics" .SH NAME perfquery \- query InfiniBand port counters .SH SYNOPSIS .B perfquery -[\-d(ebug)] [\-G(uid)] [-a(ll_ports)] [-r(eset_after_read)] [-R(eset_only)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\-h(elp)] [ [[port] [reset_mask]]] +[\-d(ebug)] [\-G(uid)] [-e(xtended)] [-a(ll_ports)] [-r(eset_after_read)] [-R(eset_only)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\-h(elp)] [ [[port] [reset_mask]]] .SH DESCRIPTION .PP perfquery uses PerfMgt GMPs to obtain the 
PortCounters (basic performance -and error counters) from the PMA at the node/port specified. Optionally -show aggregated counters for all ports of node. Also, optionally, reset -after read, or only reset counters. +and error counters) or PortExtendedCounters from the PMA at the node/port +specified. Optionally shows aggregated counters for all ports of node. +Also, optionally, reset after read, or only reset counters. .SH OPTIONS .PP .TP +\fB\-e\fR, \fB\-\-extended\fR +show extended port counters rather than (basic) port counters. +Note that extended port counters attribute is optional. +.TP \fB\-a\fR, \fB\-\-all_ports\fR -show aggregated counters for all ports of the destination lid +show aggregated counters for all ports of the destination lid. +Note all ports support is optional. .TP \fB\-r\fR, \fB\-\-reset_after_read\fR reset counters after read @@ -88,12 +93,18 @@ perfquery # read local por .PP perfquery 32 1 # read performance counters from lid 32, port 1 .PP +perfquery -e 32 1 # read extended performance counters from lid 32, port 1 +.PP perfquery -a 32 # read perf counters from lid 32, all ports .PP perfquery -r 32 1 # read performance counters and reset .PP +perfquery -e -r 32 1 # read extended performance counters and reset +.PP perfquery -R 0x20 1 # reset performance counters of port 1 only .PP +perfquery -e -R 0x20 1 # reset extended performance counters of port 1 only +.PP perfquery -R -a 32 # reset performance counters of all ports .PP perfquery -R 32 2 0x0fff # reset only error counters of port 2 Index: configure.in =================================================================== --- configure.in (revision 9935) +++ configure.in (working copy) @@ -35,6 +35,8 @@ AC_CHECK_LIB(ibumad, umad_init, [], AC_MSG_ERROR([umad_init() not found. diags require libibumad.])) AC_CHECK_LIB(ibmad, mad_dump_int, [], AC_MSG_ERROR([mad_dump_int() not found. 
diags require libibmad.])) +AC_CHECK_LIB(ibmad, port_performance_ext_query, [], + AC_MSG_ERROR([port_performance_ext_query() not found. diags require more recent libibmad.])) fi dnl Checks for header files. From mst at mellanox.co.il Sat Oct 21 12:26:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 21 Oct 2006 21:26:29 +0200 Subject: [openib-general] [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() In-Reply-To: References: <20061019202144.GC2674@mellanox.co.il> Message-ID: <20061021192629.GA30649@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: [PATCH/RFC 1/2] IB: Return "maybe_missed_event" hint from ib_req_notify_cq() > > "Michael S. Tsirkin" wrote on 10/19/2006 01:21:45 PM: > > > Please also note that due to factors such as TCP window limits, TX on a single > > socket is often stalled. To really stress a connection and see benefit from > > NAPI you should be running multiple socket streams in parallel: > > either just run multiple instances of netperf/netserver, or use iperf with -P flag. > > I used to get 7600Mb/s IPoIB one socket duplex throughput with my other IPoIB patches on 2.6.5 kernel under certain configuration. Which makes me believe we could gain close to link throughput with one UD QP. Now I couldn't get it anymore on the new kernel. I was struggling with TCP window limits on the new kernel. Do you have any hint? Could be the stretch ACK fix - newer kernels are sending much more ACKs than 2.6.5. Without NAPI, this means we have more interrupts -> lower throughput. -- MST From mst at mellanox.co.il Sat Oct 21 12:32:28 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sat, 21 Oct 2006 21:32:28 +0200 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <45394C3E.4020404@ichips.intel.com> References: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <45394C3E.4020404@ichips.intel.com> Message-ID: <20061021193228.GB30649@mellanox.co.il> Quoting r. Sean Hefty : > * Added client register/unregister routines. BTW, we want this stuff for CM and CMA as well, don't we? Could be nice to have that module unloading race in CM and CMA closed for 2.6.19. -- MST From rdreier at cisco.com Sat Oct 21 14:33:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 21 Oct 2006 14:33:12 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: <45394C3E.4020404@ichips.intel.com> (Sean Hefty's message of "Fri, 20 Oct 2006 15:22:54 -0700") References: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <45394C3E.4020404@ichips.intel.com> Message-ID: > * Remove the multicast APIs from ib_sa.h. Multicast definitions are > * relocated from ib_sa to the ib_multicast module. (This provides > * cleaner encapsulation of the multicast services.) I'm not sure about this -- does this lead to duplication of code like keeping track of outstanding requests? Or are you exporting some really low-level interface from ib_sa? Maybe the best thing to do is put the higher level multicast handling into the ib_sa module (and not export the current lower level multicast APIs any more). EXPORT_SYMBOL isn't totally free and if we're exporting really internal stuff for one other user (I'm guessing that you might be building on top of the sa_query.c::send_mad() level stuff), then we might as well just combine the multicast and SA modules into a single .ko (even if there are multiple .c files). Maybe ib_notice should just go into ib_sa as well. 
...all of this is just based on speculating what you mean though. So more details would be in order at least. - R. From cppbala at yahoo.com Sat Oct 21 20:32:04 2006 From: cppbala at yahoo.com (Bala) Date: Sat, 21 Oct 2006 20:32:04 -0700 (PDT) Subject: [openib-general] Is there a way to recover Mellanox HCA card from wrong Firmware Message-ID: <20061022033204.99145.qmail@web35108.mail.mud.yahoo.com> Hi All, while updating the firmware in a set of nodes, accidentally updated one of the Mellanox PCI-X card with "fw-25218-5_1_400-MHEL-CF128-T-MEMFREE.bin.zip" (PCI-Express memfree FW from Mellanox site) using tvflash utility. After rebooting the machine, now the OS is not even detecting the HCA card. 1. Is there any way to flash with correct firmware?? 2. or Is there any way to reset the EPROM?? It will be very helpful, if you could offer some way out from this issue?? thanks, -bala- From mst at mellanox.co.il Sat Oct 21 23:03:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 22 Oct 2006 08:03:01 +0200 Subject: [openib-general] Is there a way to recover Mellanox HCA card from wrong Firmware In-Reply-To: <20061022033204.99145.qmail@web35108.mail.mud.yahoo.com> References: <20061022033204.99145.qmail@web35108.mail.mud.yahoo.com> Message-ID: <20061022060301.GA1980@mellanox.co.il> Quoting r. Bala : > Subject: Is there a way to recover Mellanox HCA card from wrong Firmware > > Hi All, > while updating the firmware in a set of > nodes, accidentally updated one of the Mellanox > PCI-X card with > "fw-25218-5_1_400-MHEL-CF128-T-MEMFREE.bin.zip" > (PCI-Express memfree FW from Mellanox site) using > tvflash > utility. > > After rebooting the machine, now the OS is > not even detecting the HCA card. > > 1. Is there any way to flash with correct firmware?? > > 2. or Is there any way to reset the EPROM?? 
> > It will be very helpful, if you could offer some way > out from this issue?? Most cards have a single jumper to disable the flash. If you set it, the card will boot in flash recovery mode which will make it possible for you to re-flash with correct firmware. -- MST From kliteyn at dev.mellanox.co.il Sun Oct 22 00:24:05 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 22 Oct 2006 09:24:05 +0200 Subject: [openib-general] [PATCH][TRIVIAL] osmtest/osmt_multicast.c: Eliminate superfluous clearing of some variables In-Reply-To: <1161337478.25985.113880.camel@hal.voltaire.com> References: <1161337478.25985.113880.camel@hal.voltaire.com> Message-ID: <453B1C95.6020403@dev.mellanox.co.il> Looks good, thanks. -- Yevgeny Hal Rosenstock wrote: > osmtest/osmt_multicast.c: Eliminate superfluous clearing of some > variables > > Also, some other cosmetic changes > > Signed-off-by: Hal Rosenstock > > Index: osmtest/osmt_multicast.c > =================================================================== > --- osmtest/osmt_multicast.c (revision 9918) > +++ osmtest/osmt_multicast.c (working copy) > @@ -84,8 +84,7 @@ __osmt_print_all_multicast_records( > req.pfn_query_cb = osmtest_query_res_cb; > req.p_query_input = &user; > > - /* UnTrusted - get the multicast groups */ > - req.sm_key = 0; > + /* UnTrusted (SMKey of 0) - get the multicast groups */ > status = osmv_query_sa(p_osmt->h_bind, &req); > > if (status != IB_SUCCESS || context.result.status != IB_SUCCESS) > @@ -187,7 +186,6 @@ osmt_query_mcast( IN osmtest_t * const p > context.p_osmt = p_osmt; > user.attr_id = IB_MAD_ATTR_MCMEMBER_RECORD; > user.attr_offset = ib_get_attr_offset( sizeof( ib_member_rec_t ) ); > - user.comp_mask = 0; > > req.query_type = OSMV_QUERY_USER_DEFINED; > req.timeout_ms = p_osmt->opt.transaction_timeout; > @@ -196,7 +194,6 @@ osmt_query_mcast( IN osmtest_t * const p > req.query_context = &context; > req.pfn_query_cb = osmtest_query_res_cb; > req.p_query_input = &user; > - 
req.sm_key = 0; > > status = osmv_query_sa( p_osmt->h_bind, &req ); > > @@ -325,7 +322,7 @@ osmt_send_mcast_request( IN osmtest_t * > memset( &req, 0, sizeof( req ) ); > memset( &user, 0, sizeof( user ) ); > memset( &context, 0, sizeof( context ) ); > - memset( p_res, 0, sizeof(ib_sa_mad_t ) ); > + memset( p_res, 0, sizeof( ib_sa_mad_t ) ); > > context.p_osmt = p_osmt; > > @@ -371,7 +368,6 @@ osmt_send_mcast_request( IN osmtest_t * > req.query_context = &context; > req.pfn_query_cb = osmtest_query_res_cb; > req.p_query_input = &user; > - req.sm_key = 0; > > status = osmv_query_sa( p_osmt->h_bind, &req ); > > @@ -417,7 +413,6 @@ osmt_send_mcast_request( IN osmtest_t * > > OSM_LOG_EXIT( &p_osmt->log ); > return ( status ); > - > } > > /********************************************************************** > @@ -3251,7 +3246,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons > if (p_osmt->opt.mmode > 2) > { > /* Check invalid Join with max mlid which is more than the > - Mellanox switches support 0xC000+0x1000 = 0xd000 */ > + Mellanox switches support 0xC000+0x1000 = 0xd000 */ > osm_log( &p_osmt->log, OSM_LOG_INFO, > "osmt_run_mcast_flow: " > "Checking Creation of Maximum avaliable Groups (MulticastFDBCap)...\n" > @@ -3260,6 +3255,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons > > while (tmp_mlid > 0 && !ReachedMlidLimit) { > uint16_t cur_mlid = 0; > + > /* Request Set */ > ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); > /* Good Flow - mgid is 0 while giving all required fields for > > From erezz at voltaire.com Sun Oct 22 01:28:38 2006 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 22 Oct 2006 10:28:38 +0200 (IST) Subject: [openib-general] [PATCH] IB/iser: start conn after enabling iSER Message-ID: 
When a connection is started (a new connection or a recovered one), iSER should prepare its resources for full-featured mode and only then notify the iSCSI layer that it is ready to start queueing commands. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index eb6f98d..9b2041e 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -363,11 +363,11 @@ iscsi_iser_conn_start(struct iscsi_cls_c struct iscsi_conn *conn = cls_conn->dd_data; int err; - err = iscsi_conn_start(cls_conn); + err = iser_conn_set_full_featured_mode(conn); if (err) return err; - return iser_conn_set_full_featured_mode(conn); + return iscsi_conn_start(cls_conn); } static struct iscsi_transport iscsi_iser_transport; -- 1.4.2 From kliteyn at dev.mellanox.co.il Sun Oct 22 01:53:40 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Sun, 22 Oct 2006 10:53:40 +0200 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <20061019212639.GA24600@sashak.voltaire.com> References: <20061019212639.GA24600@sashak.voltaire.com> Message-ID: <453B3194.7000702@dev.mellanox.co.il> Hi Sasha. One small comments: [snip] > osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > ... > osm_updn_find_root_nodes_by_min_hop( > ... > osm_subn_set_up_down_min_hop_table( > ... > osm_subn_calc_up_down_min_hop_table( > ... Please add the "__" prefix to the static function names. Thanks. -- Yevgeny Sasha Khapyorsky wrote: > This makes local functions static and moves definitions of locally used > types to .c file. 
> > Signed-off-by: Sasha Khapyorsky > --- > osm/include/opensm/osm_opensm.h | 1 - > osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- > osm/opensm/osm_ucast_updn.c | 81 +++++++- > 3 files changed, 70 insertions(+), 361 deletions(-) > > diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h > index cb216a4..5557dbd 100644 > --- a/osm/include/opensm/osm_opensm.h > +++ b/osm/include/opensm/osm_opensm.h > @@ -62,7 +62,6 @@ #include > #include > #include > #include > -#include > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > diff --git a/osm/include/opensm/osm_ucast_updn.h b/osm/include/opensm/osm_ucast_updn.h > index 4609e1b..c2a4376 100644 > --- a/osm/include/opensm/osm_ucast_updn.h > +++ b/osm/include/opensm/osm_ucast_updn.h > @@ -71,363 +71,14 @@ BEGIN_C_DECLS > /* ENUM TypeDefs */ > /* /////////////////////////// */ > > -/* > -* DESCRIPTION > -* This enum respresent available directions of arcs in the graph > -* SYNOPSIS > -*/ > -typedef enum _updn_switch_dir > -{ > - UP = 0, > - DOWN > -} updn_switch_dir_t; > - > -/* > - * TYPE DEFINITIONS > - * UP > - * Current switch direction in propogating the subnet is up > - * DOWN > - * Current switch direction in propogating the subnet is down > - * > - */ > - > -/* > -* DESCRIPTION > -* This enum respresent available states in the UPDN algorithm > -* SYNOPSIS > -*/ > -typedef enum _updn_state > -{ > - UPDN_INIT = 0, > - UPDN_RANK, > - UPDN_MIN_HOP_CALC, > -} updn_state_t; > - > -/* > - * TYPE DEFINITIONS > - * UPDN_INIT - loading the package but still not performing anything > - * UPDN_RANK - post ranking algorithm > - * UPDN_MIN_HOP_CALC - post min hop table calculation > - */ > - > /* ////////////////////////////////// */ > /* Struct TypeDefs */ > /* ///////////////////////////////// */ > > -/****s* UPDN: Rank element/updn_rank_t > -* NAME > -* updn_rank_t > -* > -* DESCRIPTION > -* This object represents a rank type element in a list > -* > -* The 
updn_rank_t object should be treated as opaque and should > -* be manipulated only through the provided functions. > -* > -* SYNOPSIS > -*/ > - > -typedef struct _updn_rank > -{ > - cl_map_item_t map_item; > - uint8_t rank; > -} updn_rank_t; > - > -/* > -* FIELDS > -* map_item > -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! > -* > -* rank > -* Rank value of this node > -* > -*/ > - > -/****s* UPDN: Histogram element/updn_hist_t > -* NAME > -* updn_hist_t > -* > -* DESCRIPTION > -* This object represents a histogram type element in a list > -* > -* The updn_hist_t object should be treated as opaque and should > -* be manipulated only through the provided functions. > -* > -* SYNOPSIS > -*/ > - > -typedef struct _updn_hist > -{ > - cl_map_item_t map_item; > - uint32_t bar_value; > -} updn_hist_t; > - > -/* > -* FIELDS > -* map_item > -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! > -* > -* bar_value > -* The number of occurences of the same hop value > -* > -*/ > - > -typedef struct _updn_next_step > -{ > - updn_switch_dir_t state; > - osm_switch_t *p_sw; > -} updn_next_step_t; > - > -/*****s* updn: updn/updn_input_t > -* NAME updn_t > -* > -* > -* DESCRIPTION > -* updn input fields structure. > -* > -* SYNOPSIS > -*/ > - > -typedef struct _updn_input > -{ > - uint32_t num_guids; > - uint64_t * guid_list; > -} updn_input_t; > - > -/* > -* FIELDS > -* num_guids > -* number of guids given at the UI > -* > -* guid_list > -* guids specified as an array (converted from a list given in the UI) > -* > -* > -* SEE ALSO > -* > -*********/ > - > -/*****s* updn: updn/updn_t > -* NAME updn_t > -* > -* > -* DESCRIPTION > -* updn structure. 
> -* > -* SYNOPSIS > -*/ > - > -typedef struct _updn > -{ > - updn_state_t state; > - boolean_t auto_detect_root_nodes; > - cl_qmap_t guid_rank_tbl; > - updn_input_t updn_ucast_reg_inputs; > - cl_list_t * p_root_nodes; > -} updn_t; > - > -/* > -* FIELDS > -* state > -* state of the updn algorithm which basically should pass through Init > -* - Ranking - UpDn algorithm > -* > -* guid_rank_tbl > -* guid 2 rank mapping vector , indexed by guid in network order > -* > -* > -* SEE ALSO > -* > -*********/ > - > /* ////////////////////////////// */ > /* Function */ > /* ////////////////////////////// */ > > -/***f** OpenSM: Updn/updn_construct > -* NAME > -* updn_construct > -* > -* DESCRIPTION > -* Allocation of updn_t struct > -* > -* SYNOPSIS > -*/ > - > -updn_t* > -updn_construct(void); > - > -/* > -* PARAMETERS > -* > -* > -* RETURN VALUE > -* Return a pointer to an updn struct. Null if fails to do so. > -* > -* NOTES > -* First step of the creation of updn_t > -*/ > - > -/****s* OpenSM: Updn/updn_destroy > -* NAME > -* updn_destroy > -* > -* DESCRIPTION > -* release of updn_t struct > -* > -* SYNOPSIS > -*/ > - > -void > -updn_destroy( > - IN updn_t* const p_updn ); > - > -/* > -* PARAMETERS > -* p_updn > -* A pointer to the updn_t struct that is goining to be released > -* > -* RETURN VALUE > -* > -* NOTES > -* Final step of the releasing of updn_t > -* > -* SEE ALSO > -* updn_construct > -*********/ > - > -/****f* OpenSM: Updn/updn_init > -* NAME > -* updn_init > -* > -* DESCRIPTION > -* Initialization of an updn_t struct > -* > -* SYNOPSIS > -*/ > -cl_status_t > -updn_init( > - IN updn_t* const p_updn ); > - > -/* > -* PARAMETERS > -* p_updn > -* A pointer to the updn_t struct that is goining to be initilized > -* > -* RETURN VALUE > -* The status of the function. 
> -* > -* NOTES > -* > -* SEE ALSO > -* updn_construct > -********/ > - > -/****** OpenSM: Updn/updn_subn_rank > -* NAME > -* updn_subn_rank > -* > -* DESCRIPTION > -* This function ranks the subnet for credit loop free algorithm > -* > -* SYNOPSIS > -*/ > -int > -updn_subn_rank( > - IN uint64_t root_guid , > - IN uint8_t base_rank, > - IN updn_t* p_updn ); > - > -/* > -* PARAMETERS > -* p_subn > -* [in] Pointer to a Subnet object to construct. > -* > -* base_rank > -* [in] The base ranking value (lowest value) > -* > -* p_updn > -* [in] Pointer to updn structure which includes state & lid2rank table > -* > -* RETURN VALUE > -* This function returns 0 when rankning has succeded , otherwise 1. > -******/ > - > -/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table > -* NAME > -* osm_subn_set_up_down_min_hop_table > -* > -* DESCRIPTION > -* This function set min hop table of all switches by BFS through each > -* port guid at the subnet using ranking done before. > -* > -* SYNOPSIS > -*/ > - > -int > -osm_subn_set_up_down_min_hop_table( > - IN updn_t* p_updn ); > - > -/* > -* PARAMETERS > -* p_updn > -* [in] Pointer to updn structure which includes state & lid2rank table > -* > -* RETURN VALUE > -* This function returns 0 when rankning has succeded , otherwise 1. > -******/ > - > -/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table > -* NAME > -* osm_subn_calc_up_down_min_hop_table > -* > -* DESCRIPTION > -* This function perform ranking and setting of all switches' min hop table > -* by UP DOWN algorithm > -* > -* SYNOPSIS > -*/ > - > -int > -osm_subn_calc_up_down_min_hop_table( > - IN uint32_t num_guids, > - IN uint64_t* guid_list, > - IN updn_t* p_updn ); > - > -/* > -* PARAMETERS > -* > -* guid_list > -* [in] Guid list from which to start ranking . > -* > -* p_updn > -* [in] Pointer to updn structure which includes state & lid2rank table > -* RETURN VALUE > -* This function returns 0 when rankning has succeded , otherwise 1. 
> -******/ > - > -/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop > -* NAME > -* osm_updn_find_root_nodes_by_min_hop > -* > -* DESCRIPTION > -* This function perform auto identification of root nodes for UPDN ranking phase > -* > -* SYNOPSIS > -*/ > -int > -osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); > - > -/* > -* PARAMETERS > -* p_root_nodes_list > -* > -* [out] Pointer to the root nodes list found in the subnet > -* > -* RETURN VALUE > -* This function returns 0 when auto identification had succeeded > -******/ > - > END_C_DECLS > > #endif /* _OSM_UCAST_UPDN_H_ */ > diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > index 86ac3ad..0121e6e 100644 > --- a/osm/opensm/osm_ucast_updn.c > +++ b/osm/opensm/osm_ucast_updn.c > @@ -55,8 +55,62 @@ #include > #include > #include > #include > -#include > -#include > + > +/* //////////////////////////// */ > +/* Local types */ > +/* /////////////////////////// */ > + > +/* direction */ > +typedef enum _updn_switch_dir > +{ > + UP = 0, > + DOWN > +} updn_switch_dir_t; > + > +/* This enum respresent available states in the UPDN algorithm */ > +typedef enum _updn_state > +{ > + UPDN_INIT = 0, > + UPDN_RANK, > + UPDN_MIN_HOP_CALC, > +} updn_state_t; > + > +/* Rank value of this node */ > +typedef struct _updn_rank > +{ > + cl_map_item_t map_item; > + uint8_t rank; > +} updn_rank_t; > + > +/* Histogram element - the number of occurences of the same hop value */ > +typedef struct _updn_hist > +{ > + cl_map_item_t map_item; > + uint32_t bar_value; > +} updn_hist_t; > + > +typedef struct _updn_next_step > +{ > + updn_switch_dir_t state; > + osm_switch_t *p_sw; > +} updn_next_step_t; > + > +/* guids list */ > +typedef struct _updn_input > +{ > + uint32_t num_guids; > + uint64_t * guid_list; > +} updn_input_t; > + > +/* updn structure */ > +typedef struct _updn > +{ > + updn_state_t state; > + boolean_t auto_detect_root_nodes; > + cl_qmap_t guid_rank_tbl; > + updn_input_t 
updn_ucast_reg_inputs; > + cl_list_t * p_root_nodes; > +} updn_t; > > > /* ///////////////////////////////// */ > @@ -65,6 +119,11 @@ #include > /* This var is predefined and initialized */ > extern osm_opensm_t osm; > > +/* ///////////////////////////////// */ > +/* Statics */ > +/* ///////////////////////////////// */ > +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > + > /********************************************************************** > **********************************************************************/ > /* This function returns direction based on rank and guid info of current & > @@ -471,7 +530,7 @@ __updn_bfs_by_node( > > /********************************************************************** > **********************************************************************/ > -void > +static void > updn_destroy( > IN updn_t* const p_updn ) > { > @@ -508,7 +567,7 @@ updn_destroy( > > /********************************************************************** > **********************************************************************/ > -updn_t* > +static updn_t* > updn_construct(void) > { > updn_t* p_updn; > @@ -523,7 +582,7 @@ updn_construct(void) > > /********************************************************************** > **********************************************************************/ > -cl_status_t > +static cl_status_t > updn_init( > IN updn_t* const p_updn ) > { > @@ -635,7 +694,7 @@ updn_init( > **********************************************************************/ > /* NOTE : PLS check if we need to decide that the first */ > /* rank is a SWITCH for BFS purpose */ > -int > +static int > updn_subn_rank( > IN uint64_t root_guid, > IN uint8_t base_rank, > @@ -795,7 +854,7 @@ updn_subn_rank( > > /********************************************************************** > **********************************************************************/ > -int > +static int > osm_subn_set_up_down_min_hop_table( > IN updn_t* p_updn ) > { > @@ 
-880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( > > /********************************************************************** > **********************************************************************/ > -int > +static int > osm_subn_calc_up_down_min_hop_table( > IN uint32_t num_guids, > IN uint64_t * guid_list, > @@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( > /********************************************************************** > **********************************************************************/ > /* UPDN callback function */ > -int __osm_updn_call( > +static int __osm_updn_call( > void *ctx ) > { > OSM_LOG_ENTER(&(osm.log), __osm_updn_call); > @@ -969,7 +1028,7 @@ int __osm_updn_call( > /********************************************************************** > **********************************************************************/ > /* UPDN convert cl_list to guid array in updn struct */ > -void __osm_updn_convert_list2array( > +static void __osm_updn_convert_list2array( > IN updn_t * p_updn ) > { > uint32_t i = 0, max_num = 0; > @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( > /********************************************************************** > **********************************************************************/ > /* Find Root nodes automatically by Min Hop Table info */ > -int > +static int > osm_updn_find_root_nodes_by_min_hop( > OUT updn_t * p_updn ) > { From sashak at voltaire.com Sun Oct 22 03:24:29 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 22 Oct 2006 12:24:29 +0200 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <453B3194.7000702@dev.mellanox.co.il> References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> Message-ID: <20061022102429.GB29681@sashak.voltaire.com> On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote: > Hi Sasha. 
> > One small comments: > > [snip] > > osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > > ... > > osm_updn_find_root_nodes_by_min_hop( > > ... > > osm_subn_set_up_down_min_hop_table( > > ... > > osm_subn_calc_up_down_min_hop_table( > > ... > > Please add the "__" prefix to the static function names. Then would be better to remove 'osm_' and '__osm_' prefixes in static names, but this will be function renaming, not just 'make static'. Sasha > > Thanks. > > -- > Yevgeny > > Sasha Khapyorsky wrote: > > This makes local functions static and moves definitions of locally used > > types to .c file. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > osm/include/opensm/osm_opensm.h | 1 - > > osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- > > osm/opensm/osm_ucast_updn.c | 81 +++++++- > > 3 files changed, 70 insertions(+), 361 deletions(-) > > > > diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h > > index cb216a4..5557dbd 100644 > > --- a/osm/include/opensm/osm_opensm.h > > +++ b/osm/include/opensm/osm_opensm.h > > @@ -62,7 +62,6 @@ #include > > #include > > #include > > #include > > -#include > > > > #ifdef __cplusplus > > # define BEGIN_C_DECLS extern "C" { > > diff --git a/osm/include/opensm/osm_ucast_updn.h b/osm/include/opensm/osm_ucast_updn.h > > index 4609e1b..c2a4376 100644 > > --- a/osm/include/opensm/osm_ucast_updn.h > > +++ b/osm/include/opensm/osm_ucast_updn.h > > @@ -71,363 +71,14 @@ BEGIN_C_DECLS > > /* ENUM TypeDefs */ > > /* /////////////////////////// */ > > > > -/* > > -* DESCRIPTION > > -* This enum respresent available directions of arcs in the graph > > -* SYNOPSIS > > -*/ > > -typedef enum _updn_switch_dir > > -{ > > - UP = 0, > > - DOWN > > -} updn_switch_dir_t; > > - > > -/* > > - * TYPE DEFINITIONS > > - * UP > > - * Current switch direction in propogating the subnet is up > > - * DOWN > > - * Current switch direction in propogating the subnet is down > > - * > > - */ > > - > > 
-/* > > -* DESCRIPTION > > -* This enum respresent available states in the UPDN algorithm > > -* SYNOPSIS > > -*/ > > -typedef enum _updn_state > > -{ > > - UPDN_INIT = 0, > > - UPDN_RANK, > > - UPDN_MIN_HOP_CALC, > > -} updn_state_t; > > - > > -/* > > - * TYPE DEFINITIONS > > - * UPDN_INIT - loading the package but still not performing anything > > - * UPDN_RANK - post ranking algorithm > > - * UPDN_MIN_HOP_CALC - post min hop table calculation > > - */ > > - > > /* ////////////////////////////////// */ > > /* Struct TypeDefs */ > > /* ///////////////////////////////// */ > > > > -/****s* UPDN: Rank element/updn_rank_t > > -* NAME > > -* updn_rank_t > > -* > > -* DESCRIPTION > > -* This object represents a rank type element in a list > > -* > > -* The updn_rank_t object should be treated as opaque and should > > -* be manipulated only through the provided functions. > > -* > > -* SYNOPSIS > > -*/ > > - > > -typedef struct _updn_rank > > -{ > > - cl_map_item_t map_item; > > - uint8_t rank; > > -} updn_rank_t; > > - > > -/* > > -* FIELDS > > -* map_item > > -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! > > -* > > -* rank > > -* Rank value of this node > > -* > > -*/ > > - > > -/****s* UPDN: Histogram element/updn_hist_t > > -* NAME > > -* updn_hist_t > > -* > > -* DESCRIPTION > > -* This object represents a histogram type element in a list > > -* > > -* The updn_hist_t object should be treated as opaque and should > > -* be manipulated only through the provided functions. > > -* > > -* SYNOPSIS > > -*/ > > - > > -typedef struct _updn_hist > > -{ > > - cl_map_item_t map_item; > > - uint32_t bar_value; > > -} updn_hist_t; > > - > > -/* > > -* FIELDS > > -* map_item > > -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! 
> > -* > > -* bar_value > > -* The number of occurences of the same hop value > > -* > > -*/ > > - > > -typedef struct _updn_next_step > > -{ > > - updn_switch_dir_t state; > > - osm_switch_t *p_sw; > > -} updn_next_step_t; > > - > > -/*****s* updn: updn/updn_input_t > > -* NAME updn_t > > -* > > -* > > -* DESCRIPTION > > -* updn input fields structure. > > -* > > -* SYNOPSIS > > -*/ > > - > > -typedef struct _updn_input > > -{ > > - uint32_t num_guids; > > - uint64_t * guid_list; > > -} updn_input_t; > > - > > -/* > > -* FIELDS > > -* num_guids > > -* number of guids given at the UI > > -* > > -* guid_list > > -* guids specified as an array (converted from a list given in the UI) > > -* > > -* > > -* SEE ALSO > > -* > > -*********/ > > - > > -/*****s* updn: updn/updn_t > > -* NAME updn_t > > -* > > -* > > -* DESCRIPTION > > -* updn structure. > > -* > > -* SYNOPSIS > > -*/ > > - > > -typedef struct _updn > > -{ > > - updn_state_t state; > > - boolean_t auto_detect_root_nodes; > > - cl_qmap_t guid_rank_tbl; > > - updn_input_t updn_ucast_reg_inputs; > > - cl_list_t * p_root_nodes; > > -} updn_t; > > - > > -/* > > -* FIELDS > > -* state > > -* state of the updn algorithm which basically should pass through Init > > -* - Ranking - UpDn algorithm > > -* > > -* guid_rank_tbl > > -* guid 2 rank mapping vector , indexed by guid in network order > > -* > > -* > > -* SEE ALSO > > -* > > -*********/ > > - > > /* ////////////////////////////// */ > > /* Function */ > > /* ////////////////////////////// */ > > > > -/***f** OpenSM: Updn/updn_construct > > -* NAME > > -* updn_construct > > -* > > -* DESCRIPTION > > -* Allocation of updn_t struct > > -* > > -* SYNOPSIS > > -*/ > > - > > -updn_t* > > -updn_construct(void); > > - > > -/* > > -* PARAMETERS > > -* > > -* > > -* RETURN VALUE > > -* Return a pointer to an updn struct. Null if fails to do so. 
> > -* > > -* NOTES > > -* First step of the creation of updn_t > > -*/ > > - > > -/****s* OpenSM: Updn/updn_destroy > > -* NAME > > -* updn_destroy > > -* > > -* DESCRIPTION > > -* release of updn_t struct > > -* > > -* SYNOPSIS > > -*/ > > - > > -void > > -updn_destroy( > > - IN updn_t* const p_updn ); > > - > > -/* > > -* PARAMETERS > > -* p_updn > > -* A pointer to the updn_t struct that is goining to be released > > -* > > -* RETURN VALUE > > -* > > -* NOTES > > -* Final step of the releasing of updn_t > > -* > > -* SEE ALSO > > -* updn_construct > > -*********/ > > - > > -/****f* OpenSM: Updn/updn_init > > -* NAME > > -* updn_init > > -* > > -* DESCRIPTION > > -* Initialization of an updn_t struct > > -* > > -* SYNOPSIS > > -*/ > > -cl_status_t > > -updn_init( > > - IN updn_t* const p_updn ); > > - > > -/* > > -* PARAMETERS > > -* p_updn > > -* A pointer to the updn_t struct that is goining to be initilized > > -* > > -* RETURN VALUE > > -* The status of the function. > > -* > > -* NOTES > > -* > > -* SEE ALSO > > -* updn_construct > > -********/ > > - > > -/****** OpenSM: Updn/updn_subn_rank > > -* NAME > > -* updn_subn_rank > > -* > > -* DESCRIPTION > > -* This function ranks the subnet for credit loop free algorithm > > -* > > -* SYNOPSIS > > -*/ > > -int > > -updn_subn_rank( > > - IN uint64_t root_guid , > > - IN uint8_t base_rank, > > - IN updn_t* p_updn ); > > - > > -/* > > -* PARAMETERS > > -* p_subn > > -* [in] Pointer to a Subnet object to construct. > > -* > > -* base_rank > > -* [in] The base ranking value (lowest value) > > -* > > -* p_updn > > -* [in] Pointer to updn structure which includes state & lid2rank table > > -* > > -* RETURN VALUE > > -* This function returns 0 when rankning has succeded , otherwise 1. 
> > -******/ > > - > > -/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table > > -* NAME > > -* osm_subn_set_up_down_min_hop_table > > -* > > -* DESCRIPTION > > -* This function set min hop table of all switches by BFS through each > > -* port guid at the subnet using ranking done before. > > -* > > -* SYNOPSIS > > -*/ > > - > > -int > > -osm_subn_set_up_down_min_hop_table( > > - IN updn_t* p_updn ); > > - > > -/* > > -* PARAMETERS > > -* p_updn > > -* [in] Pointer to updn structure which includes state & lid2rank table > > -* > > -* RETURN VALUE > > -* This function returns 0 when rankning has succeded , otherwise 1. > > -******/ > > - > > -/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table > > -* NAME > > -* osm_subn_calc_up_down_min_hop_table > > -* > > -* DESCRIPTION > > -* This function perform ranking and setting of all switches' min hop table > > -* by UP DOWN algorithm > > -* > > -* SYNOPSIS > > -*/ > > - > > -int > > -osm_subn_calc_up_down_min_hop_table( > > - IN uint32_t num_guids, > > - IN uint64_t* guid_list, > > - IN updn_t* p_updn ); > > - > > -/* > > -* PARAMETERS > > -* > > -* guid_list > > -* [in] Guid list from which to start ranking . > > -* > > -* p_updn > > -* [in] Pointer to updn structure which includes state & lid2rank table > > -* RETURN VALUE > > -* This function returns 0 when rankning has succeded , otherwise 1. 
> > -******/ > > - > > -/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop > > -* NAME > > -* osm_updn_find_root_nodes_by_min_hop > > -* > > -* DESCRIPTION > > -* This function perform auto identification of root nodes for UPDN ranking phase > > -* > > -* SYNOPSIS > > -*/ > > -int > > -osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); > > - > > -/* > > -* PARAMETERS > > -* p_root_nodes_list > > -* > > -* [out] Pointer to the root nodes list found in the subnet > > -* > > -* RETURN VALUE > > -* This function returns 0 when auto identification had succeeded > > -******/ > > - > > END_C_DECLS > > > > #endif /* _OSM_UCAST_UPDN_H_ */ > > diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > > index 86ac3ad..0121e6e 100644 > > --- a/osm/opensm/osm_ucast_updn.c > > +++ b/osm/opensm/osm_ucast_updn.c > > @@ -55,8 +55,62 @@ #include > > #include > > #include > > #include > > -#include > > -#include > > + > > +/* //////////////////////////// */ > > +/* Local types */ > > +/* /////////////////////////// */ > > + > > +/* direction */ > > +typedef enum _updn_switch_dir > > +{ > > + UP = 0, > > + DOWN > > +} updn_switch_dir_t; > > + > > +/* This enum respresent available states in the UPDN algorithm */ > > +typedef enum _updn_state > > +{ > > + UPDN_INIT = 0, > > + UPDN_RANK, > > + UPDN_MIN_HOP_CALC, > > +} updn_state_t; > > + > > +/* Rank value of this node */ > > +typedef struct _updn_rank > > +{ > > + cl_map_item_t map_item; > > + uint8_t rank; > > +} updn_rank_t; > > + > > +/* Histogram element - the number of occurences of the same hop value */ > > +typedef struct _updn_hist > > +{ > > + cl_map_item_t map_item; > > + uint32_t bar_value; > > +} updn_hist_t; > > + > > +typedef struct _updn_next_step > > +{ > > + updn_switch_dir_t state; > > + osm_switch_t *p_sw; > > +} updn_next_step_t; > > + > > +/* guids list */ > > +typedef struct _updn_input > > +{ > > + uint32_t num_guids; > > + uint64_t * guid_list; > > +} updn_input_t; > > + > > +/* 
updn structure */ > > +typedef struct _updn > > +{ > > + updn_state_t state; > > + boolean_t auto_detect_root_nodes; > > + cl_qmap_t guid_rank_tbl; > > + updn_input_t updn_ucast_reg_inputs; > > + cl_list_t * p_root_nodes; > > +} updn_t; > > > > > > /* ///////////////////////////////// */ > > @@ -65,6 +119,11 @@ #include > > /* This var is predefined and initialized */ > > extern osm_opensm_t osm; > > > > +/* ///////////////////////////////// */ > > +/* Statics */ > > +/* ///////////////////////////////// */ > > +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > > + > > /********************************************************************** > > **********************************************************************/ > > /* This function returns direction based on rank and guid info of current & > > @@ -471,7 +530,7 @@ __updn_bfs_by_node( > > > > /********************************************************************** > > **********************************************************************/ > > -void > > +static void > > updn_destroy( > > IN updn_t* const p_updn ) > > { > > @@ -508,7 +567,7 @@ updn_destroy( > > > > /********************************************************************** > > **********************************************************************/ > > -updn_t* > > +static updn_t* > > updn_construct(void) > > { > > updn_t* p_updn; > > @@ -523,7 +582,7 @@ updn_construct(void) > > > > /********************************************************************** > > **********************************************************************/ > > -cl_status_t > > +static cl_status_t > > updn_init( > > IN updn_t* const p_updn ) > > { > > @@ -635,7 +694,7 @@ updn_init( > > **********************************************************************/ > > /* NOTE : PLS check if we need to decide that the first */ > > /* rank is a SWITCH for BFS purpose */ > > -int > > +static int > > updn_subn_rank( > > IN uint64_t root_guid, > > IN uint8_t base_rank, > > @@ 
-795,7 +854,7 @@ updn_subn_rank( > > > > /********************************************************************** > > **********************************************************************/ > > -int > > +static int > > osm_subn_set_up_down_min_hop_table( > > IN updn_t* p_updn ) > > { > > @@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( > > > > /********************************************************************** > > **********************************************************************/ > > -int > > +static int > > osm_subn_calc_up_down_min_hop_table( > > IN uint32_t num_guids, > > IN uint64_t * guid_list, > > @@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( > > /********************************************************************** > > **********************************************************************/ > > /* UPDN callback function */ > > -int __osm_updn_call( > > +static int __osm_updn_call( > > void *ctx ) > > { > > OSM_LOG_ENTER(&(osm.log), __osm_updn_call); > > @@ -969,7 +1028,7 @@ int __osm_updn_call( > > /********************************************************************** > > **********************************************************************/ > > /* UPDN convert cl_list to guid array in updn struct */ > > -void __osm_updn_convert_list2array( > > +static void __osm_updn_convert_list2array( > > IN updn_t * p_updn ) > > { > > uint32_t i = 0, max_num = 0; > > @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( > > /********************************************************************** > > **********************************************************************/ > > /* Find Root nodes automatically by Min Hop Table info */ > > -int > > +static int > > osm_updn_find_root_nodes_by_min_hop( > > OUT updn_t * p_updn ) > > { > From tziporet at dev.mellanox.co.il Sun Oct 22 05:38:38 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 22 Oct 2006 14:38:38 +0200 Subject: [openib-general] openib-1.1.tgz In-Reply-To: 
<1161307200.5074.10.camel@julia.et.endace.com> References: <1161307200.5074.10.camel@julia.et.endace.com> Message-ID: <453B664E.4070808@dev.mellanox.co.il> vishal wrote: > Hi, > > Where could I find the file openib-1.1.tgz ? Couldn't find it on > www.openib.org ...Thanks! > > Vishal > > > This file is part of OFED 1.1.tgz under the SOURCES directory. Tziporet From jsquyres at cisco.com Sun Oct 22 06:56:14 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Sun, 22 Oct 2006 09:56:14 -0400 Subject: [openib-general] OFED 1.1 - Official Release In-Reply-To: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> Message-ID: <3C25C250-70E6-43F1-860A-50BE1B895B09@cisco.com> Tziporet -- Is OFED 1.1 going to be listed on www.openib.org? I see that the "Downloads" section still lists all the OFED 1.0 stuff. On Oct 20, 2006, at 6:25 AM, Tziporet Koren wrote: > I am happy to announce that OFED 1.1 Official Release is now > available. > > The release can be found under: > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ > > And later today it will be on the OpenFabrics download page: > http://www.openfabrics.org/downloads.html. > > This release was done in a joint effort of the following companies: > * Cisco > * SilverStorm > * Voltaire > * QLogic > * Intel > * IBM > * Mellanox Technologies > > I wish to thank all who contributed to the success of this release. > > Tziporet > ====================================================================== > ========= > > Release summary: > ================ > The OFED software package is composed of several software modules > intended for use on a computer > cluster constructed as an InfiniBand network. 
> > OFED package contains the following components: > =============================================== > o OpenFabrics core and ULPs: > - HCA drivers (mthca, ipath, ehca) > - core > - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER > Host and uDAPL > o OpenFabrics utilities: > - OpenSM: InfiniBand Subnet Manager > - Diagnostic tools > - Performance tests > o MPI: > - OSU MPI stack supporting the InfiniBand interface > - Open MPI stack supporting the InfiniBand interface > - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, > Presta) > o Sources of all software modules (under conditions mentioned > in the modules' > LICENSE files) > o Documentation > > Notes: > 1. SDP is in beta quality. > 2. ehca driver is in technology preview state. > 3. All other OFED components are of production quality. > > Supported Platforms and Operating Systems > ========================================= > CPU architectures: > * x86_64 > * x86 > * ia64 > * ppc64 > > Linux Operating Systems: > - RedHat EL4 up3: 2.6.9-34.ELsmp > - RedHat EL4 up4: 2.6.9-42.ELsmp > - SLES9 SP3: 2.6.5-7.244-smp > - SLES10: 2.6.16.21-0.8-smp > - kernel.org: 2.6.17.x and 2.6.18.x > > > HCAs Supported > ============== > Mellanox HCAs: > - InfiniHost > - InfiniHost III Ex (both modes: with memory and MemFree) > - InfiniHost III Lx > Both SDR and DDR mode of the InfiniHost III family are > supported. > > For official FW versions please see: > http://www.mellanox.com/support/firmware_table.php > > Qlogic HCAs: > - QHT6040 (PathScale InfiniPath HT-460) > - QHT6140 (PathScale InfiniPath HT-465) > - QLE6140 (PathScale InfiniPath PE-880) > > IBM HCAs: > - GX Dual-port 4x IB HCA > - GX Dual-port 12x IB HCA > > > Switches Supported > This release was tested with switches and gateways provided by the > following companies: > - Cisco > - Voltaire > - SilverStorm > - Flextronics > > Third Party Packages > ==================== > The following third party packages have been tested with OFED 1.1: > 1. 
Intel MPI, Version 2.0.1 - refresh, and Version 3.0 > 2. HP MPI > > OFED Sources: > ============= > Source repositories: > Kernel: git://www.mellanox.co.il/~git/infiniband ofed_1_1 > User: https://openib.org/svn/gen2/branches/1.1/src/userspace > > Main changed from OFED 1.0: > ============================ > - Kernel code based on 2.6.18 > - High Availability in IPoIB and SRP (beta) > - RDS was removed for the OFED package (to be added in future > releases) > - IBM low level driver (ehca) was added > - MPI: > - OSU MVAPICH: Message coalescing > - Open MPI: Version was updated to v1.1.1 > - MPI tests: Updated to latest versions from LLNL, Intel and OSU > - Management: Added utilities and enhanced tools > - Full support for ppc64 libraries (32 and 64 bits) > > > See the attached are the release notes for details > > > Tziporet Koren > Software Director > Mellanox Technologies > mailto: tziporet at mellanox.co.il > Tel +972-4-9097200, ext 380 > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From mst at mellanox.co.il Sun Oct 22 07:09:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 22 Oct 2006 16:09:25 +0200 Subject: [openib-general] OFED 1.1 - Official Release In-Reply-To: <3C25C250-70E6-43F1-860A-50BE1B895B09@cisco.com> References: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> <3C25C250-70E6-43F1-860A-50BE1B895B09@cisco.com> Message-ID: <20061022140925.GB6955@mellanox.co.il> Quoting r. Jeff Squyres : > Subject: Re: [openib-general] OFED 1.1 - Official Release > > Tziporet -- > > Is OFED 1.1 going to be listed on www.openib.org? I see that the > "Downloads" section still lists all the OFED 1.0 stuff. I don't think we have access to www.openib.org. 
That's one of the things that I hope will be fixed with the new server setup. -- MST From tziporet at dev.mellanox.co.il Sun Oct 22 07:59:28 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 22 Oct 2006 16:59:28 +0200 Subject: [openib-general] [openfabrics-ewg] OFED 1.1 - Official Release In-Reply-To: <20061022140925.GB6955@mellanox.co.il> References: <6C2C79E72C305246B504CBA17B5500C92ACF01@mtlexch01.mtl.com> <3C25C250-70E6-43F1-860A-50BE1B895B09@cisco.com> <20061022140925.GB6955@mellanox.co.il> Message-ID: <453B8750.30809@dev.mellanox.co.il> Michael S. Tsirkin wrote: > Quoting r. Jeff Squyres : > >> Subject: Re: [openib-general] OFED 1.1 - Official Release >> >> Tziporet -- >> >> Is OFED 1.1 going to be listed on www.openib.org? I see that the >> "Downloads" section still lists all the OFED 1.0 stuff. >> > > I don't think we have access to www.openib.org. > That's one of the things that I hope will be fixed with the new server setup. > > I already asked Matt from Sandia to update the download page of OpenFabrics site. I hope it will be done soon. Tziporet From ishai at dev.mellanox.co.il Sun Oct 22 08:15:23 2006 From: ishai at dev.mellanox.co.il (Ishai Rabinovitz) Date: Sun, 22 Oct 2006 17:15:23 +0200 Subject: [openib-general] A question about sa_query Message-ID: <453B8B0B.2030402@dev.mellanox.co.il> Hi, There is something that bothers me in sa_query. According to table 115 in the IB-SPEC when the status in the MAD hdr is 1,2 or 3 it shouldn't be considered to as an error. (1 means busy, 2 means redirection, and 3 means both). The function "recv_handler" in core/sa_query.c sets the status of the sa_query before calling the callback function. It sets the status according to the status returned in the mad header. (mad_recv_wc->recv_buf.mad->mad_hdr.status) If the status in the mad_hdr is different from 0 it sets the return status to -EINVAL. 
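A minimal sketch of the distinction described above, in plain C. It assumes the status has already been converted to host byte order. The names sa_classify_status and enum sa_query_result are made up for illustration and are not part of core/sa_query.c:

```c
#include <stdint.h>

/* Low two bits of the MAD header status field, as described above. */
#define MAD_STATUS_BUSY     0x0001u
#define MAD_STATUS_REDIRECT 0x0002u

/* Hypothetical result codes for illustration; not the kernel's actual API. */
enum sa_query_result {
    SA_QUERY_OK = 0,
    SA_QUERY_BUSY,          /* status 1: responder busy, caller may retry */
    SA_QUERY_REDIRECT,      /* status 2: redirection required */
    SA_QUERY_BUSY_REDIRECT, /* status 3: both bits set */
    SA_QUERY_ERROR          /* any other non-zero status */
};

/*
 * Hypothetical helper: classify a host-order MAD status so that
 * busy/redirect are reported separately instead of being folded
 * into one generic error.
 */
static enum sa_query_result sa_classify_status(uint16_t status)
{
    switch (status) {
    case 0:
        return SA_QUERY_OK;
    case MAD_STATUS_BUSY:
        return SA_QUERY_BUSY;
    case MAD_STATUS_REDIRECT:
        return SA_QUERY_REDIRECT;
    case MAD_STATUS_BUSY | MAD_STATUS_REDIRECT:
        return SA_QUERY_BUSY_REDIRECT;
    default:
        return SA_QUERY_ERROR;
    }
}
```

With something along these lines, a consumer's callback could tell "busy" apart from a real failure and decide to retry, rather than seeing only -EINVAL.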
This means that the higher layers (e.g., SRP) do not know what the exact
status was and therefore treat status 1 (busy) as an error.

Am I missing something?

Ishai

From tziporet at dev.mellanox.co.il Sun Oct 22 09:09:23 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Sun, 22 Oct 2006 18:09:23 +0200
Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
In-Reply-To:
References:
Message-ID: <453B97B3.4010403@dev.mellanox.co.il>

Chet Mehta wrote:
>
> Tziporet,
>
> I understand that OFED1.1-rc7 was released without the requested
> change below despite the fact that the request came in before the
> deadline. As things stand, userspace is broken for ehca in OFED 1.1.
> Was the request rejected because ehca is considered a technology
> preview for this OFED release and if so are there well understood
> (documented) rules/limitations on how their submissions are handled?
>
> I'm asking to (a) learn from this experience (b) understand what we
> could/should have done differently (c) help avoid issues of this type
> in the future for us or other contributors. I would appreciate hearing
> back from you. Thx!
>
> :Chet.

Hi Chet,

I regret that some of the latest patches could not be included in the
final release, but the main issue was that the patches came too late.

Please note:
1. ehca was added to OFED 1.1 in RC3, 31-Aug
   (http://openib.org/pipermail/openfabrics-ewg/2006-August/001234.html)
2. Most user space issues raised by IBM were reported only on RC7, which
   was supposed to be the final RC, with no more major changes.

We nevertheless accepted patches that you submitted at that late phase
which affected only ehca. However, the patch in question (on the configure
file for user space) touches ALL components of the userspace build, and
was a very large patch as well.
At that late date (2 days before final release!), it is impossible to
start testing a change of this magnitude on ALL platforms to verify that
the change was not harmful.

I agree we should learn from this experience: in the future, please start
your testing as early as possible so that problems of this nature are
caught early in the development cycle, and do not need to be addressed at
the last possible moment.

I am sure that in the next release things will go more smoothly since this
was IBM's first participation in an OFED release.

Tziporet

From HNGUYEN at de.ibm.com Sun Oct 22 10:49:20 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Sun, 22 Oct 2006 19:49:20 +0200
Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
In-Reply-To: <453B97B3.4010403@dev.mellanox.co.il>
Message-ID:

Hello Tziporet!
> I regret that some of the latest patches could not be included in the final release, but the main issue was that the
> patches came too late.
Why didn't you tell me that when I reported to you, Vladimir and Michael,
also via direct emails asking for help, about the issues with the ofed
build procedure, i.e. openib.spec and rpm build, one week ago? Also when
you asked the group for the release schedule and I told you about the
issues?
> Please note:
> 1. ehca was added to OFED 1.1 in RC3, 31-Aug (http://openib.org/pipermail/openfabrics-ewg/2006-August/001234.html)
> 2. Most user space issues raised by IBM were reported only on RC7, which was supposed to be the final RC, with no more
> major changes.
> We nevertheless accepted patches that you submitted at that late phase which affected only ehca. However, the patch
> in question (on the configure file for user space) touches ALL components of the userspace build, and was a very large
> patch as well.
> At that late date (2 days before final release!), it is impossible to start testing a change of this
> magnitude on ALL platforms to verify that the change was not harmful.
No. As you can see, and as I explained clearly in previous emails, the
patch in question is in libehca only.
For clarity again: The actual patch (libehca/configure.in and
libehca/config.h.in) adds 8 lines. The other libehca/configure patch is a
large file generated by the packaging/build process from Vladimir and is
not part of as Michael pointed out.
The only reason you told us is that they are non-blocking issues because
ehca is in preview tech state and hence the patch is rejected.
> I agree we should learn from this experience: in the future, please start your testing as early as possible so that
> problems of this nature are caught early in the development cycle, and do not need to be addressed at the last possible moment.
I agree with this and will do so. On the other hand you still have to
consider that bugs can be found at a late phase. Looking at Bugzilla Bug
283 "libmthca and libipathverbs 32 bit are not installed on x86_64", which
also came late in 1.1, I see many analogies with the issues I found with
libehca on ppc64.
> I am sure that in the next release things will go more smoothly since this was IBM's first participation in an OFED release.
Now libehca is broken in ofed-1.1. How can we fix that until 1.2?

Regards
Nam Nguyen

From elmar at pruesse.net Sun Oct 22 13:11:34 2006
From: elmar at pruesse.net (Elmar Pruesse)
Date: Sun, 22 Oct 2006 22:11:34 +0200
Subject: [openib-general] OFED 1.1 on Debian based system?
Message-ID: <453BD076.80006@pruesse.net>

I assume just running "./install.sh" as the Readme.txt suggests will not
work.

Are there any "best-practices"? Or is maybe even someone working on debian
packages?

regards,
Elmar

ps: btw: build.sh assumes you are running it using "./build.sh" ...
"sh build.sh" makes it crash From umaxx at oleco.net Sun Oct 22 13:46:05 2006 From: umaxx at oleco.net (Joerg Zinke) Date: Sun, 22 Oct 2006 22:46:05 +0200 Subject: [openib-general] OFED 1.1 on Debian based system? In-Reply-To: <453BD076.80006@pruesse.net> References: <453BD076.80006@pruesse.net> Message-ID: <20061022224605.5408753e@marvin.local> On Sun, 22 Oct 2006 22:11:34 +0200 "Elmar Pruesse" wrote: > I assume just running "./install.sh" as the Readme.txt suggests will > not work. > it was working here. i just installed vanilla kernel from source and set some path (check install.sh options)... there are some packages for mthca and libibverbs in testing and unstable, i do not know how actual they are - but they are maintained by roland dreier so i assume they are actual. another option would be: convert all rpm with alien to deb - but i haven't tried that yet... regards, joerg From halr at voltaire.com Sun Oct 22 14:58:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Oct 2006 17:58:42 -0400 Subject: [openib-general] A question about sa_query In-Reply-To: <453B8B0B.2030402@dev.mellanox.co.il> References: <453B8B0B.2030402@dev.mellanox.co.il> Message-ID: <1161554313.25985.262623.camel@hal.voltaire.com> Ishai, On Sun, 2006-10-22 at 11:15, Ishai Rabinovitz wrote: > Hi, > > There is something that bothers me in sa_query. > > According to table 115 in the IB-SPEC when the status in the MAD hdr is > 1,2 or 3 it shouldn't be considered to as an error. (1 means busy, 2 > means redirection, and 3 means both). Yes, in fact the description of these bits in the table you mention clearly says these are not errors. > The function "recv_handler" in core/sa_query.c sets the status of the > sa_query before calling the callback function. > It sets the status according to the status returned in the mad header. > (mad_recv_wc->recv_buf.mad->mad_hdr.status) > > If the status in the mad_hdr is different from 0 it sets the return > status to -EINVAL. 
>
> This mean that the higher layers (e.g., SRP) do not know what was the
> exact status and therefore treat status 1 (busy) as an error.

That appears to me to be the case and should be changed.

-- Hal

> Am I missing something?
>
> Ishai

From panda at cse.ohio-state.edu Sun Oct 22 21:10:49 2006
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon, 23 Oct 2006 00:10:49 -0400 (EDT)
Subject: [openib-general] Announcing the release of MVAPICH2 0.9.6 with on-demand connection management, multi-core optimized shared memory communication and memory hook support
Message-ID: <200610230410.k9N4An6l015399@xi.cse.ohio-state.edu>

The MVAPICH team is pleased to announce the availability of MVAPICH2
0.9.6 with the following NEW features:

 - On-demand connection management using native InfiniBand
   Unreliable Datagram (UD) support. This feature enables InfiniBand
   connections to be setup dynamically and has `near constant'
   memory usage with increasing number of processes.
   This feature together with the Shared Receive Queue (SRQ) feature
   (available since MVAPICH2 0.9.5) enhances the scalability
   of MVAPICH2 on multi-thousand node clusters.

   Performance of applications and memory scalability using
   on-demand connection management can be found here:

   http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html

 - Multi-core optimized and scalable intra-node (intra-CMP and
   inter-CMP) shared-memory communication for the emerging
   multi-core platforms.
   (Available for Gen2, uDAPL (including Solaris) and VAPI interfaces)

   Performance benefits of multi-core optimized support can be
   found here:

   http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-smp.html

 - Memory Hook support provided by integration with ptmalloc2 library.
   This provides safe release of memory to the Operating System and
   is expected to benefit the memory usage of applications that
   heavily use malloc and free operations

 - Auto-detection of Architecture and InfiniBand adapters (at run-time)
   and associated optimizations

More details on all features and supported platforms can be obtained
by visiting the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich2_features.html

MVAPICH2 0.9.6 continues to deliver excellent performance. Sample
performance numbers include:

 - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR:
      Two-sided operations:
        - 3.25 microsec one-way latency (4 bytes)
        - 1411 MB/sec unidirectional bandwidth
        - 2229 MB/sec bidirectional bandwidth
      One-sided operations:
        - 6.30 microsec Put latency
        - 1406 MB/sec unidirectional Put bandwidth
        - 2229 MB/sec bidirectional Put bandwidth

 - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR (Dual-rail):
      Two-sided operations:
        - 3.17 microsec one-way latency (4 bytes)
        - 2230 MB/sec unidirectional bandwidth
        - 2776 MB/sec bidirectional bandwidth
      One-sided operations:
        - 6.14 microsec Put latency
        - 2390 MB/sec unidirectional Put bandwidth
        - 2776 MB/sec bidirectional Put bandwidth

 - OpenIB/Gen2 on Opteron with PCI-Ex and IBA-DDR:
      Two-sided operations:
        - 2.79 microsec one-way latency (4 bytes)
        - 1411 MB/sec unidirectional bandwidth
        - 2239 MB/sec bidirectional bandwidth
      One-sided operations:
        - 4.68 microsec Put latency
        - 1411 MB/sec unidirectional Put bandwidth
        - 2239 MB/sec bidirectional Put bandwidth

 - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR:
      Two-sided operations:
        - 4.81 microsec one-way latency (4 bytes)
        - 981 MB/sec unidirectional bandwidth
        - 1903 MB/sec bidirectional bandwidth
      One-sided operations:
        - 7.49 microsec Put latency
        - 981 MB/sec unidirectional Put bandwidth
        - 1903 MB/sec bidirectional Put bandwidth

 - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-SDR:
      Two-sided operations:
        - 3.56 microsec one-way latency (4 bytes)
        - 964 MB/sec unidirectional bandwidth
        - 1846 MB/sec bidirectional bandwidth
      One-sided operations:
        - 6.85 microsec Put latency
        - 964 MB/sec unidirectional Put bandwidth
        - 1846 MB/sec bidirectional Put bandwidth

Performance numbers for all other platforms, system configurations and
operations can be viewed by visiting `Performance' section of the
project's web page.

With the ADI-3-level design, MVAPICH2 0.9.6 delivers similar
performance for two-sided operations compared to MVAPICH 0.9.8.
Performance comparison between MVAPICH2 0.9.6 and MVAPICH 0.9.8 for
sample applications can be seen by visiting the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html

Organizations and users who are interested in getting the best
performance for both two-sided and one-sided operations, and who also
want to exploit `multi-threading', `integrated multi-rail', `multi-core
optimization', and `memory hook support' capabilities, may migrate from
the MVAPICH code base to the MVAPICH2 code base.

For downloading MVAPICH2 0.9.6 package and accessing the anonymous
SVN, please visit the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

A stripped down version of this release is also available at the
OpenIB SVN.

Summary of the testing of this release with various interfaces and
on different platforms is available here:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/testing.shtml

All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.

Thanks,

MVAPICH Team at OSU/NBCL

======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science,
Mellanox, Intel, Cisco Systems, Sun Microsystems and Linux Networx;
and with equipment support from Advanced Clustering, AMD, Apple,
Appro, Dell, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm
and Sun Microsystems.
Other technology partners include Etnus.
======================================================================

From HNGUYEN at de.ibm.com Sun Oct 22 22:55:15 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Mon, 23 Oct 2006 07:55:15 +0200
Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca?
In-Reply-To: <17A82EE2-E2D3-4AED-A3FD-8E85F31361A1@scl.ameslab.gov>
Message-ID:

Hi Troy!

> The netpipe code is available with mercurial by:
> hg clone http://source.scl.ameslab.gov/hg/netpipe3-pvfs-dev
> Once you have pvfs2-1.5.1 installed, you should be able to do 'make
> pvfs' in the netpipe3-pvfs-dev directory and build NPpvfs.
> The command line arguments I used to reproduce this were:
> ./NPpvfs -d $PVFS_FILE_PATH -l 32768 -u 268435456 -n 100 -o
> $NETPIPE_OUTPUT_FILE

Thanks for this. I've been struggling with setting up the systems to recreate this problem. Please be patient.
Can you please send me the output of modinfo ib_ehca (or hcad_mod in older versions)? Also the firmware code level, as explained in the previous email. How much memory have you assigned to the partition? With that data I'd be able to set up nearly the same environment as yours.

> This is the dmesg log:
> PU0001 000e0091:ehca_hcall_7arg_7ret HCAD_ERROR opcode=160
> ret=fffffffffffffff7 arg1=1000000003000004 arg2=5 arg3=4000f830000
> arg4=10000 arg5=e0000000000000 arg6=eb6b6920 arg7=0 out1=0 out2=0
> out3=0 out4=0 out5=0 out6=0 out7=0
> PU0001 00090454:ehca_reg_mr HCAD_ERROR hipz_alloc_mr failed,
> h_ret=fffffffffffffff7 hca_hndl=1000000003000004
> PU0001 00090478:ehca_reg_mr <<< ret=ffffffea shca=c0000000e796b000
> e_mr=c0000000ce865e80 iova_start=000004000f830000 size=10000 acl=7
> e_pd=c0000000eb6b6920 pginfo=c0000000dfcb3a70 num_pages=10 num_4k=10
> PU0001 00090176:ehca_reg_user_mr <<< rc=ffffffffffffffea
> pd=c0000000eb6b6920 region=c0000000ce861dd0 mr_access_flags=7
> udata=c0000000dfcb3ba0

I got this already from you and Kyle.
I meant the full log with debug traces enabled: modprobe ib_ehca debug_level=1 or, for older versions, modprobe hcad_mod debug_level=9999999999999999999999. If possible, try to get it. Anyway, I'll do that with my test env.

Thanks!
Nam

From kliteyn at dev.mellanox.co.il Mon Oct 23 00:02:49 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 23 Oct 2006 09:02:49 +0200
Subject: [openib-general] [PATCH] opensm: remove obsolete p_report_buf
In-Reply-To: <20061020004727.GH24676@sashak.voltaire.com>
References: <20061020004727.GH24676@sashak.voltaire.com>
Message-ID: <453C6919.7010008@dev.mellanox.co.il>

Hi Sasha.

The removal of the sm->p_report_buf is a good idea. However, I do have one comment:

In several cases this buffer was printed using the osm_log_raw() function, and you replaced this with a plain fprintf(stdout,...). Right now the osm_log_raw function just prints to stdout too, but this doesn't always have to be the case. Besides, osm_log_raw provides verbosity level checking, which is lost when you replace it with fprintf.

--Yevgeny

Sasha Khapyorsky wrote:
> This removes the now-obsolete shared sm->p_report_buf buffer and cleans
> up related code.
> > Signed-off-by: Sasha Khapyorsky > --- > osm/include/opensm/osm_base.h | 5 -- > osm/include/opensm/osm_sm.h | 2 - > osm/include/opensm/osm_state_mgr.h | 8 --- > osm/include/opensm/osm_ucast_mgr.h | 5 -- > osm/opensm/osm_mcast_mgr.c | 11 ++-- > osm/opensm/osm_sm.c | 15 +----- > osm/opensm/osm_state_mgr.c | 104 ++++++++++------------------------- > osm/opensm/osm_ucast_mgr.c | 70 +++++++----------------- > 8 files changed, 57 insertions(+), 163 deletions(-) > > diff --git a/osm/include/opensm/osm_base.h b/osm/include/opensm/osm_base.h > index 57dd4fd..20e2cc3 100644 > --- a/osm/include/opensm/osm_base.h > +++ b/osm/include/opensm/osm_base.h > @@ -714,11 +714,6 @@ typedef enum _osm_state_mgr_mode > * > **********/ > > -#define OSM_REPORT_BUF_SIZE 0x10000 > -#define OSM_REPORT_LINE_SIZE 0x256 > -#define OSM_REPORT_BUF_THRESHOLD (OSM_REPORT_BUF_SIZE / OSM_REPORT_LINE_SIZE) > - > - > /****d* OpenSM: Base/osm_sm_signal_t > * NAME > * osm_sm_signal_t > diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h > index bc812f3..05b87ac 100644 > --- a/osm/include/opensm/osm_sm.h > +++ b/osm/include/opensm/osm_sm.h > @@ -178,8 +178,6 @@ typedef struct _osm_sm > osm_vla_rcv_ctrl_t vla_rcv_ctrl; > osm_pkey_rcv_t pkey_rcv; > osm_pkey_rcv_ctrl_t pkey_rcv_ctrl; > - char* p_report_buf; > - > } osm_sm_t; > /* > * FIELDS > diff --git a/osm/include/opensm/osm_state_mgr.h b/osm/include/opensm/osm_state_mgr.h > index ad4afa0..7aaab58 100644 > --- a/osm/include/opensm/osm_state_mgr.h > +++ b/osm/include/opensm/osm_state_mgr.h > @@ -121,7 +121,6 @@ typedef struct _osm_state_mgr > cl_qlist_t idle_time_list; > cl_plock_t *p_lock; > cl_event_t *p_subnet_up_event; > - char *p_report_buf; > osm_sm_state_t state; > osm_state_mgr_mode_t state_step_mode; > osm_signal_t next_stage_signal; > @@ -170,9 +169,6 @@ typedef struct _osm_state_mgr > * p_subnet_up_event > * Pointer to the event to set if/when the subnet comes up. 
> * > -* p_report_buf > -* Pointer to the large log buffer used for user reports. > -* > * state > * State of the SM. > * > @@ -380,7 +376,6 @@ osm_state_mgr_init( > IN const osm_sm_mad_ctrl_t* const p_mad_ctrl, > IN cl_plock_t* const p_lock, > IN cl_event_t* const p_subnet_up_event, > - IN char* const p_report_buf, > IN osm_log_t* const p_log ); > /* > * PARAMETERS > @@ -420,9 +415,6 @@ osm_state_mgr_init( > * p_subnet_up_event > * [in] Pointer to the event to set if/when the subnet comes up. > * > -* p_report_buf > -* [in] Pointer to the large log buffer used for user reports. > -* > * p_log > * [in] Pointer to the log object. > * > diff --git a/osm/include/opensm/osm_ucast_mgr.h b/osm/include/opensm/osm_ucast_mgr.h > index 0fbfc66..1c10abb 100644 > --- a/osm/include/opensm/osm_ucast_mgr.h > +++ b/osm/include/opensm/osm_ucast_mgr.h > @@ -105,7 +105,6 @@ typedef struct _osm_ucast_mgr > osm_req_t *p_req; > osm_log_t *p_log; > cl_plock_t *p_lock; > - char *p_report_buf; > } osm_ucast_mgr_t; > /* > * FIELDS > @@ -204,7 +203,6 @@ osm_ucast_mgr_init( > IN osm_ucast_mgr_t* const p_mgr, > IN osm_req_t* const p_req, > IN osm_subn_t* const p_subn, > - IN char* const p_report_buf, > IN osm_log_t* const p_log, > IN cl_plock_t* const p_lock ); > /* > @@ -218,9 +216,6 @@ osm_ucast_mgr_init( > * p_subn > * [in] Pointer to the Subnet object for this subnet. > * > -* p_report_buf > -* [in] Pointer to the large log buffer used for user reporting. > -* > * p_log > * [in] Pointer to the log object. 
> * > diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c > index 5a01578..82ef7c3 100644 > --- a/osm/opensm/osm_mcast_mgr.c > +++ b/osm/opensm/osm_mcast_mgr.c > @@ -1382,14 +1382,13 @@ static void > mcast_mgr_dump_sw_routes( > IN const osm_mcast_mgr_t* const p_mgr, > IN const osm_switch_t* const p_sw, > - IN FILE *p_mcfdbFile ) > + IN FILE *file ) > { > osm_mcast_tbl_t* p_tbl; > int16_t mlid_ho = 0; > int16_t mlid_start_ho; > uint8_t position = 0; > int16_t block_num = 0; > - char line[OSM_REPORT_LINE_SIZE]; > boolean_t print_lid; > const osm_node_t* p_node; > uint16_t i, j; > @@ -1404,7 +1403,7 @@ mcast_mgr_dump_sw_routes( > > p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); > > - fprintf( p_mcfdbFile, "\nSwitch 0x%016" PRIx64 "\n" > + fprintf( file, "\nSwitch 0x%016" PRIx64 "\n" > "LID : Out Port(s)\n", > cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > while ( block_num <= p_tbl->max_block_in_use ) > @@ -1415,7 +1414,7 @@ mcast_mgr_dump_sw_routes( > mlid_ho = mlid_start_ho + i; > position = 0; > print_lid = FALSE; > - sprintf( line, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); > + fprintf( file, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); > while ( position <= p_tbl->max_position ) > { > mask_entry = cl_ntoh16((*p_tbl->p_mask_tbl)[mlid_ho][position]); > @@ -1428,13 +1427,13 @@ mcast_mgr_dump_sw_routes( > for (j = 0 ; j < 16 ; j++) > { > if ( (1 << j) & mask_entry ) > - sprintf( line, "%s 0x%03X ", line, j+(position*16) ); > + fprintf( file, " 0x%03X ", j+(position*16) ); > } > position++; > } > if (print_lid) > { > - fprintf( p_mcfdbFile, "%s\n", line ); > + fprintf( file, "\n" ); > } > } > block_num++; > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > index fef3cac..fb4f759 100644 > --- a/osm/opensm/osm_sm.c > +++ b/osm/opensm/osm_sm.c > @@ -256,9 +256,6 @@ osm_sm_destroy( > cl_event_destroy( &p_sm->signal ); > cl_event_destroy( &p_sm->subnet_up_event ); > > - if( p_sm->p_report_buf != NULL ) > - free( p_sm->p_report_buf ); > - > 
osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ > OSM_LOG_EXIT( p_sm->p_log ); > } > @@ -291,15 +288,6 @@ osm_sm_init( > p_sm->p_disp = p_disp; > p_sm->p_lock = p_lock; > > - p_sm->p_report_buf = malloc( OSM_REPORT_BUF_SIZE ); > - if( p_sm->p_report_buf == NULL ) > - { > - osm_log( p_sm->p_log, OSM_LOG_ERROR, > - "osm_sm_init: ERR 2E09: " > - "Can't allocate report buffer\n" ); > - status = IB_INSUFFICIENT_MEMORY; > - goto Exit; > - } > status = cl_event_init( &p_sm->signal, FALSE ); > if( status != CL_SUCCESS ) > goto Exit; > @@ -385,7 +373,6 @@ osm_sm_init( > status = osm_ucast_mgr_init( &p_sm->ucast_mgr, > &p_sm->req, > p_sm->p_subn, > - p_sm->p_report_buf, > p_sm->p_log, p_sm->p_lock ); > if( status != IB_SUCCESS ) > goto Exit; > @@ -409,7 +396,7 @@ osm_sm_init( > &p_sm->mad_ctrl, > p_sm->p_lock, > &p_sm->subnet_up_event, > - p_sm->p_report_buf, p_sm->p_log ); > + p_sm->p_log ); > if( status != IB_SUCCESS ) > goto Exit; > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > index d43e9fc..9c159df 100644 > --- a/osm/opensm/osm_state_mgr.c > +++ b/osm/opensm/osm_state_mgr.c > @@ -118,7 +118,6 @@ osm_state_mgr_init( > IN const osm_sm_mad_ctrl_t * const p_mad_ctrl, > IN cl_plock_t * const p_lock, > IN cl_event_t * const p_subnet_up_event, > - IN char *const p_report_buf, > IN osm_log_t * const p_log ) > { > cl_status_t status; > @@ -136,7 +135,6 @@ osm_state_mgr_init( > CL_ASSERT( p_sm_state_mgr ); > CL_ASSERT( p_mad_ctrl ); > CL_ASSERT( p_lock ); > - CL_ASSERT( p_report_buf ); > > osm_state_mgr_construct( p_mgr ); > > @@ -154,7 +152,6 @@ osm_state_mgr_init( > p_mgr->state = OSM_SM_STATE_IDLE; > p_mgr->p_lock = p_lock; > p_mgr->p_subnet_up_event = p_subnet_up_event; > - p_mgr->p_report_buf = p_report_buf; > p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; > p_mgr->next_stage_signal = OSM_SIGNAL_NONE; > > @@ -1247,16 +1244,19 @@ __osm_state_mgr_report( > uint8_t port_num; > uint8_t start_port; > uint32_t num_ports; > - 
char line[OSM_REPORT_LINE_SIZE]; > uint8_t node_type; > - uint32_t line_num = 0; > + > + if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > + return; > > OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report ); > > - if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > - { > - goto Exit; > - } > + fprintf( stdout, > + "\n===================================================" > + "====================================================" > + "\nVendor : Ty " > + ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " > + " : Neighbor Port (Port #)\n" ); > > p_tbl = &p_mgr->p_subn->port_guid_tbl; > > @@ -1286,91 +1286,56 @@ __osm_state_mgr_report( > num_ports = osm_port_get_num_physp( p_port ); > for( port_num = start_port; port_num < num_ports; port_num++ ) > { > - if( line_num == 0 ) > - { > - strcpy( p_mgr->p_report_buf, > - "\n===================================================" > - "====================================================" ); > - strcat( p_mgr->p_report_buf, > - "\nVendor : Ty " > - ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " > - " : Neighbor Port (Port #)\n" ); > - line_num++; > - } > - > p_physp = osm_port_get_phys_ptr( p_port, port_num ); > if( ( p_physp == NULL ) || ( !osm_physp_is_valid( p_physp ) ) ) > continue; > > - sprintf( line, "%s : %s : %02X :", > + fprintf( stdout, "%s : %s : %02X :", > osm_get_manufacturer_str( cl_ntoh64 > ( osm_node_get_node_guid > ( p_node ) ) ), > osm_get_node_type_str_fixed_width( node_type ), port_num ); > > - strcat( p_mgr->p_report_buf, line ); > - > p_pi = osm_physp_get_port_info_ptr( p_physp ); > > /* > * Port state is not defined for switch port 0 > */ > if( port_num == 0 ) > - strcat( p_mgr->p_report_buf, " :" ); > + fprintf( stdout, " :" ); > else > - { > - sprintf( line, " %s :", > + fprintf( stdout, " %s :", > osm_get_port_state_str_fixed_width > ( ib_port_info_get_port_state( p_pi ) ) ); > - strcat( p_mgr->p_report_buf, line ); > - } > > /* > * LID values are only meaningful in select 
cases. > */ > - if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) > - { > - if( ( ( node_type == IB_NODE_TYPE_SWITCH ) && ( port_num == 0 ) ) > - || ( node_type != IB_NODE_TYPE_SWITCH ) ) > - { > - sprintf( line, " %04X : %01X :", > - cl_ntoh16( p_pi->base_lid ), > - ib_port_info_get_lmc( p_pi ) ); > - > - strcat( p_mgr->p_report_buf, line ); > - } > - else > - strcat( p_mgr->p_report_buf, " : :" ); > - } > + if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN > + && ( ( node_type == IB_NODE_TYPE_SWITCH && port_num == 0 ) > + || node_type != IB_NODE_TYPE_SWITCH ) ) > + fprintf( stdout, " %04X : %01X :", > + cl_ntoh16( p_pi->base_lid ), > + ib_port_info_get_lmc( p_pi ) ); > else > - strcat( p_mgr->p_report_buf, " : :" ); > + fprintf( stdout, " : :" ); > > if( port_num != 0 ) > - { > - sprintf( line, " %s : %s : %s ", > + fprintf( stdout, " %s : %s : %s ", > osm_get_mtu_str( ib_port_info_get_neighbor_mtu( p_pi ) ), > osm_get_lwa_str( p_pi->link_width_active ), > osm_get_lsa_str( ib_port_info_get_link_speed_active > ( p_pi ) ) ); > - } > else > - { > - sprintf( line, " %s : %s : %s ", " ", " ", " " ); > - } > - strcat( p_mgr->p_report_buf, line ); > + fprintf( stdout, " : : " ); > > if( osm_physp_get_port_guid( p_physp ) == > p_mgr->p_subn->sm_port_guid ) > - { > - sprintf( line, "* %016" PRIx64 " *", > + fprintf( stdout, "* %016" PRIx64 " *", > cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); > - } > else > - { > - sprintf( line, ": %016" PRIx64 " :", > + fprintf( stdout, ": %016" PRIx64 " :", > cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); > - } > - strcat( p_mgr->p_report_buf, line ); > > if( port_num && > ( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) ) > @@ -1378,36 +1343,27 @@ __osm_state_mgr_report( > p_remote_physp = osm_physp_get_remote( p_physp ); > if( p_remote_physp && osm_physp_is_valid( p_remote_physp ) ) > { > - sprintf( line, " %016" PRIx64 " (%02X)", > + fprintf( stdout, " %016" PRIx64 " (%02X)", > cl_ntoh64( 
osm_physp_get_port_guid > ( p_remote_physp ) ), > osm_physp_get_port_num( p_remote_physp ) ); > - strcat( p_mgr->p_report_buf, line ); > } > else > - strcat( p_mgr->p_report_buf, " UNKNOWN" ); > + fprintf( stdout, " UNKNOWN" ); > } > > - strcat( p_mgr->p_report_buf, "\n" ); > - > - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) > - { > - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); > - line_num = 0; > - } > + fprintf( stdout, "\n" ); > } > - strcat( p_mgr->p_report_buf, > + > + fprintf( stdout, > "------------------------------------------------------" > "------------------------------------------------\n" ); > p_port = ( osm_port_t * ) cl_qmap_next( &p_port->map_item ); > } > > - CL_PLOCK_RELEASE( p_mgr->p_lock ); > - > - if( line_num != 0 ) > - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); > + fflush(stdout); > > - Exit: > + CL_PLOCK_RELEASE( p_mgr->p_lock ); > OSM_LOG_EXIT( p_mgr->p_log ); > } > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c > index 39d6899..da9e9f2 100644 > --- a/osm/opensm/osm_ucast_mgr.c > +++ b/osm/opensm/osm_ucast_mgr.c > @@ -103,7 +103,6 @@ osm_ucast_mgr_init( > IN osm_ucast_mgr_t* const p_mgr, > IN osm_req_t* const p_req, > IN osm_subn_t* const p_subn, > - IN char* const p_report_buf, > IN osm_log_t* const p_log, > IN cl_plock_t* const p_lock ) > { > @@ -121,7 +120,6 @@ osm_ucast_mgr_init( > p_mgr->p_subn = p_subn; > p_mgr->p_lock = p_lock; > p_mgr->p_req = p_req; > - p_mgr->p_report_buf = p_report_buf; > > OSM_LOG_EXIT( p_mgr->p_log ); > return( status ); > @@ -140,14 +138,13 @@ __osm_ucast_mgr_dump_path_distribution( > uint8_t num_ports; > uint32_t num_paths; > ib_net64_t remote_guid_ho; > - char line[OSM_REPORT_LINE_SIZE]; > > OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution ); > > p_node = osm_switch_get_node_ptr( p_sw ); > num_ports = osm_switch_get_num_ports( p_sw ); > > - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_path_distribution: " > + 
fprintf( stdout, "__osm_ucast_mgr_dump_path_distribution: " > "Switch 0x%" PRIx64 "\n" > "Port : Path Count Through Port", > cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > @@ -155,11 +152,10 @@ __osm_ucast_mgr_dump_path_distribution( > for( i = 0; i < num_ports; i++ ) > { > num_paths = osm_switch_path_count_get( p_sw , i ); > - sprintf( line, "\n %03u : %u", i, num_paths ); > - strcat( p_mgr->p_report_buf, line ); > + fprintf( stdout, "\n %03u : %u", i, num_paths ); > if( i == 0 ) > { > - strcat( p_mgr->p_report_buf, " (switch management port)" ); > + fprintf( stdout, " (switch management port)" ); > continue; > } > > @@ -172,26 +168,23 @@ __osm_ucast_mgr_dump_path_distribution( > switch( osm_node_get_remote_type( p_node, i ) ) > { > case IB_NODE_TYPE_SWITCH: > - strcat( p_mgr->p_report_buf, " (link to switch" ); > + fprintf( stdout, " (link to switch" ); > break; > case IB_NODE_TYPE_ROUTER: > - strcat( p_mgr->p_report_buf, " (link to router" ); > + fprintf( stdout, " (link to router" ); > break; > case IB_NODE_TYPE_CA: > - strcat( p_mgr->p_report_buf, " (link to CA" ); > + fprintf( stdout, " (link to CA" ); > break; > default: > - strcat( p_mgr->p_report_buf, " (link to unknown node type" ); > + fprintf( stdout, " (link to unknown node type" ); > break; > } > > - sprintf( line, " 0x%" PRIx64 ")", remote_guid_ho ); > - strcat( p_mgr->p_report_buf, line ); > + fprintf( stdout, " 0x%" PRIx64 ")", remote_guid_ho ); > } > > - strcat( p_mgr->p_report_buf, "\n" ); > - > - osm_log_raw( p_mgr->p_log, OSM_LOG_ROUTING, p_mgr->p_report_buf ); > + fprintf( stdout, "\n" ); > > OSM_LOG_EXIT( p_mgr->p_log ); > } > @@ -202,7 +195,7 @@ static void > __osm_ucast_mgr_dump_ucast_routes( > IN const osm_ucast_mgr_t* const p_mgr, > IN const osm_switch_t* const p_sw, > - IN FILE *p_fdbFile ) > + IN FILE *file ) > { > const osm_node_t* p_node; > uint8_t port_num; > @@ -211,8 +204,6 @@ __osm_ucast_mgr_dump_ucast_routes( > uint8_t best_port; > uint16_t max_lid_ho; > uint16_t lid_ho; > - 
char line[OSM_REPORT_LINE_SIZE]; > - uint32_t line_num = 0; > boolean_t ui_ucast_fdb_assign_func_defined; > > OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes ); > @@ -221,16 +212,13 @@ __osm_ucast_mgr_dump_ucast_routes( > > max_lid_ho = osm_switch_get_max_lid_ho( p_sw ); > > + fprintf( file, "__osm_ucast_mgr_dump_ucast_routes: " > + "Switch 0x%016" PRIx64 "\n" > + "LID : Port : Hops : Optimal\n", > + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) > { > - if( line_num == 0 ) > - { > - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_ucast_routes: " > - "Switch 0x%016" PRIx64 "\n" > - "LID : Port : Hops : Optimal\n", > - cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > - line_num++; > - } > + fprintf(file, "0x%04X : ", lid_ho); > > port_num = osm_switch_get_port_by_lid( p_sw, lid_ho ); > if( port_num == OSM_NO_PATH ) > @@ -241,9 +229,7 @@ __osm_ucast_mgr_dump_ucast_routes( > will reassign and compress the LID range. The > subnet should work fine either way. 
> */ > - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); > - strcat( p_mgr->p_report_buf, line ); > - line_num++; > + fprintf( file, "UNREACHABLE\n" ); > continue; > } > /* > @@ -255,19 +241,15 @@ __osm_ucast_mgr_dump_ucast_routes( > num_hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); > if( num_hops == OSM_NO_PATH ) > { > - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); > - strcat( p_mgr->p_report_buf, line ); > - line_num++; > + fprintf( file, "UNREACHABLE\n" ); > continue; > } > > best_hops = osm_switch_get_least_hops( p_sw, lid_ho ); > - sprintf( line, "0x%04X : %03u : %02u : ", > - lid_ho, port_num, num_hops ); > - strcat( p_mgr->p_report_buf, line ); > + fprintf( file, "%03u : %02u : ", port_num, num_hops ); > > if( best_hops == num_hops ) > - strcat( p_mgr->p_report_buf, "yes" ); > + fprintf( file, "yes" ); > else > { > if (p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign) > @@ -282,23 +264,13 @@ __osm_ucast_mgr_dump_ucast_routes( > p_sw, lid_ho, TRUE, > NULL, NULL, NULL, NULL, /* No LMC Optimization */ > ui_ucast_fdb_assign_func_defined ); > - sprintf( line, "No %u hop path possible via port %u!", > + fprintf( file, "No %u hop path possible via port %u!", > best_hops, best_port ); > - strcat( p_mgr->p_report_buf, line ); > } > > - strcat( p_mgr->p_report_buf, "\n" ); > - > - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) > - { > - fprintf(p_fdbFile,"%s",p_mgr->p_report_buf ); > - line_num = 0; > - } > + fprintf( file, "\n" ); > } > > - if( line_num != 0 ) > - fprintf(p_fdbFile,"%s\n",p_mgr->p_report_buf ); > - > OSM_LOG_EXIT( p_mgr->p_log ); > } > From eitan at mellanox.co.il Mon Oct 23 00:20:39 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 23 Oct 2006 09:20:39 +0200 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <20061022102429.GB29681@sashak.voltaire.com> References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> 
<20061022102429.GB29681@sashak.voltaire.com>
Message-ID: <453C6D47.307@mellanox.co.il>

Hi Sasha,

If we would like to change the OpenSM coding style to not include the __osm prefix for all static functions, we should do it all over the code. Meanwhile, let's keep the style as it is. I thought we all agreed to this in the past.

It does not make sense to me to have a creeping style change, one for every developer involved.

Should we start a thread on what our target style should be and convert all files now? If we do, then let's agree on that - and then change.

Thanks

Eitan

Sasha Khapyorsky wrote:
> On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote:
>
>> Hi Sasha.
>>
>> One small comment:
>>
>> [snip]
>>
>>> osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn);
>>> ...
>>> osm_updn_find_root_nodes_by_min_hop(
>>> ...
>>> osm_subn_set_up_down_min_hop_table(
>>> ...
>>> osm_subn_calc_up_down_min_hop_table(
>>> ...
>>>
>>
>> Please add the "__" prefix to the static function names.
>>
>
> Then it would be better to remove 'osm_' and '__osm_' prefixes in static
> names, but this will be function renaming, not just 'make static'.
>
> Sasha
>
>
>> Thanks.
>>
>> --
>> Yevgeny
>>
>> Sasha Khapyorsky wrote:
>>
>>> This makes local functions static and moves definitions of locally used
>>> types to .c file.
>>> >>> Signed-off-by: Sasha Khapyorsky >>> --- >>> osm/include/opensm/osm_opensm.h | 1 - >>> osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- >>> osm/opensm/osm_ucast_updn.c | 81 +++++++- >>> 3 files changed, 70 insertions(+), 361 deletions(-) >>> >>> diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h >>> index cb216a4..5557dbd 100644 >>> --- a/osm/include/opensm/osm_opensm.h >>> +++ b/osm/include/opensm/osm_opensm.h >>> @@ -62,7 +62,6 @@ #include >>> #include >>> #include >>> #include >>> -#include >>> >>> #ifdef __cplusplus >>> # define BEGIN_C_DECLS extern "C" { >>> diff --git a/osm/include/opensm/osm_ucast_updn.h b/osm/include/opensm/osm_ucast_updn.h >>> index 4609e1b..c2a4376 100644 >>> --- a/osm/include/opensm/osm_ucast_updn.h >>> +++ b/osm/include/opensm/osm_ucast_updn.h >>> @@ -71,363 +71,14 @@ BEGIN_C_DECLS >>> /* ENUM TypeDefs */ >>> /* /////////////////////////// */ >>> >>> -/* >>> -* DESCRIPTION >>> -* This enum respresent available directions of arcs in the graph >>> -* SYNOPSIS >>> -*/ >>> -typedef enum _updn_switch_dir >>> -{ >>> - UP = 0, >>> - DOWN >>> -} updn_switch_dir_t; >>> - >>> -/* >>> - * TYPE DEFINITIONS >>> - * UP >>> - * Current switch direction in propogating the subnet is up >>> - * DOWN >>> - * Current switch direction in propogating the subnet is down >>> - * >>> - */ >>> - >>> -/* >>> -* DESCRIPTION >>> -* This enum respresent available states in the UPDN algorithm >>> -* SYNOPSIS >>> -*/ >>> -typedef enum _updn_state >>> -{ >>> - UPDN_INIT = 0, >>> - UPDN_RANK, >>> - UPDN_MIN_HOP_CALC, >>> -} updn_state_t; >>> - >>> -/* >>> - * TYPE DEFINITIONS >>> - * UPDN_INIT - loading the package but still not performing anything >>> - * UPDN_RANK - post ranking algorithm >>> - * UPDN_MIN_HOP_CALC - post min hop table calculation >>> - */ >>> - >>> /* ////////////////////////////////// */ >>> /* Struct TypeDefs */ >>> /* ///////////////////////////////// */ >>> >>> -/****s* UPDN: Rank 
element/updn_rank_t >>> -* NAME >>> -* updn_rank_t >>> -* >>> -* DESCRIPTION >>> -* This object represents a rank type element in a list >>> -* >>> -* The updn_rank_t object should be treated as opaque and should >>> -* be manipulated only through the provided functions. >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -typedef struct _updn_rank >>> -{ >>> - cl_map_item_t map_item; >>> - uint8_t rank; >>> -} updn_rank_t; >>> - >>> -/* >>> -* FIELDS >>> -* map_item >>> -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! >>> -* >>> -* rank >>> -* Rank value of this node >>> -* >>> -*/ >>> - >>> -/****s* UPDN: Histogram element/updn_hist_t >>> -* NAME >>> -* updn_hist_t >>> -* >>> -* DESCRIPTION >>> -* This object represents a histogram type element in a list >>> -* >>> -* The updn_hist_t object should be treated as opaque and should >>> -* be manipulated only through the provided functions. >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -typedef struct _updn_hist >>> -{ >>> - cl_map_item_t map_item; >>> - uint32_t bar_value; >>> -} updn_hist_t; >>> - >>> -/* >>> -* FIELDS >>> -* map_item >>> -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! >>> -* >>> -* bar_value >>> -* The number of occurences of the same hop value >>> -* >>> -*/ >>> - >>> -typedef struct _updn_next_step >>> -{ >>> - updn_switch_dir_t state; >>> - osm_switch_t *p_sw; >>> -} updn_next_step_t; >>> - >>> -/*****s* updn: updn/updn_input_t >>> -* NAME updn_t >>> -* >>> -* >>> -* DESCRIPTION >>> -* updn input fields structure. 
>>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -typedef struct _updn_input >>> -{ >>> - uint32_t num_guids; >>> - uint64_t * guid_list; >>> -} updn_input_t; >>> - >>> -/* >>> -* FIELDS >>> -* num_guids >>> -* number of guids given at the UI >>> -* >>> -* guid_list >>> -* guids specified as an array (converted from a list given in the UI) >>> -* >>> -* >>> -* SEE ALSO >>> -* >>> -*********/ >>> - >>> -/*****s* updn: updn/updn_t >>> -* NAME updn_t >>> -* >>> -* >>> -* DESCRIPTION >>> -* updn structure. >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -typedef struct _updn >>> -{ >>> - updn_state_t state; >>> - boolean_t auto_detect_root_nodes; >>> - cl_qmap_t guid_rank_tbl; >>> - updn_input_t updn_ucast_reg_inputs; >>> - cl_list_t * p_root_nodes; >>> -} updn_t; >>> - >>> -/* >>> -* FIELDS >>> -* state >>> -* state of the updn algorithm which basically should pass through Init >>> -* - Ranking - UpDn algorithm >>> -* >>> -* guid_rank_tbl >>> -* guid 2 rank mapping vector , indexed by guid in network order >>> -* >>> -* >>> -* SEE ALSO >>> -* >>> -*********/ >>> - >>> /* ////////////////////////////// */ >>> /* Function */ >>> /* ////////////////////////////// */ >>> >>> -/***f** OpenSM: Updn/updn_construct >>> -* NAME >>> -* updn_construct >>> -* >>> -* DESCRIPTION >>> -* Allocation of updn_t struct >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -updn_t* >>> -updn_construct(void); >>> - >>> -/* >>> -* PARAMETERS >>> -* >>> -* >>> -* RETURN VALUE >>> -* Return a pointer to an updn struct. Null if fails to do so. 
>>> -* >>> -* NOTES >>> -* First step of the creation of updn_t >>> -*/ >>> - >>> -/****s* OpenSM: Updn/updn_destroy >>> -* NAME >>> -* updn_destroy >>> -* >>> -* DESCRIPTION >>> -* release of updn_t struct >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -void >>> -updn_destroy( >>> - IN updn_t* const p_updn ); >>> - >>> -/* >>> -* PARAMETERS >>> -* p_updn >>> -* A pointer to the updn_t struct that is goining to be released >>> -* >>> -* RETURN VALUE >>> -* >>> -* NOTES >>> -* Final step of the releasing of updn_t >>> -* >>> -* SEE ALSO >>> -* updn_construct >>> -*********/ >>> - >>> -/****f* OpenSM: Updn/updn_init >>> -* NAME >>> -* updn_init >>> -* >>> -* DESCRIPTION >>> -* Initialization of an updn_t struct >>> -* >>> -* SYNOPSIS >>> -*/ >>> -cl_status_t >>> -updn_init( >>> - IN updn_t* const p_updn ); >>> - >>> -/* >>> -* PARAMETERS >>> -* p_updn >>> -* A pointer to the updn_t struct that is goining to be initilized >>> -* >>> -* RETURN VALUE >>> -* The status of the function. >>> -* >>> -* NOTES >>> -* >>> -* SEE ALSO >>> -* updn_construct >>> -********/ >>> - >>> -/****** OpenSM: Updn/updn_subn_rank >>> -* NAME >>> -* updn_subn_rank >>> -* >>> -* DESCRIPTION >>> -* This function ranks the subnet for credit loop free algorithm >>> -* >>> -* SYNOPSIS >>> -*/ >>> -int >>> -updn_subn_rank( >>> - IN uint64_t root_guid , >>> - IN uint8_t base_rank, >>> - IN updn_t* p_updn ); >>> - >>> -/* >>> -* PARAMETERS >>> -* p_subn >>> -* [in] Pointer to a Subnet object to construct. >>> -* >>> -* base_rank >>> -* [in] The base ranking value (lowest value) >>> -* >>> -* p_updn >>> -* [in] Pointer to updn structure which includes state & lid2rank table >>> -* >>> -* RETURN VALUE >>> -* This function returns 0 when rankning has succeded , otherwise 1. 
>>> -******/ >>> - >>> -/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table >>> -* NAME >>> -* osm_subn_set_up_down_min_hop_table >>> -* >>> -* DESCRIPTION >>> -* This function set min hop table of all switches by BFS through each >>> -* port guid at the subnet using ranking done before. >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -int >>> -osm_subn_set_up_down_min_hop_table( >>> - IN updn_t* p_updn ); >>> - >>> -/* >>> -* PARAMETERS >>> -* p_updn >>> -* [in] Pointer to updn structure which includes state & lid2rank table >>> -* >>> -* RETURN VALUE >>> -* This function returns 0 when rankning has succeded , otherwise 1. >>> -******/ >>> - >>> -/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table >>> -* NAME >>> -* osm_subn_calc_up_down_min_hop_table >>> -* >>> -* DESCRIPTION >>> -* This function perform ranking and setting of all switches' min hop table >>> -* by UP DOWN algorithm >>> -* >>> -* SYNOPSIS >>> -*/ >>> - >>> -int >>> -osm_subn_calc_up_down_min_hop_table( >>> - IN uint32_t num_guids, >>> - IN uint64_t* guid_list, >>> - IN updn_t* p_updn ); >>> - >>> -/* >>> -* PARAMETERS >>> -* >>> -* guid_list >>> -* [in] Guid list from which to start ranking . >>> -* >>> -* p_updn >>> -* [in] Pointer to updn structure which includes state & lid2rank table >>> -* RETURN VALUE >>> -* This function returns 0 when rankning has succeded , otherwise 1. 
>>> -******/ >>> - >>> -/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop >>> -* NAME >>> -* osm_updn_find_root_nodes_by_min_hop >>> -* >>> -* DESCRIPTION >>> -* This function perform auto identification of root nodes for UPDN ranking phase >>> -* >>> -* SYNOPSIS >>> -*/ >>> -int >>> -osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); >>> - >>> -/* >>> -* PARAMETERS >>> -* p_root_nodes_list >>> -* >>> -* [out] Pointer to the root nodes list found in the subnet >>> -* >>> -* RETURN VALUE >>> -* This function returns 0 when auto identification had succeeded >>> -******/ >>> - >>> END_C_DECLS >>> >>> #endif /* _OSM_UCAST_UPDN_H_ */ >>> diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c >>> index 86ac3ad..0121e6e 100644 >>> --- a/osm/opensm/osm_ucast_updn.c >>> +++ b/osm/opensm/osm_ucast_updn.c >>> @@ -55,8 +55,62 @@ #include >>> #include >>> #include >>> #include >>> -#include >>> -#include >>> + >>> +/* //////////////////////////// */ >>> +/* Local types */ >>> +/* /////////////////////////// */ >>> + >>> +/* direction */ >>> +typedef enum _updn_switch_dir >>> +{ >>> + UP = 0, >>> + DOWN >>> +} updn_switch_dir_t; >>> + >>> +/* This enum respresent available states in the UPDN algorithm */ >>> +typedef enum _updn_state >>> +{ >>> + UPDN_INIT = 0, >>> + UPDN_RANK, >>> + UPDN_MIN_HOP_CALC, >>> +} updn_state_t; >>> + >>> +/* Rank value of this node */ >>> +typedef struct _updn_rank >>> +{ >>> + cl_map_item_t map_item; >>> + uint8_t rank; >>> +} updn_rank_t; >>> + >>> +/* Histogram element - the number of occurences of the same hop value */ >>> +typedef struct _updn_hist >>> +{ >>> + cl_map_item_t map_item; >>> + uint32_t bar_value; >>> +} updn_hist_t; >>> + >>> +typedef struct _updn_next_step >>> +{ >>> + updn_switch_dir_t state; >>> + osm_switch_t *p_sw; >>> +} updn_next_step_t; >>> + >>> +/* guids list */ >>> +typedef struct _updn_input >>> +{ >>> + uint32_t num_guids; >>> + uint64_t * guid_list; >>> +} updn_input_t; >>> + >>> +/* 
updn structure */ >>> +typedef struct _updn >>> +{ >>> + updn_state_t state; >>> + boolean_t auto_detect_root_nodes; >>> + cl_qmap_t guid_rank_tbl; >>> + updn_input_t updn_ucast_reg_inputs; >>> + cl_list_t * p_root_nodes; >>> +} updn_t; >>> >>> >>> /* ///////////////////////////////// */ >>> @@ -65,6 +119,11 @@ #include >>> /* This var is predefined and initialized */ >>> extern osm_opensm_t osm; >>> >>> +/* ///////////////////////////////// */ >>> +/* Statics */ >>> +/* ///////////////////////////////// */ >>> +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); >>> + >>> /********************************************************************** >>> **********************************************************************/ >>> /* This function returns direction based on rank and guid info of current & >>> @@ -471,7 +530,7 @@ __updn_bfs_by_node( >>> >>> /********************************************************************** >>> **********************************************************************/ >>> -void >>> +static void >>> updn_destroy( >>> IN updn_t* const p_updn ) >>> { >>> @@ -508,7 +567,7 @@ updn_destroy( >>> >>> /********************************************************************** >>> **********************************************************************/ >>> -updn_t* >>> +static updn_t* >>> updn_construct(void) >>> { >>> updn_t* p_updn; >>> @@ -523,7 +582,7 @@ updn_construct(void) >>> >>> /********************************************************************** >>> **********************************************************************/ >>> -cl_status_t >>> +static cl_status_t >>> updn_init( >>> IN updn_t* const p_updn ) >>> { >>> @@ -635,7 +694,7 @@ updn_init( >>> **********************************************************************/ >>> /* NOTE : PLS check if we need to decide that the first */ >>> /* rank is a SWITCH for BFS purpose */ >>> -int >>> +static int >>> updn_subn_rank( >>> IN uint64_t root_guid, >>> IN uint8_t base_rank, >>> @@ 
-795,7 +854,7 @@ updn_subn_rank( >>> >>> /********************************************************************** >>> **********************************************************************/ >>> -int >>> +static int >>> osm_subn_set_up_down_min_hop_table( >>> IN updn_t* p_updn ) >>> { >>> @@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( >>> >>> /********************************************************************** >>> **********************************************************************/ >>> -int >>> +static int >>> osm_subn_calc_up_down_min_hop_table( >>> IN uint32_t num_guids, >>> IN uint64_t * guid_list, >>> @@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( >>> /********************************************************************** >>> **********************************************************************/ >>> /* UPDN callback function */ >>> -int __osm_updn_call( >>> +static int __osm_updn_call( >>> void *ctx ) >>> { >>> OSM_LOG_ENTER(&(osm.log), __osm_updn_call); >>> @@ -969,7 +1028,7 @@ int __osm_updn_call( >>> /********************************************************************** >>> **********************************************************************/ >>> /* UPDN convert cl_list to guid array in updn struct */ >>> -void __osm_updn_convert_list2array( >>> +static void __osm_updn_convert_list2array( >>> IN updn_t * p_updn ) >>> { >>> uint32_t i = 0, max_num = 0; >>> @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( >>> /********************************************************************** >>> **********************************************************************/ >>> /* Find Root nodes automatically by Min Hop Table info */ >>> -int >>> +static int >>> osm_updn_find_root_nodes_by_min_hop( >>> OUT updn_t * p_updn ) >>> { >>> > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general
>

From tziporet at dev.mellanox.co.il Mon Oct 23 00:28:46 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 23 Oct 2006 09:28:46 +0200
Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs
In-Reply-To:
References:
Message-ID: <453C6F2E.8040806@dev.mellanox.co.il>

Hoang-Nam Nguyen wrote:
> Now libehca is broken in ofed-1.1. How can we fix that until 1.2?
>
>

I suggest that you fix what you want and create 1.0.1.1. You can place it
in the svn releases area and direct people that need ehca to this version.
We did something similar when we published 1.0.1, for which we added
SLES9 SP3 support.

Please place it on https://openib.org/svn/gen2/branches/1.1/ofed/releases/
(I assume you have check-in permission to svn).

Tziporet

From dotanb at dev.mellanox.co.il Mon Oct 23 01:10:04 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 23 Oct 2006 10:10:04 +0200
Subject: [openib-general] [PATCH] [cma] qp_access_flags was changed to zero
Message-ID: <1161591004.22381.1.camel@mtls05.yok.mtl.com>

qp_access_flags was changed to zero (this attribute is for remote access
permissions only, and local write is an invalid value).
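As a minimal illustration of the point above, the sketch below checks that a
QP access mask carries only remote-access bits. It is a user-space sketch, not
kernel code: the SKETCH_* constants and the helper
sketch_qp_access_flags_valid() are hypothetical names introduced here, and the
bit values are assumed to mirror enum ib_access_flags in the kernel's
ib_verbs.h.

```c
/*
 * Hypothetical user-space copies of the access-flag bits. The values are
 * assumed to mirror the kernel's enum ib_access_flags; they are redefined
 * here only so that the sketch builds without kernel headers.
 */
enum sketch_access_flags {
	SKETCH_ACCESS_LOCAL_WRITE   = 1,
	SKETCH_ACCESS_REMOTE_WRITE  = 1 << 1,
	SKETCH_ACCESS_REMOTE_READ   = 1 << 2,
	SKETCH_ACCESS_REMOTE_ATOMIC = 1 << 3
};

/* Bits that describe what a remote peer may do to this QP. */
#define SKETCH_QP_REMOTE_MASK (SKETCH_ACCESS_REMOTE_WRITE | \
			       SKETCH_ACCESS_REMOTE_READ  | \
			       SKETCH_ACCESS_REMOTE_ATOMIC)

/*
 * Hypothetical helper: returns 1 when 'flags' contains only remote-access
 * bits (including the empty mask 0), and 0 when any other bit, such as
 * local write, is set.
 */
static int sketch_qp_access_flags_valid(int flags)
{
	return (flags & ~SKETCH_QP_REMOTE_MASK) == 0;
}
```

Under these assumptions the empty mask used by the patch passes the check,
while a mask containing the local-write bit does not.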
Signed-off-by: Dotan Barak
---

Index: last_stable/drivers/infiniband/core/cma.c
===================================================================
--- last_stable.orig/drivers/infiniband/core/cma.c 2006-06-03 12:38:27.000000000 +0300
+++ last_stable/drivers/infiniband/core/cma.c 2006-06-04 17:34:36.000000000 +0300
@@ -343,7 +343,7 @@ static int cma_init_ib_qp(struct rdma_id
 		return ret;
 
 	qp_attr.qp_state = IB_QPS_INIT;
-	qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+	qp_attr.qp_access_flags = 0;
 	qp_attr.port_num = id_priv->id.port_num;
 	return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS |
 					  IB_QP_PKEY_INDEX | IB_QP_PORT);

From halr at voltaire.com Mon Oct 23 02:13:49 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 23 Oct 2006 05:13:49 -0400
Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types
In-Reply-To: <453C6D47.307@mellanox.co.il>
References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> <20061022102429.GB29681@sashak.voltaire.com> <453C6D47.307@mellanox.co.il>
Message-ID: <1161594784.25985.292177.camel@hal.voltaire.com>

Eitan,

On Mon, 2006-10-23 at 03:20, Eitan Zahavi wrote:
> Hi Sasha,
>
> If we would like to change OpenSM coding style to not include __osm
> prefix for
> all static functions we should do it all over the code.

Is there any value to __osm_ in the local function names ? If not, I
don't really see the harm here.

> Meanwhile lets keep the style as it is. I thought we all agreed to this
> in the past.
> It does not make sense to me to have a creeping style change one for
> every developer involved.
>
> Should we start the thread for what should be our target style and
> convert all files now?
> If we do then lets agree on that - and then change.

Do all such changes need to be hung on the yet to be determined coding
style ?

-- Hal

> Thanks
>
> Eitan
>
> Sasha Khapyorsky wrote:
> > On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote:
> >
> >
> >> Hi Sasha.
> >>
> >> One small comments:
> >>
> >> [snip]
> >>
> >>> osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn);
> >>> ...
> >>> osm_updn_find_root_nodes_by_min_hop(
> >>> ...
> >>> osm_subn_set_up_down_min_hop_table(
> >>> ...
> >>> osm_subn_calc_up_down_min_hop_table(
> >>> ...
> >>>
> >>
> >> Please add the "__" prefix to the static function names.
> >>
> >
> > Then would be better to remove 'osm_' and '__osm_' prefixes in static
> > names, but this will be function renaming, not just 'make static'.
> >
> > Sasha
> >
> >
> >> Thanks.
> >>
> >> --
> >> Yevgeny
> >>
> >> Sasha Khapyorsky wrote:
> >>
> >>> This makes local functions static and moves definitions of locally used
> >>> types to .c file.
> >>>
> >>> Signed-off-by: Sasha Khapyorsky
> >>> ---
> >>> osm/include/opensm/osm_opensm.h | 1 -
> >>> osm/include/opensm/osm_ucast_updn.h | 349 -----------------------------------
> >>> osm/opensm/osm_ucast_updn.c | 81 +++++++-
> >>> 3 files changed, 70 insertions(+), 361 deletions(-)
> >>>
> >>> [snip]
> >>> /* Find Root nodes automatically by Min Hop Table info */
> >>> -int
> >>> +static int
> >>> osm_updn_find_root_nodes_by_min_hop(
> >>> OUT updn_t * p_updn )
> >>> {
> >>>
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
>

From eitan at mellanox.co.il Mon Oct 23 03:07:55 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 23 Oct 2006 12:07:55 +0200
Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types
In-Reply-To: <1161594784.25985.292177.camel@hal.voltaire.com>
References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> <20061022102429.GB29681@sashak.voltaire.com> <453C6D47.307@mellanox.co.il> <1161594784.25985.292177.camel@hal.voltaire.com>
Message-ID: <453C947B.7070903@mellanox.co.il>

Hal Rosenstock wrote:
> Eitan,
>
> On Mon, 2006-10-23 at 03:20, Eitan Zahavi wrote:
>
>> Hi Sasha,
>>
>> If we would like to change OpenSM coding style to not include __osm
>> prefix for
>> all static functions we should do it all over the code.
>>
>
> Is there any value to __osm_ in the local function names ? If not, I
> don't really see the harm here.
>
Yes, there is value in keeping a consistent code style across a project.
Every project I know has a style. The OpenSM style has been there for many
years. We can change it if we like, but let us do it consciously and on the
entire tree.
>
>> Meanwhile lets keep the style as it is. I thought we all agreed to this
>> in the past.
>> It does not make sense to me to have a creeping style change one for
>> every developer involved.
>>
>> Should we start the thread for what should be our target style and
>> convert all files now?
>> If we do then lets agree on that - and then change.
>>
>
> Do all such changes need to be hung on the yet to be determined coding
> style ?
>
YES - no coding style changes should be allowed on a per-checkin basis.
Otherwise we turn the coding style into a mess.
> -- Hal
>
>
>> Thanks
>>
>> Eitan
>>
>> Sasha Khapyorsky wrote:
>>
>>> On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote:
>>>
>>>
>>>> Hi Sasha.
>>>>
>>>> One small comments:
>>>>
>>>> [snip]
>>>>
>>>>
>>>>> osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn);
>>>>> ...
>>>>> osm_updn_find_root_nodes_by_min_hop(
>>>>> ...
>>>>> osm_subn_set_up_down_min_hop_table(
>>>>> ...
>>>>> osm_subn_calc_up_down_min_hop_table(
>>>>> ...
>>>>>
>>>>>
>>>>
>>>> Please add the "__" prefix to the static function names.
>>>>
>>>>
>>> Then would be better to remove 'osm_' and '__osm_' prefixes in static
>>> names, but this will be function renaming, not just 'make static'.
>>>
>>> Sasha
>>>
>>>
>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Yevgeny
>>>>
>>>> Sasha Khapyorsky wrote:
>>>>
>>>>
>>>>> This makes local functions static and moves definitions of locally used
>>>>> types to .c file.
>>>>>
>>>>> Signed-off-by: Sasha Khapyorsky
>>>>> ---
>>>>> osm/include/opensm/osm_opensm.h | 1 -
>>>>> osm/include/opensm/osm_ucast_updn.h | 349 -----------------------------------
>>>>> osm/opensm/osm_ucast_updn.c | 81 +++++++-
>>>>> 3 files changed, 70 insertions(+), 361 deletions(-)
>>>>>
>>>>> [snip]
+ >>>>> +/* guids list */ >>>>> +typedef struct _updn_input >>>>> +{ >>>>> + uint32_t num_guids; >>>>> + uint64_t * guid_list; >>>>> +} updn_input_t; >>>>> + >>>>> +/* updn structure */ >>>>> +typedef struct _updn >>>>> +{ >>>>> + updn_state_t state; >>>>> + boolean_t auto_detect_root_nodes; >>>>> + cl_qmap_t guid_rank_tbl; >>>>> + updn_input_t updn_ucast_reg_inputs; >>>>> + cl_list_t * p_root_nodes; >>>>> +} updn_t; >>>>> >>>>> >>>>> /* ///////////////////////////////// */ >>>>> @@ -65,6 +119,11 @@ #include >>>>> /* This var is predefined and initialized */ >>>>> extern osm_opensm_t osm; >>>>> >>>>> +/* ///////////////////////////////// */ >>>>> +/* Statics */ >>>>> +/* ///////////////////////////////// */ >>>>> +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); >>>>> + >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> /* This function returns direction based on rank and guid info of current & >>>>> @@ -471,7 +530,7 @@ __updn_bfs_by_node( >>>>> >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> -void >>>>> +static void >>>>> updn_destroy( >>>>> IN updn_t* const p_updn ) >>>>> { >>>>> @@ -508,7 +567,7 @@ updn_destroy( >>>>> >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> -updn_t* >>>>> +static updn_t* >>>>> updn_construct(void) >>>>> { >>>>> updn_t* p_updn; >>>>> @@ -523,7 +582,7 @@ updn_construct(void) >>>>> >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> -cl_status_t >>>>> +static cl_status_t >>>>> updn_init( >>>>> IN updn_t* const p_updn ) >>>>> { >>>>> @@ -635,7 +694,7 @@ updn_init( >>>>> 
**********************************************************************/ >>>>> /* NOTE : PLS check if we need to decide that the first */ >>>>> /* rank is a SWITCH for BFS purpose */ >>>>> -int >>>>> +static int >>>>> updn_subn_rank( >>>>> IN uint64_t root_guid, >>>>> IN uint8_t base_rank, >>>>> @@ -795,7 +854,7 @@ updn_subn_rank( >>>>> >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> -int >>>>> +static int >>>>> osm_subn_set_up_down_min_hop_table( >>>>> IN updn_t* p_updn ) >>>>> { >>>>> @@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( >>>>> >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> -int >>>>> +static int >>>>> osm_subn_calc_up_down_min_hop_table( >>>>> IN uint32_t num_guids, >>>>> IN uint64_t * guid_list, >>>>> @@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> /* UPDN callback function */ >>>>> -int __osm_updn_call( >>>>> +static int __osm_updn_call( >>>>> void *ctx ) >>>>> { >>>>> OSM_LOG_ENTER(&(osm.log), __osm_updn_call); >>>>> @@ -969,7 +1028,7 @@ int __osm_updn_call( >>>>> /********************************************************************** >>>>> **********************************************************************/ >>>>> /* UPDN convert cl_list to guid array in updn struct */ >>>>> -void __osm_updn_convert_list2array( >>>>> +static void __osm_updn_convert_list2array( >>>>> IN updn_t * p_updn ) >>>>> { >>>>> uint32_t i = 0, max_num = 0; >>>>> @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( >>>>> /********************************************************************** >>>>> **********************************************************************/ 
>>>>> /* Find Root nodes automatically by Min Hop Table info */ >>>>> -int >>>>> +static int >>>>> osm_updn_find_root_nodes_by_min_hop( >>>>> OUT updn_t * p_updn ) >>>>> { >>>>> >>>>> >>> _______________________________________________ >>> openib-general mailing list >>> openib-general at openib.org >>> http://openib.org/mailman/listinfo/openib-general >>> >>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >>> >>> > > > From halr at voltaire.com Mon Oct 23 03:31:32 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Oct 2006 06:31:32 -0400 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <453C947B.7070903@mellanox.co.il> References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> <20061022102429.GB29681@sashak.voltaire.com> <453C6D47.307@mellanox.co.il> <1161594784.25985.292177.camel@hal.voltaire.com> <453C947B.7070903@mellanox.co.il> Message-ID: <1161599481.25985.295079.camel@hal.voltaire.com> On Mon, 2006-10-23 at 06:07, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > Eitan, > > > > On Mon, 2006-10-23 at 03:20, Eitan Zahavi wrote: > > > >> Hi Sasha, > >> > >> If we would like to change OpenSM coding style to not include __osm > >> prefix for > >> all static functions we should do it all over the code. > >> > > > > Is there any value to __osm_ in the local function names ? If not, I > > don't really see the harm here. > > > Yes there is value in keeping a consistent code style across a project. > Every project I know has a style. > OpenSM style is there for many years. Sure, but this was a much smaller question, and I would like to hear your answer to it.
In this particular case (other than consistency with the current de facto coding style, as I am not sure where this aspect is documented), is there value in adding __osm_ to local function names? > We can change it if we like but > let us do it consciously and on the entire tree. Sure, but it can be done incrementally rather than all at once, as many things have been done in the past. > >> Meanwhile lets keep the style as it is. I thought we all agreed to this > >> in the past. > >> It does not make sense to me to have a creeping style change one for > >> every developer involved. > >> > >> Should we start the thread for what should be our target style and > >> convert all files now? > >> If we do then lets agree on that - and then change. > >> > > > > Do all such changes need to be hung on the yet to be determined coding > > style ? > > > YES - No coding style changes should be allowed on a per checkin basis. > Otherwise we turn the coding style into a mess. This is a little too heavy-handed. Certain things can be accomplished without the grand vision of an agreed-upon (updated) coding style. -- Hal > > > -- Hal > > > > > >> Thanks > >> > >> Eitan > >> > >> Sasha Khapyorsky wrote: > >> > >>> On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote: > >>> > >>> > >>>> Hi Sasha. > >>>> > >>>> One small comments: > >>>> > >>>> [snip] > >>>> > >>>> > >>>>> osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > >>>>> ... > >>>>> osm_updn_find_root_nodes_by_min_hop( > >>>>> ... > >>>>> osm_subn_set_up_down_min_hop_table( > >>>>> ... > >>>>> osm_subn_calc_up_down_min_hop_table( > >>>>> ... > >>>>> > >>>>> > >>>> > >>>> Please add the "__" prefix to the static function names. > >>>> > >>>> > >>> Then would be better to remove 'osm_' and '__osm_' prefixes in static > >>> names, but this will be function renaming, not just 'make static'. > >>> > >>> Sasha > >>> > >>> > >>> > >>>> Thanks.
> >>>> > >>>> -- > >>>> Yevgeny > >>>> > >>>> Sasha Khapyorsky wrote: > >>>> > >>>> > >>>>> This makes local functions static and moves definitions of locally used > >>>>> types to .c file. > >>>>> > >>>>> Signed-off-by: Sasha Khapyorsky > >>>>> --- > >>>>> osm/include/opensm/osm_opensm.h | 1 - > >>>>> osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- > >>>>> osm/opensm/osm_ucast_updn.c | 81 +++++++- > >>>>> 3 files changed, 70 insertions(+), 361 deletions(-) > >>>>> > >>>>> diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h > >>>>> index cb216a4..5557dbd 100644 > >>>>> --- a/osm/include/opensm/osm_opensm.h > >>>>> +++ b/osm/include/opensm/osm_opensm.h > >>>>> @@ -62,7 +62,6 @@ #include > >>>>> #include > >>>>> #include > >>>>> #include > >>>>> -#include > >>>>> > >>>>> #ifdef __cplusplus > >>>>> # define BEGIN_C_DECLS extern "C" { > >>>>> diff --git a/osm/include/opensm/osm_ucast_updn.h b/osm/include/opensm/osm_ucast_updn.h > >>>>> index 4609e1b..c2a4376 100644 > >>>>> --- a/osm/include/opensm/osm_ucast_updn.h > >>>>> +++ b/osm/include/opensm/osm_ucast_updn.h > >>>>> @@ -71,363 +71,14 @@ BEGIN_C_DECLS > >>>>> /* ENUM TypeDefs */ > >>>>> /* /////////////////////////// */ > >>>>> > >>>>> -/* > >>>>> -* DESCRIPTION > >>>>> -* This enum respresent available directions of arcs in the graph > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> -typedef enum _updn_switch_dir > >>>>> -{ > >>>>> - UP = 0, > >>>>> - DOWN > >>>>> -} updn_switch_dir_t; > >>>>> - > >>>>> -/* > >>>>> - * TYPE DEFINITIONS > >>>>> - * UP > >>>>> - * Current switch direction in propogating the subnet is up > >>>>> - * DOWN > >>>>> - * Current switch direction in propogating the subnet is down > >>>>> - * > >>>>> - */ > >>>>> - > >>>>> -/* > >>>>> -* DESCRIPTION > >>>>> -* This enum respresent available states in the UPDN algorithm > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> -typedef enum _updn_state > >>>>> -{ > >>>>> - UPDN_INIT = 0, > >>>>> - UPDN_RANK, > 
>>>>> - UPDN_MIN_HOP_CALC, > >>>>> -} updn_state_t; > >>>>> - > >>>>> -/* > >>>>> - * TYPE DEFINITIONS > >>>>> - * UPDN_INIT - loading the package but still not performing anything > >>>>> - * UPDN_RANK - post ranking algorithm > >>>>> - * UPDN_MIN_HOP_CALC - post min hop table calculation > >>>>> - */ > >>>>> - > >>>>> /* ////////////////////////////////// */ > >>>>> /* Struct TypeDefs */ > >>>>> /* ///////////////////////////////// */ > >>>>> > >>>>> -/****s* UPDN: Rank element/updn_rank_t > >>>>> -* NAME > >>>>> -* updn_rank_t > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* This object represents a rank type element in a list > >>>>> -* > >>>>> -* The updn_rank_t object should be treated as opaque and should > >>>>> -* be manipulated only through the provided functions. > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -typedef struct _updn_rank > >>>>> -{ > >>>>> - cl_map_item_t map_item; > >>>>> - uint8_t rank; > >>>>> -} updn_rank_t; > >>>>> - > >>>>> -/* > >>>>> -* FIELDS > >>>>> -* map_item > >>>>> -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! > >>>>> -* > >>>>> -* rank > >>>>> -* Rank value of this node > >>>>> -* > >>>>> -*/ > >>>>> - > >>>>> -/****s* UPDN: Histogram element/updn_hist_t > >>>>> -* NAME > >>>>> -* updn_hist_t > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* This object represents a histogram type element in a list > >>>>> -* > >>>>> -* The updn_hist_t object should be treated as opaque and should > >>>>> -* be manipulated only through the provided functions. > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -typedef struct _updn_hist > >>>>> -{ > >>>>> - cl_map_item_t map_item; > >>>>> - uint32_t bar_value; > >>>>> -} updn_hist_t; > >>>>> - > >>>>> -/* > >>>>> -* FIELDS > >>>>> -* map_item > >>>>> -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! 
> >>>>> -* > >>>>> -* bar_value > >>>>> -* The number of occurences of the same hop value > >>>>> -* > >>>>> -*/ > >>>>> - > >>>>> -typedef struct _updn_next_step > >>>>> -{ > >>>>> - updn_switch_dir_t state; > >>>>> - osm_switch_t *p_sw; > >>>>> -} updn_next_step_t; > >>>>> - > >>>>> -/*****s* updn: updn/updn_input_t > >>>>> -* NAME updn_t > >>>>> -* > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* updn input fields structure. > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -typedef struct _updn_input > >>>>> -{ > >>>>> - uint32_t num_guids; > >>>>> - uint64_t * guid_list; > >>>>> -} updn_input_t; > >>>>> - > >>>>> -/* > >>>>> -* FIELDS > >>>>> -* num_guids > >>>>> -* number of guids given at the UI > >>>>> -* > >>>>> -* guid_list > >>>>> -* guids specified as an array (converted from a list given in the UI) > >>>>> -* > >>>>> -* > >>>>> -* SEE ALSO > >>>>> -* > >>>>> -*********/ > >>>>> - > >>>>> -/*****s* updn: updn/updn_t > >>>>> -* NAME updn_t > >>>>> -* > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* updn structure. 
> >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -typedef struct _updn > >>>>> -{ > >>>>> - updn_state_t state; > >>>>> - boolean_t auto_detect_root_nodes; > >>>>> - cl_qmap_t guid_rank_tbl; > >>>>> - updn_input_t updn_ucast_reg_inputs; > >>>>> - cl_list_t * p_root_nodes; > >>>>> -} updn_t; > >>>>> - > >>>>> -/* > >>>>> -* FIELDS > >>>>> -* state > >>>>> -* state of the updn algorithm which basically should pass through Init > >>>>> -* - Ranking - UpDn algorithm > >>>>> -* > >>>>> -* guid_rank_tbl > >>>>> -* guid 2 rank mapping vector , indexed by guid in network order > >>>>> -* > >>>>> -* > >>>>> -* SEE ALSO > >>>>> -* > >>>>> -*********/ > >>>>> - > >>>>> /* ////////////////////////////// */ > >>>>> /* Function */ > >>>>> /* ////////////////////////////// */ > >>>>> > >>>>> -/***f** OpenSM: Updn/updn_construct > >>>>> -* NAME > >>>>> -* updn_construct > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* Allocation of updn_t struct > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -updn_t* > >>>>> -updn_construct(void); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* > >>>>> -* > >>>>> -* RETURN VALUE > >>>>> -* Return a pointer to an updn struct. Null if fails to do so. 
> >>>>> -* > >>>>> -* NOTES > >>>>> -* First step of the creation of updn_t > >>>>> -*/ > >>>>> - > >>>>> -/****s* OpenSM: Updn/updn_destroy > >>>>> -* NAME > >>>>> -* updn_destroy > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* release of updn_t struct > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -void > >>>>> -updn_destroy( > >>>>> - IN updn_t* const p_updn ); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* p_updn > >>>>> -* A pointer to the updn_t struct that is goining to be released > >>>>> -* > >>>>> -* RETURN VALUE > >>>>> -* > >>>>> -* NOTES > >>>>> -* Final step of the releasing of updn_t > >>>>> -* > >>>>> -* SEE ALSO > >>>>> -* updn_construct > >>>>> -*********/ > >>>>> - > >>>>> -/****f* OpenSM: Updn/updn_init > >>>>> -* NAME > >>>>> -* updn_init > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* Initialization of an updn_t struct > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> -cl_status_t > >>>>> -updn_init( > >>>>> - IN updn_t* const p_updn ); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* p_updn > >>>>> -* A pointer to the updn_t struct that is goining to be initilized > >>>>> -* > >>>>> -* RETURN VALUE > >>>>> -* The status of the function. > >>>>> -* > >>>>> -* NOTES > >>>>> -* > >>>>> -* SEE ALSO > >>>>> -* updn_construct > >>>>> -********/ > >>>>> - > >>>>> -/****** OpenSM: Updn/updn_subn_rank > >>>>> -* NAME > >>>>> -* updn_subn_rank > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* This function ranks the subnet for credit loop free algorithm > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> -int > >>>>> -updn_subn_rank( > >>>>> - IN uint64_t root_guid , > >>>>> - IN uint8_t base_rank, > >>>>> - IN updn_t* p_updn ); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* p_subn > >>>>> -* [in] Pointer to a Subnet object to construct. 
> >>>>> -* > >>>>> -* base_rank > >>>>> -* [in] The base ranking value (lowest value) > >>>>> -* > >>>>> -* p_updn > >>>>> -* [in] Pointer to updn structure which includes state & lid2rank table > >>>>> -* > >>>>> -* RETURN VALUE > >>>>> -* This function returns 0 when rankning has succeded , otherwise 1. > >>>>> -******/ > >>>>> - > >>>>> -/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table > >>>>> -* NAME > >>>>> -* osm_subn_set_up_down_min_hop_table > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* This function set min hop table of all switches by BFS through each > >>>>> -* port guid at the subnet using ranking done before. > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -int > >>>>> -osm_subn_set_up_down_min_hop_table( > >>>>> - IN updn_t* p_updn ); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* p_updn > >>>>> -* [in] Pointer to updn structure which includes state & lid2rank table > >>>>> -* > >>>>> -* RETURN VALUE > >>>>> -* This function returns 0 when rankning has succeded , otherwise 1. > >>>>> -******/ > >>>>> - > >>>>> -/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table > >>>>> -* NAME > >>>>> -* osm_subn_calc_up_down_min_hop_table > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* This function perform ranking and setting of all switches' min hop table > >>>>> -* by UP DOWN algorithm > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> - > >>>>> -int > >>>>> -osm_subn_calc_up_down_min_hop_table( > >>>>> - IN uint32_t num_guids, > >>>>> - IN uint64_t* guid_list, > >>>>> - IN updn_t* p_updn ); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* > >>>>> -* guid_list > >>>>> -* [in] Guid list from which to start ranking . > >>>>> -* > >>>>> -* p_updn > >>>>> -* [in] Pointer to updn structure which includes state & lid2rank table > >>>>> -* RETURN VALUE > >>>>> -* This function returns 0 when rankning has succeded , otherwise 1. 
> >>>>> -******/ > >>>>> - > >>>>> -/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop > >>>>> -* NAME > >>>>> -* osm_updn_find_root_nodes_by_min_hop > >>>>> -* > >>>>> -* DESCRIPTION > >>>>> -* This function perform auto identification of root nodes for UPDN ranking phase > >>>>> -* > >>>>> -* SYNOPSIS > >>>>> -*/ > >>>>> -int > >>>>> -osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); > >>>>> - > >>>>> -/* > >>>>> -* PARAMETERS > >>>>> -* p_root_nodes_list > >>>>> -* > >>>>> -* [out] Pointer to the root nodes list found in the subnet > >>>>> -* > >>>>> -* RETURN VALUE > >>>>> -* This function returns 0 when auto identification had succeeded > >>>>> -******/ > >>>>> - > >>>>> END_C_DECLS > >>>>> > >>>>> #endif /* _OSM_UCAST_UPDN_H_ */ > >>>>> diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > >>>>> index 86ac3ad..0121e6e 100644 > >>>>> --- a/osm/opensm/osm_ucast_updn.c > >>>>> +++ b/osm/opensm/osm_ucast_updn.c > >>>>> @@ -55,8 +55,62 @@ #include > >>>>> #include > >>>>> #include > >>>>> #include > >>>>> -#include > >>>>> -#include > >>>>> + > >>>>> +/* //////////////////////////// */ > >>>>> +/* Local types */ > >>>>> +/* /////////////////////////// */ > >>>>> + > >>>>> +/* direction */ > >>>>> +typedef enum _updn_switch_dir > >>>>> +{ > >>>>> + UP = 0, > >>>>> + DOWN > >>>>> +} updn_switch_dir_t; > >>>>> + > >>>>> +/* This enum respresent available states in the UPDN algorithm */ > >>>>> +typedef enum _updn_state > >>>>> +{ > >>>>> + UPDN_INIT = 0, > >>>>> + UPDN_RANK, > >>>>> + UPDN_MIN_HOP_CALC, > >>>>> +} updn_state_t; > >>>>> + > >>>>> +/* Rank value of this node */ > >>>>> +typedef struct _updn_rank > >>>>> +{ > >>>>> + cl_map_item_t map_item; > >>>>> + uint8_t rank; > >>>>> +} updn_rank_t; > >>>>> + > >>>>> +/* Histogram element - the number of occurences of the same hop value */ > >>>>> +typedef struct _updn_hist > >>>>> +{ > >>>>> + cl_map_item_t map_item; > >>>>> + uint32_t bar_value; > >>>>> +} updn_hist_t; > >>>>> + 
> >>>>> +typedef struct _updn_next_step > >>>>> +{ > >>>>> + updn_switch_dir_t state; > >>>>> + osm_switch_t *p_sw; > >>>>> +} updn_next_step_t; > >>>>> + > >>>>> +/* guids list */ > >>>>> +typedef struct _updn_input > >>>>> +{ > >>>>> + uint32_t num_guids; > >>>>> + uint64_t * guid_list; > >>>>> +} updn_input_t; > >>>>> + > >>>>> +/* updn structure */ > >>>>> +typedef struct _updn > >>>>> +{ > >>>>> + updn_state_t state; > >>>>> + boolean_t auto_detect_root_nodes; > >>>>> + cl_qmap_t guid_rank_tbl; > >>>>> + updn_input_t updn_ucast_reg_inputs; > >>>>> + cl_list_t * p_root_nodes; > >>>>> +} updn_t; > >>>>> > >>>>> > >>>>> /* ///////////////////////////////// */ > >>>>> @@ -65,6 +119,11 @@ #include > >>>>> /* This var is predefined and initialized */ > >>>>> extern osm_opensm_t osm; > >>>>> > >>>>> +/* ///////////////////////////////// */ > >>>>> +/* Statics */ > >>>>> +/* ///////////////////////////////// */ > >>>>> +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > >>>>> + > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> /* This function returns direction based on rank and guid info of current & > >>>>> @@ -471,7 +530,7 @@ __updn_bfs_by_node( > >>>>> > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> -void > >>>>> +static void > >>>>> updn_destroy( > >>>>> IN updn_t* const p_updn ) > >>>>> { > >>>>> @@ -508,7 +567,7 @@ updn_destroy( > >>>>> > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> -updn_t* > >>>>> +static updn_t* > >>>>> updn_construct(void) > >>>>> { > >>>>> updn_t* p_updn; > >>>>> @@ -523,7 +582,7 @@ updn_construct(void) > >>>>> > >>>>> 
/********************************************************************** > >>>>> **********************************************************************/ > >>>>> -cl_status_t > >>>>> +static cl_status_t > >>>>> updn_init( > >>>>> IN updn_t* const p_updn ) > >>>>> { > >>>>> @@ -635,7 +694,7 @@ updn_init( > >>>>> **********************************************************************/ > >>>>> /* NOTE : PLS check if we need to decide that the first */ > >>>>> /* rank is a SWITCH for BFS purpose */ > >>>>> -int > >>>>> +static int > >>>>> updn_subn_rank( > >>>>> IN uint64_t root_guid, > >>>>> IN uint8_t base_rank, > >>>>> @@ -795,7 +854,7 @@ updn_subn_rank( > >>>>> > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> -int > >>>>> +static int > >>>>> osm_subn_set_up_down_min_hop_table( > >>>>> IN updn_t* p_updn ) > >>>>> { > >>>>> @@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( > >>>>> > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> -int > >>>>> +static int > >>>>> osm_subn_calc_up_down_min_hop_table( > >>>>> IN uint32_t num_guids, > >>>>> IN uint64_t * guid_list, > >>>>> @@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> /* UPDN callback function */ > >>>>> -int __osm_updn_call( > >>>>> +static int __osm_updn_call( > >>>>> void *ctx ) > >>>>> { > >>>>> OSM_LOG_ENTER(&(osm.log), __osm_updn_call); > >>>>> @@ -969,7 +1028,7 @@ int __osm_updn_call( > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> /* UPDN convert cl_list to guid array in updn struct 
*/ > >>>>> -void __osm_updn_convert_list2array( > >>>>> +static void __osm_updn_convert_list2array( > >>>>> IN updn_t * p_updn ) > >>>>> { > >>>>> uint32_t i = 0, max_num = 0; > >>>>> @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( > >>>>> /********************************************************************** > >>>>> **********************************************************************/ > >>>>> /* Find Root nodes automatically by Min Hop Table info */ > >>>>> -int > >>>>> +static int > >>>>> osm_updn_find_root_nodes_by_min_hop( > >>>>> OUT updn_t * p_updn ) > >>>>> { > >>>>> > >>>>> From sashak at voltaire.com Mon Oct 23 04:16:52 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 23 Oct 2006 13:16:52 +0200 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <453C6D47.307@mellanox.co.il> References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> <20061022102429.GB29681@sashak.voltaire.com> <453C6D47.307@mellanox.co.il> Message-ID: <20061023111652.GA18837@sashak.voltaire.com> Hi Eitan, On 09:20 Mon 23 Oct , Eitan Zahavi wrote: > Hi Sasha, > > If we would like to change OpenSM coding style to not include __osm > prefix for > all static functions we should do it all over the code. Maybe I was unclear: this patch does not rename any functions or types. It is only a "make static" patch.
> Meanwhile lets keep the style as it is. Nobody cared to unify the up/down code before, and that clearly was not the goal of this patch either. In general I think it is not a bad thing to define a recommended coding style, but that is a different issue (it looks like we will need to write something about it too). Sasha > I thought we all agreed to this > in the past. > It does not make sense to me to have a creeping style change one for > every developer involved. > > Should we start the thread for what should be our target style and > convert all files now? > If we do then lets agree on that - and then change. > > Thanks > > Eitan > > Sasha Khapyorsky wrote: > >On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote: > > > >>Hi Sasha. > >> > >>One small comments: > >> > >>[snip] > >> > >>> osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > >>> ... > >>> osm_updn_find_root_nodes_by_min_hop( > >>> ... > >>> osm_subn_set_up_down_min_hop_table( > >>> ... > >>> osm_subn_calc_up_down_min_hop_table( > >>> ... > >>> > >> > >>Please add the "__" prefix to the static function names. > >> > > > >Then would be better to remove 'osm_' and '__osm_' prefixes in static > >names, but this will be function renaming, not just 'make static'. > > > >Sasha > > > > > >>Thanks. > >> > >>-- > >>Yevgeny > >> > >>Sasha Khapyorsky wrote: > >> > >>>This makes local functions static and moves definitions of locally used > >>>types to .c file.
> >>> > >>>Signed-off-by: Sasha Khapyorsky > >>>--- > >>> osm/include/opensm/osm_opensm.h | 1 - > >>> osm/include/opensm/osm_ucast_updn.h | 349 > >>> ----------------------------------- > >>> osm/opensm/osm_ucast_updn.c | 81 +++++++- > >>> 3 files changed, 70 insertions(+), 361 deletions(-) > >>> > >>>diff --git a/osm/include/opensm/osm_opensm.h > >>>b/osm/include/opensm/osm_opensm.h > >>>index cb216a4..5557dbd 100644 > >>>--- a/osm/include/opensm/osm_opensm.h > >>>+++ b/osm/include/opensm/osm_opensm.h > >>>@@ -62,7 +62,6 @@ #include > >>> #include > >>> #include > >>> #include > >>>-#include > >>> > >>> #ifdef __cplusplus > >>> # define BEGIN_C_DECLS extern "C" { > >>>diff --git a/osm/include/opensm/osm_ucast_updn.h > >>>b/osm/include/opensm/osm_ucast_updn.h > >>>index 4609e1b..c2a4376 100644 > >>>--- a/osm/include/opensm/osm_ucast_updn.h > >>>+++ b/osm/include/opensm/osm_ucast_updn.h > >>>@@ -71,363 +71,14 @@ BEGIN_C_DECLS > >>> /* ENUM TypeDefs */ > >>> /* /////////////////////////// */ > >>> > >>>-/* > >>>-* DESCRIPTION > >>>-* This enum respresent available directions of arcs in the graph > >>>-* SYNOPSIS > >>>-*/ > >>>-typedef enum _updn_switch_dir > >>>-{ > >>>- UP = 0, > >>>- DOWN > >>>-} updn_switch_dir_t; > >>>- > >>>-/* > >>>- * TYPE DEFINITIONS > >>>- * UP > >>>- * Current switch direction in propogating the subnet is up > >>>- * DOWN > >>>- * Current switch direction in propogating the subnet is down > >>>- * > >>>- */ > >>>- > >>>-/* > >>>-* DESCRIPTION > >>>-* This enum respresent available states in the UPDN algorithm > >>>-* SYNOPSIS > >>>-*/ > >>>-typedef enum _updn_state > >>>-{ > >>>- UPDN_INIT = 0, > >>>- UPDN_RANK, > >>>- UPDN_MIN_HOP_CALC, > >>>-} updn_state_t; > >>>- > >>>-/* > >>>- * TYPE DEFINITIONS > >>>- * UPDN_INIT - loading the package but still not performing anything > >>>- * UPDN_RANK - post ranking algorithm > >>>- * UPDN_MIN_HOP_CALC - post min hop table calculation > >>>- */ > >>>- > >>> /* ////////////////////////////////// */ > 
>>> /* Struct TypeDefs */ > >>> /* ///////////////////////////////// */ > >>> > >>>-/****s* UPDN: Rank element/updn_rank_t > >>>-* NAME > >>>-* updn_rank_t > >>>-* > >>>-* DESCRIPTION > >>>-* This object represents a rank type element in a list > >>>-* > >>>-* The updn_rank_t object should be treated as opaque and should > >>>-* be manipulated only through the provided functions. > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-typedef struct _updn_rank > >>>-{ > >>>- cl_map_item_t map_item; > >>>- uint8_t rank; > >>>-} updn_rank_t; > >>>- > >>>-/* > >>>-* FIELDS > >>>-* map_item > >>>-* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! > >>>-* > >>>-* rank > >>>-* Rank value of this node > >>>-* > >>>-*/ > >>>- > >>>-/****s* UPDN: Histogram element/updn_hist_t > >>>-* NAME > >>>-* updn_hist_t > >>>-* > >>>-* DESCRIPTION > >>>-* This object represents a histogram type element in a list > >>>-* > >>>-* The updn_hist_t object should be treated as opaque and should > >>>-* be manipulated only through the provided functions. > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-typedef struct _updn_hist > >>>-{ > >>>- cl_map_item_t map_item; > >>>- uint32_t bar_value; > >>>-} updn_hist_t; > >>>- > >>>-/* > >>>-* FIELDS > >>>-* map_item > >>>-* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! > >>>-* > >>>-* bar_value > >>>-* The number of occurences of the same hop value > >>>-* > >>>-*/ > >>>- > >>>-typedef struct _updn_next_step > >>>-{ > >>>- updn_switch_dir_t state; > >>>- osm_switch_t *p_sw; > >>>-} updn_next_step_t; > >>>- > >>>-/*****s* updn: updn/updn_input_t > >>>-* NAME updn_t > >>>-* > >>>-* > >>>-* DESCRIPTION > >>>-* updn input fields structure. 
> >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-typedef struct _updn_input > >>>-{ > >>>- uint32_t num_guids; > >>>- uint64_t * guid_list; > >>>-} updn_input_t; > >>>- > >>>-/* > >>>-* FIELDS > >>>-* num_guids > >>>-* number of guids given at the UI > >>>-* > >>>-* guid_list > >>>-* guids specified as an array (converted from a list given > >>>in the UI) > >>>-* > >>>-* > >>>-* SEE ALSO > >>>-* > >>>-*********/ > >>>- > >>>-/*****s* updn: updn/updn_t > >>>-* NAME updn_t > >>>-* > >>>-* > >>>-* DESCRIPTION > >>>-* updn structure. > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-typedef struct _updn > >>>-{ > >>>- updn_state_t state; > >>>- boolean_t auto_detect_root_nodes; > >>>- cl_qmap_t guid_rank_tbl; > >>>- updn_input_t updn_ucast_reg_inputs; > >>>- cl_list_t * p_root_nodes; > >>>-} updn_t; > >>>- > >>>-/* > >>>-* FIELDS > >>>-* state > >>>-* state of the updn algorithm which basically should pass > >>>through Init -* - Ranking - UpDn algorithm > >>>-* > >>>-* guid_rank_tbl > >>>-* guid 2 rank mapping vector , indexed by guid in network > >>>order > >>>-* > >>>-* > >>>-* SEE ALSO > >>>-* > >>>-*********/ > >>>- > >>> /* ////////////////////////////// */ > >>> /* Function */ > >>> /* ////////////////////////////// */ > >>> > >>>-/***f** OpenSM: Updn/updn_construct > >>>-* NAME > >>>-* updn_construct > >>>-* > >>>-* DESCRIPTION > >>>-* Allocation of updn_t struct > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-updn_t* > >>>-updn_construct(void); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* > >>>-* > >>>-* RETURN VALUE > >>>-* Return a pointer to an updn struct. Null if fails to do so. 
> >>>-* > >>>-* NOTES > >>>-* First step of the creation of updn_t > >>>-*/ > >>>- > >>>-/****s* OpenSM: Updn/updn_destroy > >>>-* NAME > >>>-* updn_destroy > >>>-* > >>>-* DESCRIPTION > >>>-* release of updn_t struct > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-void > >>>-updn_destroy( > >>>- IN updn_t* const p_updn ); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* p_updn > >>>-* A pointer to the updn_t struct that is goining to be > >>>released > >>>-* > >>>-* RETURN VALUE > >>>-* > >>>-* NOTES > >>>-* Final step of the releasing of updn_t > >>>-* > >>>-* SEE ALSO > >>>-* updn_construct > >>>-*********/ > >>>- > >>>-/****f* OpenSM: Updn/updn_init > >>>-* NAME > >>>-* updn_init > >>>-* > >>>-* DESCRIPTION > >>>-* Initialization of an updn_t struct > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>-cl_status_t > >>>-updn_init( > >>>- IN updn_t* const p_updn ); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* p_updn > >>>-* A pointer to the updn_t struct that is goining to be > >>>initilized > >>>-* > >>>-* RETURN VALUE > >>>-* The status of the function. > >>>-* > >>>-* NOTES > >>>-* > >>>-* SEE ALSO > >>>-* updn_construct > >>>-********/ > >>>- > >>>-/****** OpenSM: Updn/updn_subn_rank > >>>-* NAME > >>>-* updn_subn_rank > >>>-* > >>>-* DESCRIPTION > >>>-* This function ranks the subnet for credit loop free algorithm > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>-int > >>>-updn_subn_rank( > >>>- IN uint64_t root_guid , > >>>- IN uint8_t base_rank, > >>>- IN updn_t* p_updn ); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* p_subn > >>>-* [in] Pointer to a Subnet object to construct. > >>>-* > >>>-* base_rank > >>>-* [in] The base ranking value (lowest value) > >>>-* > >>>-* p_updn > >>>-* [in] Pointer to updn structure which includes state & > >>>lid2rank table > >>>-* > >>>-* RETURN VALUE > >>>-* This function returns 0 when rankning has succeded , otherwise 1. 
> >>>-******/ > >>>- > >>>-/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table > >>>-* NAME > >>>-* osm_subn_set_up_down_min_hop_table > >>>-* > >>>-* DESCRIPTION > >>>-* This function set min hop table of all switches by BFS through each > >>>-* port guid at the subnet using ranking done before. > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-int > >>>-osm_subn_set_up_down_min_hop_table( > >>>- IN updn_t* p_updn ); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* p_updn > >>>-* [in] Pointer to updn structure which includes state & > >>>lid2rank table > >>>-* > >>>-* RETURN VALUE > >>>-* This function returns 0 when rankning has succeded , otherwise 1. > >>>-******/ > >>>- > >>>-/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table > >>>-* NAME > >>>-* osm_subn_calc_up_down_min_hop_table > >>>-* > >>>-* DESCRIPTION > >>>-* This function perform ranking and setting of all switches' min hop > >>>table > >>>-* by UP DOWN algorithm > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>- > >>>-int > >>>-osm_subn_calc_up_down_min_hop_table( > >>>- IN uint32_t num_guids, > >>>- IN uint64_t* guid_list, > >>>- IN updn_t* p_updn ); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* > >>>-* guid_list > >>>-* [in] Guid list from which to start ranking . > >>>-* > >>>-* p_updn > >>>-* [in] Pointer to updn structure which includes state & > >>>lid2rank table > >>>-* RETURN VALUE > >>>-* This function returns 0 when rankning has succeded , otherwise 1. 
> >>>-******/ > >>>- > >>>-/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop > >>>-* NAME > >>>-* osm_updn_find_root_nodes_by_min_hop > >>>-* > >>>-* DESCRIPTION > >>>-* This function perform auto identification of root nodes for UPDN > >>>ranking phase > >>>-* > >>>-* SYNOPSIS > >>>-*/ > >>>-int > >>>-osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); > >>>- > >>>-/* > >>>-* PARAMETERS > >>>-* p_root_nodes_list > >>>-* > >>>-* [out] Pointer to the root nodes list found in the subnet > >>>-* > >>>-* RETURN VALUE > >>>-* This function returns 0 when auto identification had succeeded > >>>-******/ > >>>- > >>> END_C_DECLS > >>> > >>> #endif /* _OSM_UCAST_UPDN_H_ */ > >>>diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > >>>index 86ac3ad..0121e6e 100644 > >>>--- a/osm/opensm/osm_ucast_updn.c > >>>+++ b/osm/opensm/osm_ucast_updn.c > >>>@@ -55,8 +55,62 @@ #include > >>> #include > >>> #include > >>> #include > >>>-#include > >>>-#include > >>>+ > >>>+/* //////////////////////////// */ > >>>+/* Local types */ > >>>+/* /////////////////////////// */ > >>>+ > >>>+/* direction */ > >>>+typedef enum _updn_switch_dir > >>>+{ > >>>+ UP = 0, > >>>+ DOWN > >>>+} updn_switch_dir_t; > >>>+ > >>>+/* This enum respresent available states in the UPDN algorithm */ > >>>+typedef enum _updn_state > >>>+{ > >>>+ UPDN_INIT = 0, > >>>+ UPDN_RANK, > >>>+ UPDN_MIN_HOP_CALC, > >>>+} updn_state_t; > >>>+ > >>>+/* Rank value of this node */ > >>>+typedef struct _updn_rank > >>>+{ > >>>+ cl_map_item_t map_item; > >>>+ uint8_t rank; > >>>+} updn_rank_t; > >>>+ > >>>+/* Histogram element - the number of occurences of the same hop value */ > >>>+typedef struct _updn_hist > >>>+{ > >>>+ cl_map_item_t map_item; > >>>+ uint32_t bar_value; > >>>+} updn_hist_t; > >>>+ > >>>+typedef struct _updn_next_step > >>>+{ > >>>+ updn_switch_dir_t state; > >>>+ osm_switch_t *p_sw; > >>>+} updn_next_step_t; > >>>+ > >>>+/* guids list */ > >>>+typedef struct _updn_input > >>>+{ 
> >>>+ uint32_t num_guids; > >>>+ uint64_t * guid_list; > >>>+} updn_input_t; > >>>+ > >>>+/* updn structure */ > >>>+typedef struct _updn > >>>+{ > >>>+ updn_state_t state; > >>>+ boolean_t auto_detect_root_nodes; > >>>+ cl_qmap_t guid_rank_tbl; > >>>+ updn_input_t updn_ucast_reg_inputs; > >>>+ cl_list_t * p_root_nodes; > >>>+} updn_t; > >>> > >>> > >>> /* ///////////////////////////////// */ > >>>@@ -65,6 +119,11 @@ #include > >>> /* This var is predefined and initialized */ > >>> extern osm_opensm_t osm; > >>> > >>>+/* ///////////////////////////////// */ > >>>+/* Statics */ > >>>+/* ///////////////////////////////// */ > >>>+static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); > >>>+ > >>> /********************************************************************** > >>> **********************************************************************/ > >>> /* This function returns direction based on rank and guid info of > >>> current & > >>>@@ -471,7 +530,7 @@ __updn_bfs_by_node( > >>> > >>> /********************************************************************** > >>> **********************************************************************/ > >>>-void > >>>+static void > >>> updn_destroy( > >>> IN updn_t* const p_updn ) > >>> { > >>>@@ -508,7 +567,7 @@ updn_destroy( > >>> > >>> /********************************************************************** > >>> **********************************************************************/ > >>>-updn_t* > >>>+static updn_t* > >>> updn_construct(void) > >>> { > >>> updn_t* p_updn; > >>>@@ -523,7 +582,7 @@ updn_construct(void) > >>> > >>> /********************************************************************** > >>> **********************************************************************/ > >>>-cl_status_t > >>>+static cl_status_t > >>> updn_init( > >>> IN updn_t* const p_updn ) > >>> { > >>>@@ -635,7 +694,7 @@ updn_init( > >>> **********************************************************************/ > >>> /* NOTE : PLS check if we 
need to decide that the first */ > >>> /* rank is a SWITCH for BFS purpose */ > >>>-int > >>>+static int > >>> updn_subn_rank( > >>> IN uint64_t root_guid, > >>> IN uint8_t base_rank, > >>>@@ -795,7 +854,7 @@ updn_subn_rank( > >>> > >>> /********************************************************************** > >>> **********************************************************************/ > >>>-int > >>>+static int > >>> osm_subn_set_up_down_min_hop_table( > >>> IN updn_t* p_updn ) > >>> { > >>>@@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( > >>> > >>> /********************************************************************** > >>> **********************************************************************/ > >>>-int > >>>+static int > >>> osm_subn_calc_up_down_min_hop_table( > >>> IN uint32_t num_guids, > >>> IN uint64_t * guid_list, > >>>@@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( > >>> /********************************************************************** > >>> **********************************************************************/ > >>> /* UPDN callback function */ > >>>-int __osm_updn_call( > >>>+static int __osm_updn_call( > >>> void *ctx ) > >>> { > >>> OSM_LOG_ENTER(&(osm.log), __osm_updn_call); > >>>@@ -969,7 +1028,7 @@ int __osm_updn_call( > >>> /********************************************************************** > >>> **********************************************************************/ > >>> /* UPDN convert cl_list to guid array in updn struct */ > >>>-void __osm_updn_convert_list2array( > >>>+static void __osm_updn_convert_list2array( > >>> IN updn_t * p_updn ) > >>> { > >>> uint32_t i = 0, max_num = 0; > >>>@@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( > >>> /********************************************************************** > >>> **********************************************************************/ > >>> /* Find Root nodes automatically by Min Hop Table info */ > >>>-int > >>>+static int > >>> 
osm_updn_find_root_nodes_by_min_hop( > >>> OUT updn_t * p_updn ) > >>> { > >>> > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit > >http://openib.org/mailman/listinfo/openib-general > > > From eitan at mellanox.co.il Mon Oct 23 04:10:43 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 23 Oct 2006 13:10:43 +0200 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <1161599481.25985.295079.camel@hal.voltaire.com> References: <20061019212639.GA24600@sashak.voltaire.com> <453B3194.7000702@dev.mellanox.co.il> <20061022102429.GB29681@sashak.voltaire.com> <453C6D47.307@mellanox.co.il> <1161594784.25985.292177.camel@hal.voltaire.com> <453C947B.7070903@mellanox.co.il> <1161599481.25985.295079.camel@hal.voltaire.com> Message-ID: <453CA333.7000906@mellanox.co.il> Hal Rosenstock wrote: > On Mon, 2006-10-23 at 06:07, Eitan Zahavi wrote: > >> Hal Rosenstock wrote: >> >>> Eitan, >>> >>> On Mon, 2006-10-23 at 03:20, Eitan Zahavi wrote: >>> >>> >>>> Hi Sasha, >>>> >>>> If we would like to change OpenSM coding style to not include __osm >>>> prefix for >>>> all static functions we should do it all over the code. >>>> >>>> >>> Is there any value to __osm_ in the local function names ? If not, I >>> don't really see the harm here. >>> >>> >> Yes there is value in keeping a consistent code style across a project. >> Every project I know has a style. >> OpenSM style is there for many years. >> > > Sure but this was a much smaller question to which I would like to hear > your answer. > > In this particular case, (other than consistency with the current > defacto coding style as I am not sure where this aspect is documented), > is there value in adding __osm_ to local function names ? > Other than consistency, NO. But this is no different than any other coding style item.
>> We can change it if we like but >> let us do it consciously and on the entire tree. >> > > Sure but it can be done incrementally rather than all at once as many > things have done in the past. > Only if we agree on a style to migrate to. If we do agree on a style, then we should have dedicated style-change patches and even develop a way to formally check that nothing but the style was changed. > > >>>> Meanwhile lets keep the style as it is. I thought we all agreed to this >>>> in the past. >>>> It does not make sense to me to have a creeping style change one for >>>> every developer involved. >>>> >>>> Should we start the thread for what should be our target style and >>>> convert all files now? >>>> If we do then lets agree on that - and then change. >>>> >>>> >>> Do all such changes need to be hung on the yet to be determined coding >>> style ? >>> >>> >> YES - No coding style changes should be allowed on a per checkin basis. >> Otherwise we turn the coding style into a mess. >> > > This is a little too heavy handed. Certain things can be accomplished > without the grand vision of an agreed upon (updated) coding style. > There is no point in having a coding style for a project if every developer can change it whenever he/she wishes. > -- Hal > > >>> -- Hal >>> >>> >>> >>>> Thanks >>>> >>>> Eitan >>>> >>>> Sasha Khapyorsky wrote: >>>> >>>> >>>>> On 10:53 Sun 22 Oct , Yevgeny Kliteynik wrote: >>>>> >>>>> >>>>> >>>>>> Hi Sasha. >>>>>> >>>>>> One small comments: >>>>>> >>>>>> [snip] >>>>>> >>>>>> >>>>>> >>>>>>> osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); >>>>>>> ... >>>>>>> osm_updn_find_root_nodes_by_min_hop( >>>>>>> ... >>>>>>> osm_subn_set_up_down_min_hop_table( >>>>>>> ... >>>>>>> osm_subn_calc_up_down_min_hop_table( >>>>>>> ... >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> Please add the "__" prefix to the static function names.
>>>>>> >>>>>> >>>>>> >>>>> Then would be better to remove 'osm_' and '__osm_' prefixes in static >>>>> names, but this will be function renaming, not just 'make static'. >>>>> >>>>> Sasha >>>>> >>>>> >>>>> >>>>> >>>>>> Thanks. >>>>>> >>>>>> -- >>>>>> Yevgeny >>>>>> >>>>>> Sasha Khapyorsky wrote: >>>>>> >>>>>> >>>>>> >>>>>>> This makes local functions static and moves definitions of locally used >>>>>>> types to .c file. >>>>>>> >>>>>>> Signed-off-by: Sasha Khapyorsky >>>>>>> --- >>>>>>> osm/include/opensm/osm_opensm.h | 1 - >>>>>>> osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- >>>>>>> osm/opensm/osm_ucast_updn.c | 81 +++++++- >>>>>>> 3 files changed, 70 insertions(+), 361 deletions(-) >>>>>>> >>>>>>> diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h >>>>>>> index cb216a4..5557dbd 100644 >>>>>>> --- a/osm/include/opensm/osm_opensm.h >>>>>>> +++ b/osm/include/opensm/osm_opensm.h >>>>>>> @@ -62,7 +62,6 @@ #include >>>>>>> #include >>>>>>> #include >>>>>>> #include >>>>>>> -#include >>>>>>> >>>>>>> #ifdef __cplusplus >>>>>>> # define BEGIN_C_DECLS extern "C" { >>>>>>> diff --git a/osm/include/opensm/osm_ucast_updn.h b/osm/include/opensm/osm_ucast_updn.h >>>>>>> index 4609e1b..c2a4376 100644 >>>>>>> --- a/osm/include/opensm/osm_ucast_updn.h >>>>>>> +++ b/osm/include/opensm/osm_ucast_updn.h >>>>>>> @@ -71,363 +71,14 @@ BEGIN_C_DECLS >>>>>>> /* ENUM TypeDefs */ >>>>>>> /* /////////////////////////// */ >>>>>>> >>>>>>> -/* >>>>>>> -* DESCRIPTION >>>>>>> -* This enum respresent available directions of arcs in the graph >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> -typedef enum _updn_switch_dir >>>>>>> -{ >>>>>>> - UP = 0, >>>>>>> - DOWN >>>>>>> -} updn_switch_dir_t; >>>>>>> - >>>>>>> -/* >>>>>>> - * TYPE DEFINITIONS >>>>>>> - * UP >>>>>>> - * Current switch direction in propogating the subnet is up >>>>>>> - * DOWN >>>>>>> - * Current switch direction in propogating the subnet is down >>>>>>> - * >>>>>>> - */ >>>>>>> - 
>>>>>>> -/* >>>>>>> -* DESCRIPTION >>>>>>> -* This enum respresent available states in the UPDN algorithm >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> -typedef enum _updn_state >>>>>>> -{ >>>>>>> - UPDN_INIT = 0, >>>>>>> - UPDN_RANK, >>>>>>> - UPDN_MIN_HOP_CALC, >>>>>>> -} updn_state_t; >>>>>>> - >>>>>>> -/* >>>>>>> - * TYPE DEFINITIONS >>>>>>> - * UPDN_INIT - loading the package but still not performing anything >>>>>>> - * UPDN_RANK - post ranking algorithm >>>>>>> - * UPDN_MIN_HOP_CALC - post min hop table calculation >>>>>>> - */ >>>>>>> - >>>>>>> /* ////////////////////////////////// */ >>>>>>> /* Struct TypeDefs */ >>>>>>> /* ///////////////////////////////// */ >>>>>>> >>>>>>> -/****s* UPDN: Rank element/updn_rank_t >>>>>>> -* NAME >>>>>>> -* updn_rank_t >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* This object represents a rank type element in a list >>>>>>> -* >>>>>>> -* The updn_rank_t object should be treated as opaque and should >>>>>>> -* be manipulated only through the provided functions. >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -typedef struct _updn_rank >>>>>>> -{ >>>>>>> - cl_map_item_t map_item; >>>>>>> - uint8_t rank; >>>>>>> -} updn_rank_t; >>>>>>> - >>>>>>> -/* >>>>>>> -* FIELDS >>>>>>> -* map_item >>>>>>> -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! >>>>>>> -* >>>>>>> -* rank >>>>>>> -* Rank value of this node >>>>>>> -* >>>>>>> -*/ >>>>>>> - >>>>>>> -/****s* UPDN: Histogram element/updn_hist_t >>>>>>> -* NAME >>>>>>> -* updn_hist_t >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* This object represents a histogram type element in a list >>>>>>> -* >>>>>>> -* The updn_hist_t object should be treated as opaque and should >>>>>>> -* be manipulated only through the provided functions. 
>>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -typedef struct _updn_hist >>>>>>> -{ >>>>>>> - cl_map_item_t map_item; >>>>>>> - uint32_t bar_value; >>>>>>> -} updn_hist_t; >>>>>>> - >>>>>>> -/* >>>>>>> -* FIELDS >>>>>>> -* map_item >>>>>>> -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! >>>>>>> -* >>>>>>> -* bar_value >>>>>>> -* The number of occurences of the same hop value >>>>>>> -* >>>>>>> -*/ >>>>>>> - >>>>>>> -typedef struct _updn_next_step >>>>>>> -{ >>>>>>> - updn_switch_dir_t state; >>>>>>> - osm_switch_t *p_sw; >>>>>>> -} updn_next_step_t; >>>>>>> - >>>>>>> -/*****s* updn: updn/updn_input_t >>>>>>> -* NAME updn_t >>>>>>> -* >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* updn input fields structure. >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -typedef struct _updn_input >>>>>>> -{ >>>>>>> - uint32_t num_guids; >>>>>>> - uint64_t * guid_list; >>>>>>> -} updn_input_t; >>>>>>> - >>>>>>> -/* >>>>>>> -* FIELDS >>>>>>> -* num_guids >>>>>>> -* number of guids given at the UI >>>>>>> -* >>>>>>> -* guid_list >>>>>>> -* guids specified as an array (converted from a list given in the UI) >>>>>>> -* >>>>>>> -* >>>>>>> -* SEE ALSO >>>>>>> -* >>>>>>> -*********/ >>>>>>> - >>>>>>> -/*****s* updn: updn/updn_t >>>>>>> -* NAME updn_t >>>>>>> -* >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* updn structure. 
>>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -typedef struct _updn >>>>>>> -{ >>>>>>> - updn_state_t state; >>>>>>> - boolean_t auto_detect_root_nodes; >>>>>>> - cl_qmap_t guid_rank_tbl; >>>>>>> - updn_input_t updn_ucast_reg_inputs; >>>>>>> - cl_list_t * p_root_nodes; >>>>>>> -} updn_t; >>>>>>> - >>>>>>> -/* >>>>>>> -* FIELDS >>>>>>> -* state >>>>>>> -* state of the updn algorithm which basically should pass through Init >>>>>>> -* - Ranking - UpDn algorithm >>>>>>> -* >>>>>>> -* guid_rank_tbl >>>>>>> -* guid 2 rank mapping vector , indexed by guid in network order >>>>>>> -* >>>>>>> -* >>>>>>> -* SEE ALSO >>>>>>> -* >>>>>>> -*********/ >>>>>>> - >>>>>>> /* ////////////////////////////// */ >>>>>>> /* Function */ >>>>>>> /* ////////////////////////////// */ >>>>>>> >>>>>>> -/***f** OpenSM: Updn/updn_construct >>>>>>> -* NAME >>>>>>> -* updn_construct >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* Allocation of updn_t struct >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -updn_t* >>>>>>> -updn_construct(void); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* >>>>>>> -* >>>>>>> -* RETURN VALUE >>>>>>> -* Return a pointer to an updn struct. Null if fails to do so. 
>>>>>>> -* >>>>>>> -* NOTES >>>>>>> -* First step of the creation of updn_t >>>>>>> -*/ >>>>>>> - >>>>>>> -/****s* OpenSM: Updn/updn_destroy >>>>>>> -* NAME >>>>>>> -* updn_destroy >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* release of updn_t struct >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -void >>>>>>> -updn_destroy( >>>>>>> - IN updn_t* const p_updn ); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* p_updn >>>>>>> -* A pointer to the updn_t struct that is goining to be released >>>>>>> -* >>>>>>> -* RETURN VALUE >>>>>>> -* >>>>>>> -* NOTES >>>>>>> -* Final step of the releasing of updn_t >>>>>>> -* >>>>>>> -* SEE ALSO >>>>>>> -* updn_construct >>>>>>> -*********/ >>>>>>> - >>>>>>> -/****f* OpenSM: Updn/updn_init >>>>>>> -* NAME >>>>>>> -* updn_init >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* Initialization of an updn_t struct >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> -cl_status_t >>>>>>> -updn_init( >>>>>>> - IN updn_t* const p_updn ); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* p_updn >>>>>>> -* A pointer to the updn_t struct that is goining to be initilized >>>>>>> -* >>>>>>> -* RETURN VALUE >>>>>>> -* The status of the function. >>>>>>> -* >>>>>>> -* NOTES >>>>>>> -* >>>>>>> -* SEE ALSO >>>>>>> -* updn_construct >>>>>>> -********/ >>>>>>> - >>>>>>> -/****** OpenSM: Updn/updn_subn_rank >>>>>>> -* NAME >>>>>>> -* updn_subn_rank >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* This function ranks the subnet for credit loop free algorithm >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> -int >>>>>>> -updn_subn_rank( >>>>>>> - IN uint64_t root_guid , >>>>>>> - IN uint8_t base_rank, >>>>>>> - IN updn_t* p_updn ); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* p_subn >>>>>>> -* [in] Pointer to a Subnet object to construct. 
>>>>>>> -* >>>>>>> -* base_rank >>>>>>> -* [in] The base ranking value (lowest value) >>>>>>> -* >>>>>>> -* p_updn >>>>>>> -* [in] Pointer to updn structure which includes state & lid2rank table >>>>>>> -* >>>>>>> -* RETURN VALUE >>>>>>> -* This function returns 0 when rankning has succeded , otherwise 1. >>>>>>> -******/ >>>>>>> - >>>>>>> -/****** OpenSM: UpDn/osm_subn_set_up_down_min_hop_table >>>>>>> -* NAME >>>>>>> -* osm_subn_set_up_down_min_hop_table >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* This function set min hop table of all switches by BFS through each >>>>>>> -* port guid at the subnet using ranking done before. >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -int >>>>>>> -osm_subn_set_up_down_min_hop_table( >>>>>>> - IN updn_t* p_updn ); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* p_updn >>>>>>> -* [in] Pointer to updn structure which includes state & lid2rank table >>>>>>> -* >>>>>>> -* RETURN VALUE >>>>>>> -* This function returns 0 when rankning has succeded , otherwise 1. >>>>>>> -******/ >>>>>>> - >>>>>>> -/****** OpenSM: UpDn/osm_subn_calc_up_down_min_hop_table >>>>>>> -* NAME >>>>>>> -* osm_subn_calc_up_down_min_hop_table >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* This function perform ranking and setting of all switches' min hop table >>>>>>> -* by UP DOWN algorithm >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> - >>>>>>> -int >>>>>>> -osm_subn_calc_up_down_min_hop_table( >>>>>>> - IN uint32_t num_guids, >>>>>>> - IN uint64_t* guid_list, >>>>>>> - IN updn_t* p_updn ); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* >>>>>>> -* guid_list >>>>>>> -* [in] Guid list from which to start ranking . >>>>>>> -* >>>>>>> -* p_updn >>>>>>> -* [in] Pointer to updn structure which includes state & lid2rank table >>>>>>> -* RETURN VALUE >>>>>>> -* This function returns 0 when rankning has succeded , otherwise 1. 
>>>>>>> -******/ >>>>>>> - >>>>>>> -/****** OpenSM: UpDn/osm_updn_find_root_nodes_by_min_hop >>>>>>> -* NAME >>>>>>> -* osm_updn_find_root_nodes_by_min_hop >>>>>>> -* >>>>>>> -* DESCRIPTION >>>>>>> -* This function perform auto identification of root nodes for UPDN ranking phase >>>>>>> -* >>>>>>> -* SYNOPSIS >>>>>>> -*/ >>>>>>> -int >>>>>>> -osm_updn_find_root_nodes_by_min_hop( OUT updn_t * p_updn ); >>>>>>> - >>>>>>> -/* >>>>>>> -* PARAMETERS >>>>>>> -* p_root_nodes_list >>>>>>> -* >>>>>>> -* [out] Pointer to the root nodes list found in the subnet >>>>>>> -* >>>>>>> -* RETURN VALUE >>>>>>> -* This function returns 0 when auto identification had succeeded >>>>>>> -******/ >>>>>>> - >>>>>>> END_C_DECLS >>>>>>> >>>>>>> #endif /* _OSM_UCAST_UPDN_H_ */ >>>>>>> diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c >>>>>>> index 86ac3ad..0121e6e 100644 >>>>>>> --- a/osm/opensm/osm_ucast_updn.c >>>>>>> +++ b/osm/opensm/osm_ucast_updn.c >>>>>>> @@ -55,8 +55,62 @@ #include >>>>>>> #include >>>>>>> #include >>>>>>> #include >>>>>>> -#include >>>>>>> -#include >>>>>>> + >>>>>>> +/* //////////////////////////// */ >>>>>>> +/* Local types */ >>>>>>> +/* /////////////////////////// */ >>>>>>> + >>>>>>> +/* direction */ >>>>>>> +typedef enum _updn_switch_dir >>>>>>> +{ >>>>>>> + UP = 0, >>>>>>> + DOWN >>>>>>> +} updn_switch_dir_t; >>>>>>> + >>>>>>> +/* This enum respresent available states in the UPDN algorithm */ >>>>>>> +typedef enum _updn_state >>>>>>> +{ >>>>>>> + UPDN_INIT = 0, >>>>>>> + UPDN_RANK, >>>>>>> + UPDN_MIN_HOP_CALC, >>>>>>> +} updn_state_t; >>>>>>> + >>>>>>> +/* Rank value of this node */ >>>>>>> +typedef struct _updn_rank >>>>>>> +{ >>>>>>> + cl_map_item_t map_item; >>>>>>> + uint8_t rank; >>>>>>> +} updn_rank_t; >>>>>>> + >>>>>>> +/* Histogram element - the number of occurences of the same hop value */ >>>>>>> +typedef struct _updn_hist >>>>>>> +{ >>>>>>> + cl_map_item_t map_item; >>>>>>> + uint32_t bar_value; >>>>>>> +} updn_hist_t; >>>>>>> + 
>>>>>>> +typedef struct _updn_next_step >>>>>>> +{ >>>>>>> + updn_switch_dir_t state; >>>>>>> + osm_switch_t *p_sw; >>>>>>> +} updn_next_step_t; >>>>>>> + >>>>>>> +/* guids list */ >>>>>>> +typedef struct _updn_input >>>>>>> +{ >>>>>>> + uint32_t num_guids; >>>>>>> + uint64_t * guid_list; >>>>>>> +} updn_input_t; >>>>>>> + >>>>>>> +/* updn structure */ >>>>>>> +typedef struct _updn >>>>>>> +{ >>>>>>> + updn_state_t state; >>>>>>> + boolean_t auto_detect_root_nodes; >>>>>>> + cl_qmap_t guid_rank_tbl; >>>>>>> + updn_input_t updn_ucast_reg_inputs; >>>>>>> + cl_list_t * p_root_nodes; >>>>>>> +} updn_t; >>>>>>> >>>>>>> >>>>>>> /* ///////////////////////////////// */ >>>>>>> @@ -65,6 +119,11 @@ #include >>>>>>> /* This var is predefined and initialized */ >>>>>>> extern osm_opensm_t osm; >>>>>>> >>>>>>> +/* ///////////////////////////////// */ >>>>>>> +/* Statics */ >>>>>>> +/* ///////////////////////////////// */ >>>>>>> +static int osm_updn_find_root_nodes_by_min_hop(OUT updn_t *p_updn); >>>>>>> + >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> /* This function returns direction based on rank and guid info of current & >>>>>>> @@ -471,7 +530,7 @@ __updn_bfs_by_node( >>>>>>> >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> -void >>>>>>> +static void >>>>>>> updn_destroy( >>>>>>> IN updn_t* const p_updn ) >>>>>>> { >>>>>>> @@ -508,7 +567,7 @@ updn_destroy( >>>>>>> >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> -updn_t* >>>>>>> +static updn_t* >>>>>>> updn_construct(void) >>>>>>> { >>>>>>> updn_t* p_updn; >>>>>>> @@ -523,7 +582,7 @@ updn_construct(void) >>>>>>> >>>>>>> 
/********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> -cl_status_t >>>>>>> +static cl_status_t >>>>>>> updn_init( >>>>>>> IN updn_t* const p_updn ) >>>>>>> { >>>>>>> @@ -635,7 +694,7 @@ updn_init( >>>>>>> **********************************************************************/ >>>>>>> /* NOTE : PLS check if we need to decide that the first */ >>>>>>> /* rank is a SWITCH for BFS purpose */ >>>>>>> -int >>>>>>> +static int >>>>>>> updn_subn_rank( >>>>>>> IN uint64_t root_guid, >>>>>>> IN uint8_t base_rank, >>>>>>> @@ -795,7 +854,7 @@ updn_subn_rank( >>>>>>> >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> -int >>>>>>> +static int >>>>>>> osm_subn_set_up_down_min_hop_table( >>>>>>> IN updn_t* p_updn ) >>>>>>> { >>>>>>> @@ -880,7 +939,7 @@ osm_subn_set_up_down_min_hop_table( >>>>>>> >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> -int >>>>>>> +static int >>>>>>> osm_subn_calc_up_down_min_hop_table( >>>>>>> IN uint32_t num_guids, >>>>>>> IN uint64_t * guid_list, >>>>>>> @@ -935,7 +994,7 @@ osm_subn_calc_up_down_min_hop_table( >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> /* UPDN callback function */ >>>>>>> -int __osm_updn_call( >>>>>>> +static int __osm_updn_call( >>>>>>> void *ctx ) >>>>>>> { >>>>>>> OSM_LOG_ENTER(&(osm.log), __osm_updn_call); >>>>>>> @@ -969,7 +1028,7 @@ int __osm_updn_call( >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> /* UPDN convert cl_list to guid array in updn struct 
*/ >>>>>>> -void __osm_updn_convert_list2array( >>>>>>> +static void __osm_updn_convert_list2array( >>>>>>> IN updn_t * p_updn ) >>>>>>> { >>>>>>> uint32_t i = 0, max_num = 0; >>>>>>> @@ -1008,7 +1067,7 @@ void __osm_updn_convert_list2array( >>>>>>> /********************************************************************** >>>>>>> **********************************************************************/ >>>>>>> /* Find Root nodes automatically by Min Hop Table info */ >>>>>>> -int >>>>>>> +static int >>>>>>> osm_updn_find_root_nodes_by_min_hop( >>>>>>> OUT updn_t * p_updn ) >>>>>>> { >>>>>>> >>>>>>> >>>>>>> >>>>> _______________________________________________ >>>>> openib-general mailing list >>>>> openib-general at openib.org >>>>> http://openib.org/mailman/listinfo/openib-general >>>>> >>>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >>>>> >>>>> >>>>> From umaxx at oleco.net Mon Oct 23 04:15:35 2006 From: umaxx at oleco.net (Joerg Zinke) Date: Mon, 23 Oct 2006 13:15:35 +0200 Subject: [openib-general] OFED 1.1 on Debian based system? In-Reply-To: <20061022224605.5408753e@marvin.local> References: <453BD076.80006@pruesse.net> <20061022224605.5408753e@marvin.local> Message-ID: <20061023131535.78bd0681@marvin.local> On Sun, 22 Oct 2006 22:46:05 +0200 "Joerg Zinke" wrote: > On Sun, 22 Oct 2006 22:11:34 +0200 > "Elmar Pruesse" wrote: > > > I assume just running "./install.sh" as the Readme.txt suggests will > > not work.
> > > > it was working here. i just installed vanilla kernel from source and > set some path (check install.sh options)... > to be clear: i used install.sh from SOURCES/ directory, which compiles things - not the install.sh from top-level. i used: # ./install.sh --prefix /usr/local/ --with-all-libs the kernel-options have to be set correctly before... i do not know if this is the suggested way, but it was working for me. regards, joerg From sashak at voltaire.com Mon Oct 23 04:33:05 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 23 Oct 2006 13:33:05 +0200 Subject: [openib-general] [PATCH] opensm: remove obsolete p_report_buf In-Reply-To: <453C6919.7010008@dev.mellanox.co.il> References: <20061020004727.GH24676@sashak.voltaire.com> <453C6919.7010008@dev.mellanox.co.il> Message-ID: <20061023113305.GB18837@sashak.voltaire.com> On 09:02 Mon 23 Oct , Yevgeny Kliteynik wrote: > Hi Sasha. > > The removal of the sm->p_report_buf is a good idea. > However, I do have one comment: > In several cases this buffer was printed using the osm_log_raw() > function, and you replaced this with a plain fprintf(stdout,...). > Right now the osm_log_raw function just prints to stdout too, but > this doesn't always have to be the case. Besides, osm_log_raw > provides verbosity level checking, which is lost when you replace > it with printf. Both function calls were, and still are, conditionalized by verbosity level, so it is not lost. Sasha > > --Yevgeny > > Sasha Khapyorsky wrote: > > This removes the now-obsolete shared sm->p_report_buf buffer and cleans > > up related code.
> > > > Signed-off-by: Sasha Khapyorsky > > --- > > osm/include/opensm/osm_base.h | 5 -- > > osm/include/opensm/osm_sm.h | 2 - > > osm/include/opensm/osm_state_mgr.h | 8 --- > > osm/include/opensm/osm_ucast_mgr.h | 5 -- > > osm/opensm/osm_mcast_mgr.c | 11 ++-- > > osm/opensm/osm_sm.c | 15 +----- > > osm/opensm/osm_state_mgr.c | 104 ++++++++++------------------------- > > osm/opensm/osm_ucast_mgr.c | 70 +++++++----------------- > > 8 files changed, 57 insertions(+), 163 deletions(-) > > > > diff --git a/osm/include/opensm/osm_base.h b/osm/include/opensm/osm_base.h > > index 57dd4fd..20e2cc3 100644 > > --- a/osm/include/opensm/osm_base.h > > +++ b/osm/include/opensm/osm_base.h > > @@ -714,11 +714,6 @@ typedef enum _osm_state_mgr_mode > > * > > **********/ > > > > -#define OSM_REPORT_BUF_SIZE 0x10000 > > -#define OSM_REPORT_LINE_SIZE 0x256 > > -#define OSM_REPORT_BUF_THRESHOLD (OSM_REPORT_BUF_SIZE / OSM_REPORT_LINE_SIZE) > > - > > - > > /****d* OpenSM: Base/osm_sm_signal_t > > * NAME > > * osm_sm_signal_t > > diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h > > index bc812f3..05b87ac 100644 > > --- a/osm/include/opensm/osm_sm.h > > +++ b/osm/include/opensm/osm_sm.h > > @@ -178,8 +178,6 @@ typedef struct _osm_sm > > osm_vla_rcv_ctrl_t vla_rcv_ctrl; > > osm_pkey_rcv_t pkey_rcv; > > osm_pkey_rcv_ctrl_t pkey_rcv_ctrl; > > - char* p_report_buf; > > - > > } osm_sm_t; > > /* > > * FIELDS > > diff --git a/osm/include/opensm/osm_state_mgr.h b/osm/include/opensm/osm_state_mgr.h > > index ad4afa0..7aaab58 100644 > > --- a/osm/include/opensm/osm_state_mgr.h > > +++ b/osm/include/opensm/osm_state_mgr.h > > @@ -121,7 +121,6 @@ typedef struct _osm_state_mgr > > cl_qlist_t idle_time_list; > > cl_plock_t *p_lock; > > cl_event_t *p_subnet_up_event; > > - char *p_report_buf; > > osm_sm_state_t state; > > osm_state_mgr_mode_t state_step_mode; > > osm_signal_t next_stage_signal; > > @@ -170,9 +169,6 @@ typedef struct _osm_state_mgr > > * p_subnet_up_event > > * 
Pointer to the event to set if/when the subnet comes up. > > * > > -* p_report_buf > > -* Pointer to the large log buffer used for user reports. > > -* > > * state > > * State of the SM. > > * > > @@ -380,7 +376,6 @@ osm_state_mgr_init( > > IN const osm_sm_mad_ctrl_t* const p_mad_ctrl, > > IN cl_plock_t* const p_lock, > > IN cl_event_t* const p_subnet_up_event, > > - IN char* const p_report_buf, > > IN osm_log_t* const p_log ); > > /* > > * PARAMETERS > > @@ -420,9 +415,6 @@ osm_state_mgr_init( > > * p_subnet_up_event > > * [in] Pointer to the event to set if/when the subnet comes up. > > * > > -* p_report_buf > > -* [in] Pointer to the large log buffer used for user reports. > > -* > > * p_log > > * [in] Pointer to the log object. > > * > > diff --git a/osm/include/opensm/osm_ucast_mgr.h b/osm/include/opensm/osm_ucast_mgr.h > > index 0fbfc66..1c10abb 100644 > > --- a/osm/include/opensm/osm_ucast_mgr.h > > +++ b/osm/include/opensm/osm_ucast_mgr.h > > @@ -105,7 +105,6 @@ typedef struct _osm_ucast_mgr > > osm_req_t *p_req; > > osm_log_t *p_log; > > cl_plock_t *p_lock; > > - char *p_report_buf; > > } osm_ucast_mgr_t; > > /* > > * FIELDS > > @@ -204,7 +203,6 @@ osm_ucast_mgr_init( > > IN osm_ucast_mgr_t* const p_mgr, > > IN osm_req_t* const p_req, > > IN osm_subn_t* const p_subn, > > - IN char* const p_report_buf, > > IN osm_log_t* const p_log, > > IN cl_plock_t* const p_lock ); > > /* > > @@ -218,9 +216,6 @@ osm_ucast_mgr_init( > > * p_subn > > * [in] Pointer to the Subnet object for this subnet. > > * > > -* p_report_buf > > -* [in] Pointer to the large log buffer used for user reporting. > > -* > > * p_log > > * [in] Pointer to the log object. 
> > * > > diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c > > index 5a01578..82ef7c3 100644 > > --- a/osm/opensm/osm_mcast_mgr.c > > +++ b/osm/opensm/osm_mcast_mgr.c > > @@ -1382,14 +1382,13 @@ static void > > mcast_mgr_dump_sw_routes( > > IN const osm_mcast_mgr_t* const p_mgr, > > IN const osm_switch_t* const p_sw, > > - IN FILE *p_mcfdbFile ) > > + IN FILE *file ) > > { > > osm_mcast_tbl_t* p_tbl; > > int16_t mlid_ho = 0; > > int16_t mlid_start_ho; > > uint8_t position = 0; > > int16_t block_num = 0; > > - char line[OSM_REPORT_LINE_SIZE]; > > boolean_t print_lid; > > const osm_node_t* p_node; > > uint16_t i, j; > > @@ -1404,7 +1403,7 @@ mcast_mgr_dump_sw_routes( > > > > p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); > > > > - fprintf( p_mcfdbFile, "\nSwitch 0x%016" PRIx64 "\n" > > + fprintf( file, "\nSwitch 0x%016" PRIx64 "\n" > > "LID : Out Port(s)\n", > > cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > > while ( block_num <= p_tbl->max_block_in_use ) > > @@ -1415,7 +1414,7 @@ mcast_mgr_dump_sw_routes( > > mlid_ho = mlid_start_ho + i; > > position = 0; > > print_lid = FALSE; > > - sprintf( line, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); > > + fprintf( file, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); > > while ( position <= p_tbl->max_position ) > > { > > mask_entry = cl_ntoh16((*p_tbl->p_mask_tbl)[mlid_ho][position]); > > @@ -1428,13 +1427,13 @@ mcast_mgr_dump_sw_routes( > > for (j = 0 ; j < 16 ; j++) > > { > > if ( (1 << j) & mask_entry ) > > - sprintf( line, "%s 0x%03X ", line, j+(position*16) ); > > + fprintf( file, " 0x%03X ", j+(position*16) ); > > } > > position++; > > } > > if (print_lid) > > { > > - fprintf( p_mcfdbFile, "%s\n", line ); > > + fprintf( file, "\n" ); > > } > > } > > block_num++; > > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > > index fef3cac..fb4f759 100644 > > --- a/osm/opensm/osm_sm.c > > +++ b/osm/opensm/osm_sm.c > > @@ -256,9 +256,6 @@ osm_sm_destroy( > > cl_event_destroy( &p_sm->signal ); > 
> cl_event_destroy( &p_sm->subnet_up_event ); > > > > - if( p_sm->p_report_buf != NULL ) > > - free( p_sm->p_report_buf ); > > - > > osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ > > OSM_LOG_EXIT( p_sm->p_log ); > > } > > @@ -291,15 +288,6 @@ osm_sm_init( > > p_sm->p_disp = p_disp; > > p_sm->p_lock = p_lock; > > > > - p_sm->p_report_buf = malloc( OSM_REPORT_BUF_SIZE ); > > - if( p_sm->p_report_buf == NULL ) > > - { > > - osm_log( p_sm->p_log, OSM_LOG_ERROR, > > - "osm_sm_init: ERR 2E09: " > > - "Can't allocate report buffer\n" ); > > - status = IB_INSUFFICIENT_MEMORY; > > - goto Exit; > > - } > > status = cl_event_init( &p_sm->signal, FALSE ); > > if( status != CL_SUCCESS ) > > goto Exit; > > @@ -385,7 +373,6 @@ osm_sm_init( > > status = osm_ucast_mgr_init( &p_sm->ucast_mgr, > > &p_sm->req, > > p_sm->p_subn, > > - p_sm->p_report_buf, > > p_sm->p_log, p_sm->p_lock ); > > if( status != IB_SUCCESS ) > > goto Exit; > > @@ -409,7 +396,7 @@ osm_sm_init( > > &p_sm->mad_ctrl, > > p_sm->p_lock, > > &p_sm->subnet_up_event, > > - p_sm->p_report_buf, p_sm->p_log ); > > + p_sm->p_log ); > > if( status != IB_SUCCESS ) > > goto Exit; > > > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > index d43e9fc..9c159df 100644 > > --- a/osm/opensm/osm_state_mgr.c > > +++ b/osm/opensm/osm_state_mgr.c > > @@ -118,7 +118,6 @@ osm_state_mgr_init( > > IN const osm_sm_mad_ctrl_t * const p_mad_ctrl, > > IN cl_plock_t * const p_lock, > > IN cl_event_t * const p_subnet_up_event, > > - IN char *const p_report_buf, > > IN osm_log_t * const p_log ) > > { > > cl_status_t status; > > @@ -136,7 +135,6 @@ osm_state_mgr_init( > > CL_ASSERT( p_sm_state_mgr ); > > CL_ASSERT( p_mad_ctrl ); > > CL_ASSERT( p_lock ); > > - CL_ASSERT( p_report_buf ); > > > > osm_state_mgr_construct( p_mgr ); > > > > @@ -154,7 +152,6 @@ osm_state_mgr_init( > > p_mgr->state = OSM_SM_STATE_IDLE; > > p_mgr->p_lock = p_lock; > > p_mgr->p_subnet_up_event = p_subnet_up_event; > > - 
p_mgr->p_report_buf = p_report_buf; > > p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; > > p_mgr->next_stage_signal = OSM_SIGNAL_NONE; > > > > @@ -1247,16 +1244,19 @@ __osm_state_mgr_report( > > uint8_t port_num; > > uint8_t start_port; > > uint32_t num_ports; > > - char line[OSM_REPORT_LINE_SIZE]; > > uint8_t node_type; > > - uint32_t line_num = 0; > > + > > + if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > > + return; > > > > OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report ); > > > > - if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > > - { > > - goto Exit; > > - } > > + fprintf( stdout, > > + "\n===================================================" > > + "====================================================" > > + "\nVendor : Ty " > > + ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " > > + " : Neighbor Port (Port #)\n" ); > > > > p_tbl = &p_mgr->p_subn->port_guid_tbl; > > > > @@ -1286,91 +1286,56 @@ __osm_state_mgr_report( > > num_ports = osm_port_get_num_physp( p_port ); > > for( port_num = start_port; port_num < num_ports; port_num++ ) > > { > > - if( line_num == 0 ) > > - { > > - strcpy( p_mgr->p_report_buf, > > - "\n===================================================" > > - "====================================================" ); > > - strcat( p_mgr->p_report_buf, > > - "\nVendor : Ty " > > - ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " > > - " : Neighbor Port (Port #)\n" ); > > - line_num++; > > - } > > - > > p_physp = osm_port_get_phys_ptr( p_port, port_num ); > > if( ( p_physp == NULL ) || ( !osm_physp_is_valid( p_physp ) ) ) > > continue; > > > > - sprintf( line, "%s : %s : %02X :", > > + fprintf( stdout, "%s : %s : %02X :", > > osm_get_manufacturer_str( cl_ntoh64 > > ( osm_node_get_node_guid > > ( p_node ) ) ), > > osm_get_node_type_str_fixed_width( node_type ), port_num ); > > > > - strcat( p_mgr->p_report_buf, line ); > > - > > p_pi = osm_physp_get_port_info_ptr( p_physp ); > > > > /* > > * Port state is 
not defined for switch port 0 > > */ > > if( port_num == 0 ) > > - strcat( p_mgr->p_report_buf, " :" ); > > + fprintf( stdout, " :" ); > > else > > - { > > - sprintf( line, " %s :", > > + fprintf( stdout, " %s :", > > osm_get_port_state_str_fixed_width > > ( ib_port_info_get_port_state( p_pi ) ) ); > > - strcat( p_mgr->p_report_buf, line ); > > - } > > > > /* > > * LID values are only meaningful in select cases. > > */ > > - if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) > > - { > > - if( ( ( node_type == IB_NODE_TYPE_SWITCH ) && ( port_num == 0 ) ) > > - || ( node_type != IB_NODE_TYPE_SWITCH ) ) > > - { > > - sprintf( line, " %04X : %01X :", > > - cl_ntoh16( p_pi->base_lid ), > > - ib_port_info_get_lmc( p_pi ) ); > > - > > - strcat( p_mgr->p_report_buf, line ); > > - } > > - else > > - strcat( p_mgr->p_report_buf, " : :" ); > > - } > > + if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN > > + && ( ( node_type == IB_NODE_TYPE_SWITCH && port_num == 0 ) > > + || node_type != IB_NODE_TYPE_SWITCH ) ) > > + fprintf( stdout, " %04X : %01X :", > > + cl_ntoh16( p_pi->base_lid ), > > + ib_port_info_get_lmc( p_pi ) ); > > else > > - strcat( p_mgr->p_report_buf, " : :" ); > > + fprintf( stdout, " : :" ); > > > > if( port_num != 0 ) > > - { > > - sprintf( line, " %s : %s : %s ", > > + fprintf( stdout, " %s : %s : %s ", > > osm_get_mtu_str( ib_port_info_get_neighbor_mtu( p_pi ) ), > > osm_get_lwa_str( p_pi->link_width_active ), > > osm_get_lsa_str( ib_port_info_get_link_speed_active > > ( p_pi ) ) ); > > - } > > else > > - { > > - sprintf( line, " %s : %s : %s ", " ", " ", " " ); > > - } > > - strcat( p_mgr->p_report_buf, line ); > > + fprintf( stdout, " : : " ); > > > > if( osm_physp_get_port_guid( p_physp ) == > > p_mgr->p_subn->sm_port_guid ) > > - { > > - sprintf( line, "* %016" PRIx64 " *", > > + fprintf( stdout, "* %016" PRIx64 " *", > > cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); > > - } > > else > > - { > > - sprintf( line, ": %016" PRIx64 " :", 
> > + fprintf( stdout, ": %016" PRIx64 " :", > > cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); > > - } > > - strcat( p_mgr->p_report_buf, line ); > > > > if( port_num && > > ( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) ) > > @@ -1378,36 +1343,27 @@ __osm_state_mgr_report( > > p_remote_physp = osm_physp_get_remote( p_physp ); > > if( p_remote_physp && osm_physp_is_valid( p_remote_physp ) ) > > { > > - sprintf( line, " %016" PRIx64 " (%02X)", > > + fprintf( stdout, " %016" PRIx64 " (%02X)", > > cl_ntoh64( osm_physp_get_port_guid > > ( p_remote_physp ) ), > > osm_physp_get_port_num( p_remote_physp ) ); > > - strcat( p_mgr->p_report_buf, line ); > > } > > else > > - strcat( p_mgr->p_report_buf, " UNKNOWN" ); > > + fprintf( stdout, " UNKNOWN" ); > > } > > > > - strcat( p_mgr->p_report_buf, "\n" ); > > - > > - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) > > - { > > - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); > > - line_num = 0; > > - } > > + fprintf( stdout, "\n" ); > > } > > - strcat( p_mgr->p_report_buf, > > + > > + fprintf( stdout, > > "------------------------------------------------------" > > "------------------------------------------------\n" ); > > p_port = ( osm_port_t * ) cl_qmap_next( &p_port->map_item ); > > } > > > > - CL_PLOCK_RELEASE( p_mgr->p_lock ); > > - > > - if( line_num != 0 ) > > - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); > > + fflush(stdout); > > > > - Exit: > > + CL_PLOCK_RELEASE( p_mgr->p_lock ); > > OSM_LOG_EXIT( p_mgr->p_log ); > > } > > > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c > > index 39d6899..da9e9f2 100644 > > --- a/osm/opensm/osm_ucast_mgr.c > > +++ b/osm/opensm/osm_ucast_mgr.c > > @@ -103,7 +103,6 @@ osm_ucast_mgr_init( > > IN osm_ucast_mgr_t* const p_mgr, > > IN osm_req_t* const p_req, > > IN osm_subn_t* const p_subn, > > - IN char* const p_report_buf, > > IN osm_log_t* const p_log, > > IN cl_plock_t* const p_lock ) > > { > > @@ -121,7 
+120,6 @@ osm_ucast_mgr_init( > > p_mgr->p_subn = p_subn; > > p_mgr->p_lock = p_lock; > > p_mgr->p_req = p_req; > > - p_mgr->p_report_buf = p_report_buf; > > > > OSM_LOG_EXIT( p_mgr->p_log ); > > return( status ); > > @@ -140,14 +138,13 @@ __osm_ucast_mgr_dump_path_distribution( > > uint8_t num_ports; > > uint32_t num_paths; > > ib_net64_t remote_guid_ho; > > - char line[OSM_REPORT_LINE_SIZE]; > > > > OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution ); > > > > p_node = osm_switch_get_node_ptr( p_sw ); > > num_ports = osm_switch_get_num_ports( p_sw ); > > > > - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_path_distribution: " > > + fprintf( stdout, "__osm_ucast_mgr_dump_path_distribution: " > > "Switch 0x%" PRIx64 "\n" > > "Port : Path Count Through Port", > > cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > > @@ -155,11 +152,10 @@ __osm_ucast_mgr_dump_path_distribution( > > for( i = 0; i < num_ports; i++ ) > > { > > num_paths = osm_switch_path_count_get( p_sw , i ); > > - sprintf( line, "\n %03u : %u", i, num_paths ); > > - strcat( p_mgr->p_report_buf, line ); > > + fprintf( stdout, "\n %03u : %u", i, num_paths ); > > if( i == 0 ) > > { > > - strcat( p_mgr->p_report_buf, " (switch management port)" ); > > + fprintf( stdout, " (switch management port)" ); > > continue; > > } > > > > @@ -172,26 +168,23 @@ __osm_ucast_mgr_dump_path_distribution( > > switch( osm_node_get_remote_type( p_node, i ) ) > > { > > case IB_NODE_TYPE_SWITCH: > > - strcat( p_mgr->p_report_buf, " (link to switch" ); > > + fprintf( stdout, " (link to switch" ); > > break; > > case IB_NODE_TYPE_ROUTER: > > - strcat( p_mgr->p_report_buf, " (link to router" ); > > + fprintf( stdout, " (link to router" ); > > break; > > case IB_NODE_TYPE_CA: > > - strcat( p_mgr->p_report_buf, " (link to CA" ); > > + fprintf( stdout, " (link to CA" ); > > break; > > default: > > - strcat( p_mgr->p_report_buf, " (link to unknown node type" ); > > + fprintf( stdout, " (link to unknown node 
type" ); > > break; > > } > > > > - sprintf( line, " 0x%" PRIx64 ")", remote_guid_ho ); > > - strcat( p_mgr->p_report_buf, line ); > > + fprintf( stdout, " 0x%" PRIx64 ")", remote_guid_ho ); > > } > > > > - strcat( p_mgr->p_report_buf, "\n" ); > > - > > - osm_log_raw( p_mgr->p_log, OSM_LOG_ROUTING, p_mgr->p_report_buf ); > > + fprintf( stdout, "\n" ); > > > > OSM_LOG_EXIT( p_mgr->p_log ); > > } > > @@ -202,7 +195,7 @@ static void > > __osm_ucast_mgr_dump_ucast_routes( > > IN const osm_ucast_mgr_t* const p_mgr, > > IN const osm_switch_t* const p_sw, > > - IN FILE *p_fdbFile ) > > + IN FILE *file ) > > { > > const osm_node_t* p_node; > > uint8_t port_num; > > @@ -211,8 +204,6 @@ __osm_ucast_mgr_dump_ucast_routes( > > uint8_t best_port; > > uint16_t max_lid_ho; > > uint16_t lid_ho; > > - char line[OSM_REPORT_LINE_SIZE]; > > - uint32_t line_num = 0; > > boolean_t ui_ucast_fdb_assign_func_defined; > > > > OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes ); > > @@ -221,16 +212,13 @@ __osm_ucast_mgr_dump_ucast_routes( > > > > max_lid_ho = osm_switch_get_max_lid_ho( p_sw ); > > > > + fprintf( file, "__osm_ucast_mgr_dump_ucast_routes: " > > + "Switch 0x%016" PRIx64 "\n" > > + "LID : Port : Hops : Optimal\n", > > + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > > for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) > > { > > - if( line_num == 0 ) > > - { > > - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_ucast_routes: " > > - "Switch 0x%016" PRIx64 "\n" > > - "LID : Port : Hops : Optimal\n", > > - cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); > > - line_num++; > > - } > > + fprintf(file, "0x%04X : ", lid_ho); > > > > port_num = osm_switch_get_port_by_lid( p_sw, lid_ho ); > > if( port_num == OSM_NO_PATH ) > > @@ -241,9 +229,7 @@ __osm_ucast_mgr_dump_ucast_routes( > > will reassign and compress the LID range. The > > subnet should work fine either way. 
> > */ > > - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); > > - strcat( p_mgr->p_report_buf, line ); > > - line_num++; > > + fprintf( file, "UNREACHABLE\n" ); > > continue; > > } > > /* > > @@ -255,19 +241,15 @@ __osm_ucast_mgr_dump_ucast_routes( > > num_hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); > > if( num_hops == OSM_NO_PATH ) > > { > > - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); > > - strcat( p_mgr->p_report_buf, line ); > > - line_num++; > > + fprintf( file, "UNREACHABLE\n" ); > > continue; > > } > > > > best_hops = osm_switch_get_least_hops( p_sw, lid_ho ); > > - sprintf( line, "0x%04X : %03u : %02u : ", > > - lid_ho, port_num, num_hops ); > > - strcat( p_mgr->p_report_buf, line ); > > + fprintf( file, "%03u : %02u : ", port_num, num_hops ); > > > > if( best_hops == num_hops ) > > - strcat( p_mgr->p_report_buf, "yes" ); > > + fprintf( file, "yes" ); > > else > > { > > if (p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign) > > @@ -282,23 +264,13 @@ __osm_ucast_mgr_dump_ucast_routes( > > p_sw, lid_ho, TRUE, > > NULL, NULL, NULL, NULL, /* No LMC Optimization */ > > ui_ucast_fdb_assign_func_defined ); > > - sprintf( line, "No %u hop path possible via port %u!", > > + fprintf( file, "No %u hop path possible via port %u!", > > best_hops, best_port ); > > - strcat( p_mgr->p_report_buf, line ); > > } > > > > - strcat( p_mgr->p_report_buf, "\n" ); > > - > > - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) > > - { > > - fprintf(p_fdbFile,"%s",p_mgr->p_report_buf ); > > - line_num = 0; > > - } > > + fprintf( file, "\n" ); > > } > > > > - if( line_num != 0 ) > > - fprintf(p_fdbFile,"%s\n",p_mgr->p_report_buf ); > > - > > OSM_LOG_EXIT( p_mgr->p_log ); > > } > > From jsquyres at cisco.com Mon Oct 23 04:36:06 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Mon, 23 Oct 2006 07:36:06 -0400 Subject: [openib-general] [mvapich] Announcing the release of MVAPICH2 0.9.6 with on-demand connection management, multi-core optimized shared memory 
communication and memory hook support In-Reply-To: <200610230353.k9N3r4de015233@xi.cse.ohio-state.edu> References: <200610230353.k9N3r4de015233@xi.cse.ohio-state.edu> Message-ID: <8A563782-679E-411E-80C2-5A31D40C40AE@cisco.com> On Oct 22, 2006, at 11:53 PM, Dhabaleswar Panda wrote: > A stripped down version of this release is also available at the > OpenIB SVN. I see this statement in every MVAPICH release notice and it continues to puzzle me. I understand that there was a use for an alternate distribution source before MVAPICH became open source. But now that the MVAPICH code bases are freely available from OSU via multiple mechanisms (anonymous SVN, tarball download, etc.), why is a "stripped down version" maintained in the OpenIB SVN? 1. What, exactly, is the difference between the MVAPICH available from OSU and the "stripped down version" in the OpenIB SVN? 2. Why would someone choose to download the "stripped down version" from the OpenIB SVN? Have any real users/customers done so? 3. What is the point of maintaining yet more flavors of MVAPICH -- aren't there enough already (multiple versions from OSU, more versions available from each IB vendor)? DK -- can you please explain? Thanks. -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From HNGUYEN at de.ibm.com Mon Oct 23 04:39:12 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 23 Oct 2006 13:39:12 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: <453C6F2E.8040806@dev.mellanox.co.il> Message-ID: Hello Tziporet! > I suggest that you fix what you want and create 1.0.1.1. You can place > it in the svn releases area > and direct people that need ehca to this version. > We did something similar when we published 1.0.1 for which we added > SLES9 SP3 support. > Please place it on > https://openib.org/svn/gen2/branches/1.1/ofed/releases/ (I assume you > have a check-in permission to svn). Thanks for this. 
Will do in next couple of days. Regards Nam Nguyen From rdreier at cisco.com Mon Oct 23 04:41:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 04:41:29 -0700 Subject: [openib-general] OFED 1.1 on Debian based system? References: <453BD076.80006@pruesse.net> Message-ID: > Are there any "best-practices"? Or is maybe even someone working on > debian packages? It would probably be quite a bit of work to get OFED installed on Debian, but the Debian kernels are quite recent, and libibverbs and libmthca are already available in the main Debian archive, so depending on what you are trying to do, you may be able to just use standard Debian packages. - R. From tziporet at dev.mellanox.co.il Mon Oct 23 05:39:43 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 23 Oct 2006 14:39:43 +0200 Subject: [openib-general] [openfabrics-ewg] [PATCH OFED-1.1-rc7] libehca configure: fix missing check of libsysfs In-Reply-To: References: Message-ID: <453CB80F.4090608@dev.mellanox.co.il> Hoang-Nam Nguyen wrote: > Thanks for this. Will do in next couple of days. > Regards > Nam Nguyen > I had a mistake in the version number - you should call it 1.1.1 Tziporet From ogerlitz at voltaire.com Mon Oct 23 05:46:31 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 23 Oct 2006 14:46:31 +0200 Subject: [openib-general] OFED1.1-rc7 build failure on 2.6.9-prep (RH4 U3 hand built) system In-Reply-To: <6C2C79E72C305246B504CBA17B5500C93478F4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C93478F4@mtlexch01.mtl.com> Message-ID: <453CB9A7.4020006@voltaire.com> Vladimir Sokolovsky wrote: > Hi Or, > I think that required for 2.6.9-34.EL kernel backport patches from > kernel_patches/backport/2.6.9_U3 directory are not applied by configure > script. > You should change kernel name to be 2.6.9-34*. 
Hi Vlad, Here's the thing: the uname -r of the locally built kernel is 2.6.9-prep, so once you boot with it, the ofed build scripts apply the patches from the ***2.6.9*** backport directory and not from the 2.6.9-34 directory. These patches seem to be broken; maybe you want to remove them? Or. From Sean.Hubbell at msl.army.mil Mon Oct 23 05:53:06 2006 From: Sean.Hubbell at msl.army.mil (Hubbell, Sean C Contractor/Decibel) Date: Mon, 23 Oct 2006 07:53:06 -0500 Subject: [openib-general] IPoIB Question Message-ID: Hello, I currently have several applications that use a legacy IPv4 protocol, and I use IPoIB to utilize my InfiniBand network, which works great. I have completed some timing and throughput analysis and noticed that I do not get much more throughput from the InfiniBand network interface than from my GigE network interface. My question is, am I using IPoIB correctly or are these the typical numbers that everyone is seeing? Is there a standard application that I may use to test my current configuration? Thanks in advance, Sean From halr at voltaire.com Mon Oct 23 05:51:45 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Oct 2006 08:51:45 -0400 Subject: [openib-general] [PATCH] opensm: updn: make local functions static + local types In-Reply-To: <20061019212639.GA24600@sashak.voltaire.com> References: <20061019212639.GA24600@sashak.voltaire.com> Message-ID: <1161607834.25985.300124.camel@hal.voltaire.com> On Thu, 2006-10-19 at 17:26, Sasha Khapyorsky wrote: > This makes local functions static and moves definitions of locally used > types to .c file. > > Signed-off-by: Sasha Khapyorsky > --- > osm/include/opensm/osm_opensm.h | 1 - > osm/include/opensm/osm_ucast_updn.h | 349 ----------------------------------- > osm/opensm/osm_ucast_updn.c | 81 +++++++- > 3 files changed, 70 insertions(+), 361 deletions(-) Thanks. Applied with some cosmetic changes.
Also, despite agreeing that it should be done, I did add __ in front of those local osm_ functions in the interest of moving on and temporarily ending the bickering. These are purely cosmetic and will hopefully disappear if ever there will be agreement on either making incremental changes in an incremental way or a wholesale updated coding style but given that we cannot come to any common ground, I remain highly skeptical on this. -- Hal From kliteyn at dev.mellanox.co.il Mon Oct 23 06:29:55 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 23 Oct 2006 15:29:55 +0200 Subject: [openib-general] [PATCH] osm: port to WinIB stack - osm_log.c Message-ID: <453CC3D3.1080608@dev.mellanox.co.il> Hi Hal Fixing something that got lost in the porting - osm_log reports state for windows service. -- Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osm_log.c =================================================================== --- osm_log.c (revision 9941) +++ osm_log.c (working copy) @@ -81,6 +81,13 @@ static char *month_str[] = { "Nov", "Dec" }; +#else +void +OsmReportState( + IN const char *p_str); +#endif /* ndef WIN32 */ + +#ifndef WIN32 static void truncate_log_file(osm_log_t* const p_log) { @@ -146,6 +153,9 @@ osm_log( printf("%s\n", buffer); fflush( stdout ); } +#ifdef WIN32 + OsmReportState(buffer); +#endif /* WIN32 */ } /* regular log to default out_port */ From halr at voltaire.com Mon Oct 23 06:29:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Oct 2006 09:29:15 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_ucast_updn.h: Eliminate superfluous header file Message-ID: <1161610125.25985.301206.camel@hal.voltaire.com> OpenSM/osm_ucast_updn.h: Eliminate superfluous header file Signed-off-by: Hal Rosenstock Index: include/opensm/osm_ucast_updn.h =================================================================== --- include/opensm/osm_ucast_updn.h (revision 9945) +++ include/opensm/osm_ucast_updn.h (working copy) @@ -1,84 +0,0 @@ -/* - * 
Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- * - * $Id$ - */ - -#ifndef _OSM_UCAST_UPDN_H_ -#define _OSM_UCAST_UPDN_H_ - -/* - * Abstract: - * Implementation of Up Down Algorithm using ranking & Min Hop - * Calculation functions - * - * Environment: - * Linux User Mode - * - * $Revision: 1.0 $ - */ -/* LS : This code is useless since we integrate it with opensm */ -/* -#include -#include -#include -#include -#include -#include -*/ - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/* //////////////////////////// */ -/* ENUM TypeDefs */ -/* /////////////////////////// */ - -/* ////////////////////////////////// */ -/* Struct TypeDefs */ -/* ///////////////////////////////// */ - -/* ////////////////////////////// */ -/* Function */ -/* ////////////////////////////// */ - -END_C_DECLS - -#endif /* _OSM_UCAST_UPDN_H_ */ Index: include/Makefile.am =================================================================== --- include/Makefile.am (revision 9918) +++ include/Makefile.am (working copy) @@ -90,7 +90,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_node_info_rcv_ctrl.h \ $(srcdir)/opensm/osm_link_mgr.h \ $(srcdir)/opensm/osm_mcast_fwd_rcv_ctrl.h \ - $(srcdir)/opensm/osm_ucast_updn.h \ $(srcdir)/opensm/osm_msgdef.h \ $(srcdir)/opensm/osm_sa_node_record.h \ $(srcdir)/opensm/st.h \ From rdreier at cisco.com Mon Oct 23 06:36:30 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 06:36:30 -0700 Subject: [openib-general] [PATCH] IB/iser: start conn after enabling iSER References: Message-ID: What do you want me to do with this patch? Is it a bug fix that needs to go in 2.6.19, or should I queue it for 2.6.20? Shouldn't iser/iscsi patches be cc'ed to linux-scsi and Mike Christie for review too? - R. 
From HNGUYEN at de.ibm.com Mon Oct 23 06:42:24 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 23 Oct 2006 15:42:24 +0200 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? In-Reply-To: <17A82EE2-E2D3-4AED-A3FD-8E85F31361A1@scl.ameslab.gov> Message-ID: Hello Troy! > The netpipe code is available with mercurial by: > hg clone http://source.scl.ameslab.gov/hg/netpipe3-pvfs-dev > Once you have pvfs2-1.5.1 installed, you should be able to do 'make > pvfs' in the netpipe3-pvfs-dev directory and build NPpvfs. > The command line arguments I used to reproduce this were: > ./NPpvfs -d $PVFS_FILE_PATH -l 32768 -u 268435456 -n 100 -o > $NETPIPE_OUTPUT_FILE Did you compile pvfs and NPpvfs as 32-bit or 64-bit libs/execs? I did compile pvfs and NPpvfs as is and realized that pvfs is built by default as 32-bit and NPpvfs as 64-bit. Hence NPpvfs complained about finding incompatible pvfs libs. Regards Nam From rdreier at cisco.com Mon Oct 23 06:46:30 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 06:46:30 -0700 Subject: [openib-general] A question about sa_query References: <453B8B0B.2030402@dev.mellanox.co.il> Message-ID: > There is something that bothers me in sa_query. > According to table 115 in the IB-SPEC when the status in the MAD hdr > is 1, 2 or 3 it shouldn't be considered an error. (1 means busy, > 2 means redirection, and 3 means both). > The function "recv_handler" in core/sa_query.c sets the status of the > sa_query before calling the callback function. > It sets the status according to the status returned in the mad > header. (mad_recv_wc->recv_buf.mad->mad_hdr.status) > If the status in the mad_hdr is different from 0 it sets the return > status to -EINVAL. > This means that the higher layers (e.g., SRP) do not know what the > exact status was and therefore treat status 1 (busy) as an error. > Am I missing something? No, you are right.
I know the spec says a "busy" status is not an error, but I've not seen anything actually return that, and I'm not sure what a consumer can usefully do with it. Also, we haven't tried handling SA redirection at all. Your issue would be one thing we would have to fix if we did handle it. So you are correct, and I wouldn't object to handling all this more intelligently, but I would want to see some consumer that cared, too. - R. From kliteyn at dev.mellanox.co.il Mon Oct 23 06:51:40 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 23 Oct 2006 15:51:40 +0200 Subject: [openib-general] [PATCH] opensm: remove obsolete p_report_buf In-Reply-To: <20061023113305.GB18837@sashak.voltaire.com> References: <20061020004727.GH24676@sashak.voltaire.com> <453C6919.7010008@dev.mellanox.co.il> <20061023113305.GB18837@sashak.voltaire.com> Message-ID: <453CC8EC.6000605@dev.mellanox.co.il> Sasha Khapyorsky wrote: > On 09:02 Mon 23 Oct , Yevgeny Kliteynik wrote: >> Hi Sasha. >> >> The removal of the sm->p_report_buf is a good idea. >> However, I do have one comment: >> In several cases this buffer was printed using the osm_log_raw() >> function, and you replaced this with a plain fprintf(stdout,...). >> Right now the osm_log_raw function just prints to stdout too, but >> this doesn't always have to be the case. Besides, osm_log_raw >> provides verbosity level checking, which is lost when you replace >> it with printf. > > Both function calls were, and still are, conditionalized by verbosity > level, so it is not lost. Right, there is a check at the beginning of the function, mea culpa. Anyway, this wasn't the main point. I grep'ed the osm code, and the only cases where there is explicit printing to stdout are when the log is not initialized yet, or in console mode, and I think that this is a better way to manage logging (even though printing directly to stdout is more efficient).
-- Yevgeny > > Sasha > >> --Yevgeny >> >> Sasha Khapyorsky wrote: >>> This removes obsolete now shared sm->p_report_buf buffer and cleans >>> up related code. >>> >>> Signed-off-by: Sasha Khapyorsky >>> --- >>> osm/include/opensm/osm_base.h | 5 -- >>> osm/include/opensm/osm_sm.h | 2 - >>> osm/include/opensm/osm_state_mgr.h | 8 --- >>> osm/include/opensm/osm_ucast_mgr.h | 5 -- >>> osm/opensm/osm_mcast_mgr.c | 11 ++-- >>> osm/opensm/osm_sm.c | 15 +----- >>> osm/opensm/osm_state_mgr.c | 104 ++++++++++------------------------- >>> osm/opensm/osm_ucast_mgr.c | 70 +++++++----------------- >>> 8 files changed, 57 insertions(+), 163 deletions(-) >>> >>> diff --git a/osm/include/opensm/osm_base.h b/osm/include/opensm/osm_base.h >>> index 57dd4fd..20e2cc3 100644 >>> --- a/osm/include/opensm/osm_base.h >>> +++ b/osm/include/opensm/osm_base.h >>> @@ -714,11 +714,6 @@ typedef enum _osm_state_mgr_mode >>> * >>> **********/ >>> >>> -#define OSM_REPORT_BUF_SIZE 0x10000 >>> -#define OSM_REPORT_LINE_SIZE 0x256 >>> -#define OSM_REPORT_BUF_THRESHOLD (OSM_REPORT_BUF_SIZE / OSM_REPORT_LINE_SIZE) >>> - >>> - >>> /****d* OpenSM: Base/osm_sm_signal_t >>> * NAME >>> * osm_sm_signal_t >>> diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h >>> index bc812f3..05b87ac 100644 >>> --- a/osm/include/opensm/osm_sm.h >>> +++ b/osm/include/opensm/osm_sm.h >>> @@ -178,8 +178,6 @@ typedef struct _osm_sm >>> osm_vla_rcv_ctrl_t vla_rcv_ctrl; >>> osm_pkey_rcv_t pkey_rcv; >>> osm_pkey_rcv_ctrl_t pkey_rcv_ctrl; >>> - char* p_report_buf; >>> - >>> } osm_sm_t; >>> /* >>> * FIELDS >>> diff --git a/osm/include/opensm/osm_state_mgr.h b/osm/include/opensm/osm_state_mgr.h >>> index ad4afa0..7aaab58 100644 >>> --- a/osm/include/opensm/osm_state_mgr.h >>> +++ b/osm/include/opensm/osm_state_mgr.h >>> @@ -121,7 +121,6 @@ typedef struct _osm_state_mgr >>> cl_qlist_t idle_time_list; >>> cl_plock_t *p_lock; >>> cl_event_t *p_subnet_up_event; >>> - char *p_report_buf; >>> osm_sm_state_t state; >>> 
osm_state_mgr_mode_t state_step_mode; >>> osm_signal_t next_stage_signal; >>> @@ -170,9 +169,6 @@ typedef struct _osm_state_mgr >>> * p_subnet_up_event >>> * Pointer to the event to set if/when the subnet comes up. >>> * >>> -* p_report_buf >>> -* Pointer to the large log buffer used for user reports. >>> -* >>> * state >>> * State of the SM. >>> * >>> @@ -380,7 +376,6 @@ osm_state_mgr_init( >>> IN const osm_sm_mad_ctrl_t* const p_mad_ctrl, >>> IN cl_plock_t* const p_lock, >>> IN cl_event_t* const p_subnet_up_event, >>> - IN char* const p_report_buf, >>> IN osm_log_t* const p_log ); >>> /* >>> * PARAMETERS >>> @@ -420,9 +415,6 @@ osm_state_mgr_init( >>> * p_subnet_up_event >>> * [in] Pointer to the event to set if/when the subnet comes up. >>> * >>> -* p_report_buf >>> -* [in] Pointer to the large log buffer used for user reports. >>> -* >>> * p_log >>> * [in] Pointer to the log object. >>> * >>> diff --git a/osm/include/opensm/osm_ucast_mgr.h b/osm/include/opensm/osm_ucast_mgr.h >>> index 0fbfc66..1c10abb 100644 >>> --- a/osm/include/opensm/osm_ucast_mgr.h >>> +++ b/osm/include/opensm/osm_ucast_mgr.h >>> @@ -105,7 +105,6 @@ typedef struct _osm_ucast_mgr >>> osm_req_t *p_req; >>> osm_log_t *p_log; >>> cl_plock_t *p_lock; >>> - char *p_report_buf; >>> } osm_ucast_mgr_t; >>> /* >>> * FIELDS >>> @@ -204,7 +203,6 @@ osm_ucast_mgr_init( >>> IN osm_ucast_mgr_t* const p_mgr, >>> IN osm_req_t* const p_req, >>> IN osm_subn_t* const p_subn, >>> - IN char* const p_report_buf, >>> IN osm_log_t* const p_log, >>> IN cl_plock_t* const p_lock ); >>> /* >>> @@ -218,9 +216,6 @@ osm_ucast_mgr_init( >>> * p_subn >>> * [in] Pointer to the Subnet object for this subnet. >>> * >>> -* p_report_buf >>> -* [in] Pointer to the large log buffer used for user reporting. >>> -* >>> * p_log >>> * [in] Pointer to the log object. 
>>> * >>> diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c >>> index 5a01578..82ef7c3 100644 >>> --- a/osm/opensm/osm_mcast_mgr.c >>> +++ b/osm/opensm/osm_mcast_mgr.c >>> @@ -1382,14 +1382,13 @@ static void >>> mcast_mgr_dump_sw_routes( >>> IN const osm_mcast_mgr_t* const p_mgr, >>> IN const osm_switch_t* const p_sw, >>> - IN FILE *p_mcfdbFile ) >>> + IN FILE *file ) >>> { >>> osm_mcast_tbl_t* p_tbl; >>> int16_t mlid_ho = 0; >>> int16_t mlid_start_ho; >>> uint8_t position = 0; >>> int16_t block_num = 0; >>> - char line[OSM_REPORT_LINE_SIZE]; >>> boolean_t print_lid; >>> const osm_node_t* p_node; >>> uint16_t i, j; >>> @@ -1404,7 +1403,7 @@ mcast_mgr_dump_sw_routes( >>> >>> p_tbl = osm_switch_get_mcast_tbl_ptr( p_sw ); >>> >>> - fprintf( p_mcfdbFile, "\nSwitch 0x%016" PRIx64 "\n" >>> + fprintf( file, "\nSwitch 0x%016" PRIx64 "\n" >>> "LID : Out Port(s)\n", >>> cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); >>> while ( block_num <= p_tbl->max_block_in_use ) >>> @@ -1415,7 +1414,7 @@ mcast_mgr_dump_sw_routes( >>> mlid_ho = mlid_start_ho + i; >>> position = 0; >>> print_lid = FALSE; >>> - sprintf( line, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); >>> + fprintf( file, "0x%04X :", mlid_ho + IB_LID_MCAST_START_HO ); >>> while ( position <= p_tbl->max_position ) >>> { >>> mask_entry = cl_ntoh16((*p_tbl->p_mask_tbl)[mlid_ho][position]); >>> @@ -1428,13 +1427,13 @@ mcast_mgr_dump_sw_routes( >>> for (j = 0 ; j < 16 ; j++) >>> { >>> if ( (1 << j) & mask_entry ) >>> - sprintf( line, "%s 0x%03X ", line, j+(position*16) ); >>> + fprintf( file, " 0x%03X ", j+(position*16) ); >>> } >>> position++; >>> } >>> if (print_lid) >>> { >>> - fprintf( p_mcfdbFile, "%s\n", line ); >>> + fprintf( file, "\n" ); >>> } >>> } >>> block_num++; >>> diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c >>> index fef3cac..fb4f759 100644 >>> --- a/osm/opensm/osm_sm.c >>> +++ b/osm/opensm/osm_sm.c >>> @@ -256,9 +256,6 @@ osm_sm_destroy( >>> cl_event_destroy( &p_sm->signal ); 
>>> cl_event_destroy( &p_sm->subnet_up_event ); >>> >>> - if( p_sm->p_report_buf != NULL ) >>> - free( p_sm->p_report_buf ); >>> - >>> osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ >>> OSM_LOG_EXIT( p_sm->p_log ); >>> } >>> @@ -291,15 +288,6 @@ osm_sm_init( >>> p_sm->p_disp = p_disp; >>> p_sm->p_lock = p_lock; >>> >>> - p_sm->p_report_buf = malloc( OSM_REPORT_BUF_SIZE ); >>> - if( p_sm->p_report_buf == NULL ) >>> - { >>> - osm_log( p_sm->p_log, OSM_LOG_ERROR, >>> - "osm_sm_init: ERR 2E09: " >>> - "Can't allocate report buffer\n" ); >>> - status = IB_INSUFFICIENT_MEMORY; >>> - goto Exit; >>> - } >>> status = cl_event_init( &p_sm->signal, FALSE ); >>> if( status != CL_SUCCESS ) >>> goto Exit; >>> @@ -385,7 +373,6 @@ osm_sm_init( >>> status = osm_ucast_mgr_init( &p_sm->ucast_mgr, >>> &p_sm->req, >>> p_sm->p_subn, >>> - p_sm->p_report_buf, >>> p_sm->p_log, p_sm->p_lock ); >>> if( status != IB_SUCCESS ) >>> goto Exit; >>> @@ -409,7 +396,7 @@ osm_sm_init( >>> &p_sm->mad_ctrl, >>> p_sm->p_lock, >>> &p_sm->subnet_up_event, >>> - p_sm->p_report_buf, p_sm->p_log ); >>> + p_sm->p_log ); >>> if( status != IB_SUCCESS ) >>> goto Exit; >>> >>> diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c >>> index d43e9fc..9c159df 100644 >>> --- a/osm/opensm/osm_state_mgr.c >>> +++ b/osm/opensm/osm_state_mgr.c >>> @@ -118,7 +118,6 @@ osm_state_mgr_init( >>> IN const osm_sm_mad_ctrl_t * const p_mad_ctrl, >>> IN cl_plock_t * const p_lock, >>> IN cl_event_t * const p_subnet_up_event, >>> - IN char *const p_report_buf, >>> IN osm_log_t * const p_log ) >>> { >>> cl_status_t status; >>> @@ -136,7 +135,6 @@ osm_state_mgr_init( >>> CL_ASSERT( p_sm_state_mgr ); >>> CL_ASSERT( p_mad_ctrl ); >>> CL_ASSERT( p_lock ); >>> - CL_ASSERT( p_report_buf ); >>> >>> osm_state_mgr_construct( p_mgr ); >>> >>> @@ -154,7 +152,6 @@ osm_state_mgr_init( >>> p_mgr->state = OSM_SM_STATE_IDLE; >>> p_mgr->p_lock = p_lock; >>> p_mgr->p_subnet_up_event = p_subnet_up_event; >>> - 
p_mgr->p_report_buf = p_report_buf; >>> p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; >>> p_mgr->next_stage_signal = OSM_SIGNAL_NONE; >>> >>> @@ -1247,16 +1244,19 @@ __osm_state_mgr_report( >>> uint8_t port_num; >>> uint8_t start_port; >>> uint32_t num_ports; >>> - char line[OSM_REPORT_LINE_SIZE]; >>> uint8_t node_type; >>> - uint32_t line_num = 0; >>> + >>> + if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) >>> + return; >>> >>> OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_report ); >>> >>> - if( !osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) >>> - { >>> - goto Exit; >>> - } >>> + fprintf( stdout, >>> + "\n===================================================" >>> + "====================================================" >>> + "\nVendor : Ty " >>> + ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " >>> + " : Neighbor Port (Port #)\n" ); >>> >>> p_tbl = &p_mgr->p_subn->port_guid_tbl; >>> >>> @@ -1286,91 +1286,56 @@ __osm_state_mgr_report( >>> num_ports = osm_port_get_num_physp( p_port ); >>> for( port_num = start_port; port_num < num_ports; port_num++ ) >>> { >>> - if( line_num == 0 ) >>> - { >>> - strcpy( p_mgr->p_report_buf, >>> - "\n===================================================" >>> - "====================================================" ); >>> - strcat( p_mgr->p_report_buf, >>> - "\nVendor : Ty " >>> - ": # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID " >>> - " : Neighbor Port (Port #)\n" ); >>> - line_num++; >>> - } >>> - >>> p_physp = osm_port_get_phys_ptr( p_port, port_num ); >>> if( ( p_physp == NULL ) || ( !osm_physp_is_valid( p_physp ) ) ) >>> continue; >>> >>> - sprintf( line, "%s : %s : %02X :", >>> + fprintf( stdout, "%s : %s : %02X :", >>> osm_get_manufacturer_str( cl_ntoh64 >>> ( osm_node_get_node_guid >>> ( p_node ) ) ), >>> osm_get_node_type_str_fixed_width( node_type ), port_num ); >>> >>> - strcat( p_mgr->p_report_buf, line ); >>> - >>> p_pi = osm_physp_get_port_info_ptr( p_physp ); >>> >>> /* >>> * Port state is 
not defined for switch port 0 >>> */ >>> if( port_num == 0 ) >>> - strcat( p_mgr->p_report_buf, " :" ); >>> + fprintf( stdout, " :" ); >>> else >>> - { >>> - sprintf( line, " %s :", >>> + fprintf( stdout, " %s :", >>> osm_get_port_state_str_fixed_width >>> ( ib_port_info_get_port_state( p_pi ) ) ); >>> - strcat( p_mgr->p_report_buf, line ); >>> - } >>> >>> /* >>> * LID values are only meaningful in select cases. >>> */ >>> - if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) >>> - { >>> - if( ( ( node_type == IB_NODE_TYPE_SWITCH ) && ( port_num == 0 ) ) >>> - || ( node_type != IB_NODE_TYPE_SWITCH ) ) >>> - { >>> - sprintf( line, " %04X : %01X :", >>> - cl_ntoh16( p_pi->base_lid ), >>> - ib_port_info_get_lmc( p_pi ) ); >>> - >>> - strcat( p_mgr->p_report_buf, line ); >>> - } >>> - else >>> - strcat( p_mgr->p_report_buf, " : :" ); >>> - } >>> + if( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN >>> + && ( ( node_type == IB_NODE_TYPE_SWITCH && port_num == 0 ) >>> + || node_type != IB_NODE_TYPE_SWITCH ) ) >>> + fprintf( stdout, " %04X : %01X :", >>> + cl_ntoh16( p_pi->base_lid ), >>> + ib_port_info_get_lmc( p_pi ) ); >>> else >>> - strcat( p_mgr->p_report_buf, " : :" ); >>> + fprintf( stdout, " : :" ); >>> >>> if( port_num != 0 ) >>> - { >>> - sprintf( line, " %s : %s : %s ", >>> + fprintf( stdout, " %s : %s : %s ", >>> osm_get_mtu_str( ib_port_info_get_neighbor_mtu( p_pi ) ), >>> osm_get_lwa_str( p_pi->link_width_active ), >>> osm_get_lsa_str( ib_port_info_get_link_speed_active >>> ( p_pi ) ) ); >>> - } >>> else >>> - { >>> - sprintf( line, " %s : %s : %s ", " ", " ", " " ); >>> - } >>> - strcat( p_mgr->p_report_buf, line ); >>> + fprintf( stdout, " : : " ); >>> >>> if( osm_physp_get_port_guid( p_physp ) == >>> p_mgr->p_subn->sm_port_guid ) >>> - { >>> - sprintf( line, "* %016" PRIx64 " *", >>> + fprintf( stdout, "* %016" PRIx64 " *", >>> cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); >>> - } >>> else >>> - { >>> - sprintf( line, ": %016" PRIx64 " :", 
>>> + fprintf( stdout, ": %016" PRIx64 " :", >>> cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); >>> - } >>> - strcat( p_mgr->p_report_buf, line ); >>> >>> if( port_num && >>> ( ib_port_info_get_port_state( p_pi ) != IB_LINK_DOWN ) ) >>> @@ -1378,36 +1343,27 @@ __osm_state_mgr_report( >>> p_remote_physp = osm_physp_get_remote( p_physp ); >>> if( p_remote_physp && osm_physp_is_valid( p_remote_physp ) ) >>> { >>> - sprintf( line, " %016" PRIx64 " (%02X)", >>> + fprintf( stdout, " %016" PRIx64 " (%02X)", >>> cl_ntoh64( osm_physp_get_port_guid >>> ( p_remote_physp ) ), >>> osm_physp_get_port_num( p_remote_physp ) ); >>> - strcat( p_mgr->p_report_buf, line ); >>> } >>> else >>> - strcat( p_mgr->p_report_buf, " UNKNOWN" ); >>> + fprintf( stdout, " UNKNOWN" ); >>> } >>> >>> - strcat( p_mgr->p_report_buf, "\n" ); >>> - >>> - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) >>> - { >>> - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); >>> - line_num = 0; >>> - } >>> + fprintf( stdout, "\n" ); >>> } >>> - strcat( p_mgr->p_report_buf, >>> + >>> + fprintf( stdout, >>> "------------------------------------------------------" >>> "------------------------------------------------\n" ); >>> p_port = ( osm_port_t * ) cl_qmap_next( &p_port->map_item ); >>> } >>> >>> - CL_PLOCK_RELEASE( p_mgr->p_lock ); >>> - >>> - if( line_num != 0 ) >>> - osm_log_raw( p_mgr->p_log, OSM_LOG_VERBOSE, p_mgr->p_report_buf ); >>> + fflush(stdout); >>> >>> - Exit: >>> + CL_PLOCK_RELEASE( p_mgr->p_lock ); >>> OSM_LOG_EXIT( p_mgr->p_log ); >>> } >>> >>> diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c >>> index 39d6899..da9e9f2 100644 >>> --- a/osm/opensm/osm_ucast_mgr.c >>> +++ b/osm/opensm/osm_ucast_mgr.c >>> @@ -103,7 +103,6 @@ osm_ucast_mgr_init( >>> IN osm_ucast_mgr_t* const p_mgr, >>> IN osm_req_t* const p_req, >>> IN osm_subn_t* const p_subn, >>> - IN char* const p_report_buf, >>> IN osm_log_t* const p_log, >>> IN cl_plock_t* const p_lock ) >>> { >>> @@ -121,7 
+120,6 @@ osm_ucast_mgr_init( >>> p_mgr->p_subn = p_subn; >>> p_mgr->p_lock = p_lock; >>> p_mgr->p_req = p_req; >>> - p_mgr->p_report_buf = p_report_buf; >>> >>> OSM_LOG_EXIT( p_mgr->p_log ); >>> return( status ); >>> @@ -140,14 +138,13 @@ __osm_ucast_mgr_dump_path_distribution( >>> uint8_t num_ports; >>> uint32_t num_paths; >>> ib_net64_t remote_guid_ho; >>> - char line[OSM_REPORT_LINE_SIZE]; >>> >>> OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_path_distribution ); >>> >>> p_node = osm_switch_get_node_ptr( p_sw ); >>> num_ports = osm_switch_get_num_ports( p_sw ); >>> >>> - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_path_distribution: " >>> + fprintf( stdout, "__osm_ucast_mgr_dump_path_distribution: " >>> "Switch 0x%" PRIx64 "\n" >>> "Port : Path Count Through Port", >>> cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); >>> @@ -155,11 +152,10 @@ __osm_ucast_mgr_dump_path_distribution( >>> for( i = 0; i < num_ports; i++ ) >>> { >>> num_paths = osm_switch_path_count_get( p_sw , i ); >>> - sprintf( line, "\n %03u : %u", i, num_paths ); >>> - strcat( p_mgr->p_report_buf, line ); >>> + fprintf( stdout, "\n %03u : %u", i, num_paths ); >>> if( i == 0 ) >>> { >>> - strcat( p_mgr->p_report_buf, " (switch management port)" ); >>> + fprintf( stdout, " (switch management port)" ); >>> continue; >>> } >>> >>> @@ -172,26 +168,23 @@ __osm_ucast_mgr_dump_path_distribution( >>> switch( osm_node_get_remote_type( p_node, i ) ) >>> { >>> case IB_NODE_TYPE_SWITCH: >>> - strcat( p_mgr->p_report_buf, " (link to switch" ); >>> + fprintf( stdout, " (link to switch" ); >>> break; >>> case IB_NODE_TYPE_ROUTER: >>> - strcat( p_mgr->p_report_buf, " (link to router" ); >>> + fprintf( stdout, " (link to router" ); >>> break; >>> case IB_NODE_TYPE_CA: >>> - strcat( p_mgr->p_report_buf, " (link to CA" ); >>> + fprintf( stdout, " (link to CA" ); >>> break; >>> default: >>> - strcat( p_mgr->p_report_buf, " (link to unknown node type" ); >>> + fprintf( stdout, " (link to unknown node 
type" ); >>> break; >>> } >>> >>> - sprintf( line, " 0x%" PRIx64 ")", remote_guid_ho ); >>> - strcat( p_mgr->p_report_buf, line ); >>> + fprintf( stdout, " 0x%" PRIx64 ")", remote_guid_ho ); >>> } >>> >>> - strcat( p_mgr->p_report_buf, "\n" ); >>> - >>> - osm_log_raw( p_mgr->p_log, OSM_LOG_ROUTING, p_mgr->p_report_buf ); >>> + fprintf( stdout, "\n" ); >>> >>> OSM_LOG_EXIT( p_mgr->p_log ); >>> } >>> @@ -202,7 +195,7 @@ static void >>> __osm_ucast_mgr_dump_ucast_routes( >>> IN const osm_ucast_mgr_t* const p_mgr, >>> IN const osm_switch_t* const p_sw, >>> - IN FILE *p_fdbFile ) >>> + IN FILE *file ) >>> { >>> const osm_node_t* p_node; >>> uint8_t port_num; >>> @@ -211,8 +204,6 @@ __osm_ucast_mgr_dump_ucast_routes( >>> uint8_t best_port; >>> uint16_t max_lid_ho; >>> uint16_t lid_ho; >>> - char line[OSM_REPORT_LINE_SIZE]; >>> - uint32_t line_num = 0; >>> boolean_t ui_ucast_fdb_assign_func_defined; >>> >>> OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_dump_ucast_routes ); >>> @@ -221,16 +212,13 @@ __osm_ucast_mgr_dump_ucast_routes( >>> >>> max_lid_ho = osm_switch_get_max_lid_ho( p_sw ); >>> >>> + fprintf( file, "__osm_ucast_mgr_dump_ucast_routes: " >>> + "Switch 0x%016" PRIx64 "\n" >>> + "LID : Port : Hops : Optimal\n", >>> + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); >>> for( lid_ho = 1; lid_ho <= max_lid_ho; lid_ho++ ) >>> { >>> - if( line_num == 0 ) >>> - { >>> - sprintf( p_mgr->p_report_buf, "__osm_ucast_mgr_dump_ucast_routes: " >>> - "Switch 0x%016" PRIx64 "\n" >>> - "LID : Port : Hops : Optimal\n", >>> - cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); >>> - line_num++; >>> - } >>> + fprintf(file, "0x%04X : ", lid_ho); >>> >>> port_num = osm_switch_get_port_by_lid( p_sw, lid_ho ); >>> if( port_num == OSM_NO_PATH ) >>> @@ -241,9 +229,7 @@ __osm_ucast_mgr_dump_ucast_routes( >>> will reassign and compress the LID range. The >>> subnet should work fine either way. 
>>> */ >>> - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); >>> - strcat( p_mgr->p_report_buf, line ); >>> - line_num++; >>> + fprintf( file, "UNREACHABLE\n" ); >>> continue; >>> } >>> /* >>> @@ -255,19 +241,15 @@ __osm_ucast_mgr_dump_ucast_routes( >>> num_hops = osm_switch_get_hop_count( p_sw, lid_ho, port_num ); >>> if( num_hops == OSM_NO_PATH ) >>> { >>> - sprintf( line, "0x%04X : UNREACHABLE\n", lid_ho ); >>> - strcat( p_mgr->p_report_buf, line ); >>> - line_num++; >>> + fprintf( file, "UNREACHABLE\n" ); >>> continue; >>> } >>> >>> best_hops = osm_switch_get_least_hops( p_sw, lid_ho ); >>> - sprintf( line, "0x%04X : %03u : %02u : ", >>> - lid_ho, port_num, num_hops ); >>> - strcat( p_mgr->p_report_buf, line ); >>> + fprintf( file, "%03u : %02u : ", port_num, num_hops ); >>> >>> if( best_hops == num_hops ) >>> - strcat( p_mgr->p_report_buf, "yes" ); >>> + fprintf( file, "yes" ); >>> else >>> { >>> if (p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign) >>> @@ -282,23 +264,13 @@ __osm_ucast_mgr_dump_ucast_routes( >>> p_sw, lid_ho, TRUE, >>> NULL, NULL, NULL, NULL, /* No LMC Optimization */ >>> ui_ucast_fdb_assign_func_defined ); >>> - sprintf( line, "No %u hop path possible via port %u!", >>> + fprintf( file, "No %u hop path possible via port %u!", >>> best_hops, best_port ); >>> - strcat( p_mgr->p_report_buf, line ); >>> } >>> >>> - strcat( p_mgr->p_report_buf, "\n" ); >>> - >>> - if( ++line_num >= OSM_REPORT_BUF_THRESHOLD ) >>> - { >>> - fprintf(p_fdbFile,"%s",p_mgr->p_report_buf ); >>> - line_num = 0; >>> - } >>> + fprintf( file, "\n" ); >>> } >>> >>> - if( line_num != 0 ) >>> - fprintf(p_fdbFile,"%s\n",p_mgr->p_report_buf ); >>> - >>> OSM_LOG_EXIT( p_mgr->p_log ); >>> } >>> > From halr at voltaire.com Mon Oct 23 06:54:02 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Oct 2006 09:54:02 -0400 Subject: [openib-general] [PATCH] osm: port to WinIB stack - osm_log.c In-Reply-To: <453CC3D3.1080608@dev.mellanox.co.il> References: 
<453CC3D3.1080608@dev.mellanox.co.il> Message-ID: <1161611591.25985.301888.camel@hal.voltaire.com> On Mon, 2006-10-23 at 09:29, Yevgeny Kliteynik wrote: > Hi Hal > > Fixing something that got lost in the porting - > osm_log reports state for windows service. > -- > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From erezz at voltaire.com Mon Oct 23 07:02:35 2006 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 23 Oct 2006 16:02:35 +0200 Subject: [openib-general] [PATCH] IB/iser: start conn after enabling iSER In-Reply-To: References: Message-ID: <453CCB7B.3060504@voltaire.com> Roland Dreier wrote: > What do you want me to do with this patch? Is it a bug fix that needs > to go in 2.6.19, or should I queue it for 2.6.20? > > Shouldn't iser/iscsi patches be cc'ed to linux-scsi and Mike Christie > for review too? > > - R. > Sorry for not mentioning it - this is a bug fix that should go for 2.6.19. This patch was sent separately to open-iscsi list (http://groups-beta.google.com/group/open-iscsi/browse_thread/thread/5c16237f32c54234). For some reason, if I send you an e-mail and CC to openib & open-iscsi, the message doesn't reach the openib list. Therefore, I send it twice. -- ____________________________________________________________ Erez Zilber | 972-9-971-7689 Software Engineer, Storage Team Voltaire – _The Grid Backbone_ __ www.voltaire.com From kschoche at scl.ameslab.gov Mon Oct 23 07:17:00 2006 From: kschoche at scl.ameslab.gov (Kyle Schochenmaier) Date: Mon, 23 Oct 2006 09:17:00 -0500 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? In-Reply-To: References: Message-ID: <453CCEDC.5010908@scl.ameslab.gov> Hoang-Nam Nguyen wrote: > Hi Troy! > >> The netpipe code is available with mercurial by: >> hg clone http://source.scl.ameslab.gov/hg/netpipe3-pvfs-dev >> Once you have pvfs2-1.5.1 installed, you should be able to do 'make >> pvfs' in the netpipe3-pvfs-dev directory and build NPpvfs. 
>> The command line arguments I used to reproduce this were: >> ./NPpvfs -d $PVFS_FILE_PATH -l 32768 -u 268435456 -n 100 -o >> $NETPIPE_OUTPUT_FILE >> > Thanks for this. I've been struggling with setting up the systems > to recreate this problem. Please be patient. > Can you please send me the output of modinfo ib_ehca (or hcad_mod > in older version)? Also the firmware code level as explained in > previous email. How much memory have you assigned to the partition? > With those data I'd be able to have nearly the same envs as yours. > >> This is the dmesg log: >> PU0001 000e0091:ehca_hcall_7arg_7ret HCAD_ERROR opcode=160 >> ret=fffffffffffffff7 arg1=1000000003000004 arg2=5 arg3=4000f830000 >> arg4=10000 arg5=e0000000000000 arg6=eb6b6920 arg7=0 out1=0 out2=0 >> out3=0 out4=0 out5=0 out6=0 out7=0 >> PU0001 00090454:ehca_reg_mr HCAD_ERROR hipz_alloc_mr failed, >> h_ret=fffffffffffffff7 hca_hndl=1000000003000004 >> PU0001 00090478:ehca_reg_mr <<< ret=ffffffea shca=c0000000e796b000 >> e_mr=c0000000ce865e80 iova_start=000004000f830000 size=10000 acl=7 >> e_pd=c0000000eb6b6920 pginfo=c0000000dfcb3a70 num_pages=10 num_4k=10 >> PU0001 00090176:ehca_reg_user_mr <<< rc=ffffffffffffffea >> pd=c0000000eb6b6920 region=c0000000ce861dd0 mr_access_flags=7 >> udata=c0000000dfcb3ba0 >> > I got this already from you and Kyle. I meant the full log with > debug traces enabled: modprobe ib_ehca debug_level=1 or for older > versions modprobe hcad_mod debug_level=9999999999999999999999. If > possible, try to get it. Anyway I'll do that with my test env. > Thanks! > Nam > > > I believe we have 8GB allocated on each this box (all memory and cpus allocated to one partition), and we're running firmware version SF240_233.
p5l5:~# modinfo hcad_mod filename: /lib/modules/2.6.17/kernel/drivers/infiniband/hw/ehca/hcad_mod.ko version: SVNEHCA_0009 description: IBM eServer HCA InfiniBand Device Driver author: Christoph Raisch license: Dual BSD/GPL srcversion: 2B35F7963CEB9E6067F3F92 depends: ib_core vermagic: 2.6.17 SMP mod_unload gcc-4.0 parm: open_aqp1:AQP1 on startup (0: no (default), 1: yes) (int) parm: debug_level:debug level (0: node, 6: only errors (default), 9: all) (int) parm: hw_level:hardware level (0: autosensing (default), 1: v. 0.20, 2: v. 0.21) (int) parm: nr_ports:number of connected ports (default: 2) (int) parm: use_hp_mr:high performance MRs (0: no (default), 1: yes) (int) parm: port_act_time:time to wait for port activation (default: 30 sec) (int) parm: poll_all_eqs:polls all event queues periodically (0: no, 1: yes (default)) (int) parm: static_rate:set permanent static rate (default: disabled) (int) And, setting the debug_level flag definitely caused the server to not respond... I rebooted and tried it again, same thing, setting the debug_level flag causes the server to crash. (I can still login, but cannot execute anything, e.g. 
'ls', it seems all the cpu's are spinning) p5l5:~# modprobe hcad_mod nr_ports=1 debug_level=99999999 console output after above command hangs server: PU0003 000e0252:hipz_h_register_rpage >>> adapter_handle=1000000203000004 pagesize=0 queue_type=0 resource_handle=7000000100018600 logical_address_of_page=e6741000 count=200 PU0003 000e0078:ehca_hcall_7arg_7ret >>> opcode=1ac arg1=1000000203000004 arg2=0 arg3=7000000100018600 arg4=e6741000 arg5=200 arg6=0 arg7=0 PU0003 000e0096:ehca_hcall_7arg_7ret <<< opcode=1ac ret=f out1=50 out2=50 out3=50 out4=50 out5=50 out6=50 out7=50 PU0003 000e0263:hipz_h_register_rpage <<< ret=f PU0003 000e04ad:hipz_h_register_rpage_mr <<< ret=f PU0003 0009076c:ehca_set_pagebuf >>> pginfo=c0000000eb7b75e0 type=1 num_pages=1d4000 num_4k=1d4000 next_buf=0 next_4k=30600 number=200 kpage=c0000000e6741000 page_cnt=30600 page_4k_cnt=30600 next_listelem=0 region=0000000000000000 next_chunk=0000000000000000 next_nmap=0 PU0003 00090807:ehca_set_pagebuf <<< ret=0 e_mr=c0000000e1ac2e80 pginfo=c0000000eb7b75e0 type=1 num_pages=1d4000 num_4k=1d4000 next_buf=0 next_4k=30800 number=200 kpage=c0000000e6742000 page_cnt=30800 page_4k_cnt=30800 i=200 next_listelem=0 region=0000000000000000 next_chunk=0000000000000000 next_nmap=0 PU0003 000e049e:hipz_h_register_rpage_mr >>> adapter_handle=1000000203000004 mr=c0000000e1ac2e80 mr_handle=7000000100018600 pagesize=0 queue_type=0 logical_address_of_page=e6741000 count=200 PU0003 000e0252:hipz_h_register_rpage >>> adapter_handle=1000000203000004 pagesize=0 queue_type=0 resource_handle=7000000100018600 logical_address_of_page=e6741000 count=200 PU0003 000e0078:ehca_hcall_7arg_7ret >>> opcode=1ac arg1=1000000203000004 arg2=0 arg3=7000000100018600 arg4=e6741000 arg5=200 arg6=0 arg7=0 PU0003 000e0096:ehca_hcall_7arg_7ret <<< opcode=1ac ret=f out1=50 out2=50 out3=50 out4=50 out5=50 out6=50 out7=50 PU0003 000e0263:hipz_h_register_rpage <<< ret=f -- Kyle Schochenmaier kschoche at scl.ameslab.gov Research Assistant, Dr. 
Brett Bode AmesLab - US Dept.Energy Scalable Computing Laboratory From halr at voltaire.com Mon Oct 23 07:16:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Oct 2006 10:16:34 -0400 Subject: [openib-general] A question about sa_query In-Reply-To: References: <453B8B0B.2030402@dev.mellanox.co.il> Message-ID: <1161612991.22403.277.camel@hal.voltaire.com> On Mon, 2006-10-23 at 09:46, Roland Dreier wrote: > > There is something that bothers me in sa_query. > > According to table 115 in the IB-SPEC when the status in the MAD hdr > > is 1,2 or 3 it shouldn't be considered to as an error. (1 means busy, > > 2 means redirection, and 3 means both). > > The function "recv_handler" in core/sa_query.c sets the status of the > > sa_query before calling the callback function. > > It sets the status according to the status returned in the mad > > header. (mad_recv_wc->recv_buf.mad->mad_hdr.status) > > If the status in the mad_hdr is different from 0 it sets the return > > status to -EINVAL. > > This mean that the higher layers (e.g., SRP) do not know what was the > > exact status and therefore treat status 1 (busy) as an error. > > Am I missing something? > > No, you are right. > > I know the spec says a "busy" status is not an error, but I've not > seen anything actually return that, There may be SAs out there that utilize this feature. > and I'm not sure what a consumer can usefully do with it. A different retry strategy ? > Also, we haven't tried handling SA redirection at all. Your issue > would be one thing we would have to fix if we did handle it. > > So you are correct, and I wouldn't object to handling all this more > intelligently, but I would want to see some consumer that cared, too. Not only cared but handled it differently than other errors. -- Hal > - R. 
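
[Illustrative note, not part of the original thread.] The point discussed above, that a MAD header status of 1 (busy), 2 (redirection) or 3 (both) need not be collapsed into a single -EINVAL, can be sketched as a small classifier. This is only a minimal sketch under stated assumptions: the enum, the macro names and sa_classify_status() are hypothetical and are not taken from core/sa_query.c; the status value is assumed to be already converted to host byte order; and letting "busy" win when both bits are set is an arbitrary choice made here for illustration.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not from sa_query.c. */
enum sa_status_class {
    SA_STATUS_OK,        /* status == 0 */
    SA_STATUS_BUSY,      /* busy bit set: a consumer might retry later */
    SA_STATUS_REDIRECT,  /* redirection required, busy bit clear */
    SA_STATUS_ERROR      /* any other bit set: treated as an error */
};

/* Low two status bits as described in the thread (1 = busy, 2 = redirect). */
#define SA_MAD_STATUS_BUSY     0x0001u
#define SA_MAD_STATUS_REDIRECT 0x0002u

/* 'status' is assumed to be in host byte order already. */
static enum sa_status_class sa_classify_status(uint16_t status)
{
    if (status == 0)
        return SA_STATUS_OK;

    /* Anything outside the busy/redirect bits stays an error. */
    if (status & ~(SA_MAD_STATUS_BUSY | SA_MAD_STATUS_REDIRECT))
        return SA_STATUS_ERROR;

    /* Assumption: when both bits are set (status 3), report busy. */
    if (status & SA_MAD_STATUS_BUSY)
        return SA_STATUS_BUSY;

    return SA_STATUS_REDIRECT;
}
```

A consumer that cared could then apply a different retry strategy to SA_STATUS_BUSY than to SA_STATUS_ERROR, which is the distinction asked about above.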
From cppbala at yahoo.com Mon Oct 23 07:22:15 2006
From: cppbala at yahoo.com (Bala)
Date: Mon, 23 Oct 2006 07:22:15 -0700 (PDT)
Subject: [openib-general] Is there a way to recover Mellanox HCA card from wrong Firmware
In-Reply-To: <20061022060301.GA1980@mellanox.co.il>
Message-ID: <20061023142215.87315.qmail@web35115.mail.mud.yahoo.com>

Thanks Michael,

> Most cards have a single jumper to disable the
> flash.

Using that jumper, I was able to flash the card with the
correct firmware.

Thanks again,
-bala-

> If you set it, the card will boot in flash recovery
> mode
> which will make it possible for you to re-flash with
> correct firmware.

From dotanb at dev.mellanox.co.il Mon Oct 23 07:34:24 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 23 Oct 2006 16:34:24 +0200
Subject: [openib-general] how to integrate the man pages of the user level verbs into the openib svn?
Message-ID: <453CD2F0.7020804@dev.mellanox.co.il>

Hi Roland.

I started to work on the man pages of the user level verbs a few weeks
ago and I would like to start integrating those pages into the openib svn
(I have finished ~40% of the verbs).

How can I do it:
* Do you want me to commit the files to the libibverbs/man folder?
* Do you want me to send you the files and you will commit them?

thanks
Dotan

From rdreier at cisco.com Mon Oct 23 07:49:10 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 23 Oct 2006 07:49:10 -0700
Subject: [openib-general] how to integrate the man pages of the user level verbs into the openib svn?
In-Reply-To: <453CD2F0.7020804@dev.mellanox.co.il> (Dotan Barak's message of "Mon, 23 Oct 2006 16:34:24 +0200") References: <453CD2F0.7020804@dev.mellanox.co.il> Message-ID: > I started to work on the man pages of the user level verbs few weeks > ago and > i would like to start integrating those pages to the openib svn (I > finished ~40% of the verbs) . > > How can i do it: > * Do you want me to commit the files to libibverbs/man folder? > * Do you want me to send you the files and you will commit them? I will want to review them and commit them if they look OK. At a quick glance using POD to write them seems like a problem, since it adds yet another build dependency for libibverbs, and ends up generating very ugly man sources. - R. From tziporet at dev.mellanox.co.il Mon Oct 23 08:09:07 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 23 Oct 2006 17:09:07 +0200 Subject: [openib-general] [openfabrics-ewg] Tomorrow's teleconference In-Reply-To: References: <36364CA4-2B99-402A-ACB4-4A58BCCE32C1@cisco.com> <453B9990.7090704@dev.mellanox.co.il> <453C6DD9.4060505@dev.mellanox.co.il> Message-ID: <453CDB13.6050301@dev.mellanox.co.il> Roland Dreier wrote: > Tziporet> Well I think the decision was just on the kernel parts > Tziporet> (please fix me if I am wrong). So the question how do > Tziporet> we drive this - mainly we need Sean and Hal to move to > Tziporet> git since they are the maintainers of several kernel > Tziporet> modules. > > Actually I don't think Sean and Hal have to do anything in particular. > I'm happy to maintain branches or even complete trees for them based > on sending emails via email, and Sean seems to be using git already > anyway. Hal has not done any kernel work lately (the most recent > change in the kernel I see from him is from April) so I don't think > that is a big issue. > > - R. > that's OK. I understand that this means we no longer look at the kernel code in svn? 
If this is true we need to know where your git tree so we can generate patches against it. Tziporet From rdreier at cisco.com Mon Oct 23 08:27:25 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 08:27:25 -0700 Subject: [openib-general] [openfabrics-ewg] Tomorrow's teleconference In-Reply-To: <453CDB13.6050301@dev.mellanox.co.il> (Tziporet Koren's message of "Mon, 23 Oct 2006 17:09:07 +0200") References: <36364CA4-2B99-402A-ACB4-4A58BCCE32C1@cisco.com> <453B9990.7090704@dev.mellanox.co.il> <453C6DD9.4060505@dev.mellanox.co.il> <453CDB13.6050301@dev.mellanox.co.il> Message-ID: > I understand that this means we no longer look at the kernel code in svn? > If this is true we need to know where your git tree so we can generate > patches against it. I thought that was already the procedure? Anyway my git tree is git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git but I thought that was pretty well-known. - R. From sweitzen at cisco.com Mon Oct 23 08:35:20 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 23 Oct 2006 08:35:20 -0700 Subject: [openib-general] IPoIB Question Message-ID: IPoIB performance will vary quite a bit depending on what motherboard, CPU speed, and HCA type you have. What are the specs on the systems you are using? Netperf (www.netperf.org) is a good tool to measure IPoIB performance. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Hubbell, Sean C Contractor/Decibel Sent: Monday, October 23, 2006 5:53 AM To: openib-general at openib.org Cc: Sean Hubbell Subject: [openib-general] IPoIB Question Hello, I currently have several applications that uses a legacy IPv4 protocol and I use IPoIB to utilize my infiniband network which works great. 
I have completed some timing and throughput analysis and noticed that I do not get very much more if I use an infiniband network interface than using my GigE network interface. My question is, am I using IPoIB correctly or are these the typical numbers that everyone is seeing? Is there a standard application that I may use to test my current configuration? Thanks in advance, Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Mon Oct 23 08:42:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 23 Oct 2006 17:42:41 +0200 Subject: [openib-general] IPoIB Question In-Reply-To: References: Message-ID: <20061023154241.GE7195@mellanox.co.il> Quoting r. Scott Weitzenkamp (sweitzen) : > Netperf (www.netperf.org) is a good tool to measure IPoIB performance. Of special note is the -T flag which often lets you get more consistent results by pinning the test to a single CPU. Another useful tool is iperf, which has a -P option for running multiple socket tests in parallel. In TCP, multi-socket performance often exceeds that of a single socket. -- MST From sashak at voltaire.com Mon Oct 23 08:49:42 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 23 Oct 2006 17:49:42 +0200 Subject: [openib-general] [PATCH] opensm: remove obsolete p_report_buf In-Reply-To: <453CC8EC.6000605@dev.mellanox.co.il> References: <20061020004727.GH24676@sashak.voltaire.com> <453C6919.7010008@dev.mellanox.co.il> <20061023113305.GB18837@sashak.voltaire.com> <453CC8EC.6000605@dev.mellanox.co.il> Message-ID: <20061023154942.GB19183@sashak.voltaire.com> On 15:51 Mon 23 Oct , Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > > On 09:02 Mon 23 Oct , Yevgeny Kliteynik wrote: > >> Hi Sasha. > >> > >> The removal of the sm->p_report_buf is a good idea. 
> >> However, I do have one comment: > >> In several cases this buffer was printed using the osm_log_raw() > >> function, and you replaced this with a plain fprintf(stdout,...). > >> Right now the osm_log_raw function just prints to stdout too, but > >> this doesn't always have to be the case. Besides, osm_log_raw > >> provides verbosity level checking, which is lost when you replace > >> it with printf. > > > > Both functions calls were and still be conditonalized by verbosity > > level, so it is not lost. > > Right, there is a check in the beginning of the function, mea culpa. > Anyway, this wasn't the main point. I grep'ed the osm code, and the > only cases where there is an explicit printing to stdout is when log > is not initialized yet, or in console mode, and I think that this is > a better way to manage logging (even though printing directly to stdout > is more efficient). I'm not following. Those functions _are_ printing to stdout. If in a future we will want other file we will replace 'stdout' by other stream (and this is why fprintf() and not printf() is used in the patch). Sasha From mshefty at ichips.intel.com Mon Oct 23 08:59:59 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 23 Oct 2006 08:59:59 -0700 Subject: [openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port In-Reply-To: References: <000301c6ecbf$9b616d80$c0d4180a@amr.corp.intel.com> <45394C3E.4020404@ichips.intel.com> Message-ID: <453CE6FF.7040409@ichips.intel.com> Roland Dreier wrote: > I'm not sure about this -- does this lead to duplication of code like > keeping track of outstanding requests? Or are you exporting some > really low-level interface from ib_sa? This did not lead to code duplication. As you guessed, I exported the send_mad() interface. > Maybe the best thing to do is put the higher level multicast handling > into the ib_sa module (and not export the current lower level > multicast APIs any more). 
EXPORT_SYMBOL isn't totally free and if > we're exporting really internal stuff for one other user (I'm guessing > that you might be building on top of the sa_query.c::send_mad() level > stuff), then we might as well just combine the multicast and SA > modules into a single .ko (even if there are multiple .c files). This makes sense. > Maybe ib_notice should just go into ib_sa as well. I think this would actually end up being easier / more efficient. Ib_notice needs to register for unsolicited MADs, so I was going to have it register with the ib_mad module directly to receive those. But I didn't want to duplicate tracking the SA's address handle. -Sean From shubbell at dbresearch.net Mon Oct 23 08:55:41 2006 From: shubbell at dbresearch.net (Sean Hubbell) Date: Mon, 23 Oct 2006 10:55:41 -0500 Subject: [openib-general] IPoIB Question In-Reply-To: References: Message-ID: <453CE5FD.4090303@dbresearch.net> We currently have a non-homogeneous cluster so that seems that would possible explain a few of the differences that I have seen on some of my tests. I will look at netperf.org and see what they have to offer. On another note, is there plans to have IPoIB support the full throughput that infiniband 4x or 12x has? Specifically, can I keep my legacy apps and just upgrade the network to take advantage of the bandwidth? Sean Scott Weitzenkamp (sweitzen) wrote: > IPoIB performance will vary quite a bit depending on what motherboard, > CPU speed, and HCA type you have. What are the specs on the systems > you are using? > > Netperf (www.netperf.org ) is a good tool to > measure IPoIB performance. 
> > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ------------------------------------------------------------------------ > *From:* openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] *On Behalf Of *Hubbell, > Sean C Contractor/Decibel > *Sent:* Monday, October 23, 2006 5:53 AM > *To:* openib-general at openib.org > *Cc:* Sean Hubbell > *Subject:* [openib-general] IPoIB Question > > Hello, > > I currently have several applications that uses a legacy IPv4 > protocol and I use IPoIB to utilize my infiniband network which > works great. I have completed some timing and throughput analysis > and noticed that I do not get very much more if I use an > infiniband network interface than using my GigE network interface. > My question is, am I using IPoIB correctly or are these the > typical numbers that everyone is seeing? Is there a standard > application that I may use to test my current configuration? > > Thanks in advance, > > Sean > From sweitzen at cisco.com Mon Oct 23 09:13:22 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 23 Oct 2006 09:13:22 -0700 Subject: [openib-general] IPoIB Question Message-ID: If you are using TCP, you can use SDP transparently via libsdp to get improved latency and throughput. Scott > -----Original Message----- > From: Sean Hubbell [mailto:shubbell at dbresearch.net] > Sent: Monday, October 23, 2006 8:56 AM > To: Scott Weitzenkamp (sweitzen) > Cc: openib-general at openib.org > Subject: Re: [openib-general] IPoIB Question > > We currently have a non-homogeneous cluster so that seems that would > possible explain a few of the differences that I have seen on > some of my > tests. I will look at netperf.org and see what they have to offer. > > On another note, is there plans to have IPoIB support the full > throughput that infiniband 4x or 12x has? 
Specifically, can I keep my > legacy apps and just upgrade the network to take advantage of > the bandwidth? > > Sean > > Scott Weitzenkamp (sweitzen) wrote: > > IPoIB performance will vary quite a bit depending on what > motherboard, > > CPU speed, and HCA type you have. What are the specs on > the systems > > you are using? > > > > Netperf (www.netperf.org ) is a > good tool to > > measure IPoIB performance. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > > > -------------------------------------------------------------- > ---------- > > *From:* openib-general-bounces at openib.org > > [mailto:openib-general-bounces at openib.org] *On Behalf > Of *Hubbell, > > Sean C Contractor/Decibel > > *Sent:* Monday, October 23, 2006 5:53 AM > > *To:* openib-general at openib.org > > *Cc:* Sean Hubbell > > *Subject:* [openib-general] IPoIB Question > > > > Hello, > > > > I currently have several applications that uses a legacy IPv4 > > protocol and I use IPoIB to utilize my infiniband network which > > works great. I have completed some timing and > throughput analysis > > and noticed that I do not get very much more if I use an > > infiniband network interface than using my GigE network > interface. > > My question is, am I using IPoIB correctly or are these the > > typical numbers that everyone is seeing? Is there a standard > > application that I may use to test my current configuration? > > > > Thanks in advance, > > > > Sean > > > From swise at opengridcomputing.com Mon Oct 23 09:20:52 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 23 Oct 2006 11:20:52 -0500 Subject: [openib-general] 2.6.20 window Message-ID: <1161620452.9729.39.camel@stevo-desktop> Hey Roland, What's the window like for getting new rdma devices into the 2.6.20 release? I'm wondering how much time is left for lots of new code to get considered for 2.6.20... Thanks, Steve. 
From shubbell at dbresearch.net Mon Oct 23 09:19:17 2006 From: shubbell at dbresearch.net (Sean Hubbell) Date: Mon, 23 Oct 2006 11:19:17 -0500 Subject: [openib-general] IPoIB Question In-Reply-To: References: Message-ID: <453CEB85.7090102@dbresearch.net> Scott, Thanks for the reply again. The third party api that we use leverages a combination of UDP and TCP socket conntections for speed. Is there something for UCP as well? Sean Scott Weitzenkamp (sweitzen) wrote: > If you are using TCP, you can use SDP transparently via libsdp to get > improved latency and throughput. > > Scott > > >> -----Original Message----- >> From: Sean Hubbell [mailto:shubbell at dbresearch.net] >> Sent: Monday, October 23, 2006 8:56 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: openib-general at openib.org >> Subject: Re: [openib-general] IPoIB Question >> >> We currently have a non-homogeneous cluster so that seems that would >> possible explain a few of the differences that I have seen on >> some of my >> tests. I will look at netperf.org and see what they have to offer. >> >> On another note, is there plans to have IPoIB support the full >> throughput that infiniband 4x or 12x has? Specifically, can I keep my >> legacy apps and just upgrade the network to take advantage of >> the bandwidth? >> >> Sean >> >> Scott Weitzenkamp (sweitzen) wrote: >> >>> IPoIB performance will vary quite a bit depending on what >>> >> motherboard, >> >>> CPU speed, and HCA type you have. What are the specs on >>> >> the systems >> >>> you are using? >>> >>> Netperf (www.netperf.org ) is a >>> >> good tool to >> >>> measure IPoIB performance. 
>>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Server Virtualization Business Unit >>> Cisco Systems >>> >>> >>> >>> >> -------------------------------------------------------------- >> ---------- >> >>> *From:* openib-general-bounces at openib.org >>> [mailto:openib-general-bounces at openib.org] *On Behalf >>> >> Of *Hubbell, >> >>> Sean C Contractor/Decibel >>> *Sent:* Monday, October 23, 2006 5:53 AM >>> *To:* openib-general at openib.org >>> *Cc:* Sean Hubbell >>> *Subject:* [openib-general] IPoIB Question >>> >>> Hello, >>> >>> I currently have several applications that uses a legacy IPv4 >>> protocol and I use IPoIB to utilize my infiniband network which >>> works great. I have completed some timing and >>> >> throughput analysis >> >>> and noticed that I do not get very much more if I use an >>> infiniband network interface than using my GigE network >>> >> interface. >> >>> My question is, am I using IPoIB correctly or are these the >>> typical numbers that everyone is seeing? Is there a standard >>> application that I may use to test my current configuration? >>> >>> Thanks in advance, >>> >>> Sean >>> >>> > > > From mst at mellanox.co.il Mon Oct 23 09:38:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 23 Oct 2006 18:38:44 +0200 Subject: [openib-general] IPoIB Question In-Reply-To: <453CEB85.7090102@dbresearch.net> References: <453CEB85.7090102@dbresearch.net> Message-ID: <20061023163844.GB9248@mellanox.co.il> Quoting r. Sean Hubbell : > Subject: Re: IPoIB Question > > Scott, > > Thanks for the reply again. The third party api that we use leverages a > combination of UDP and TCP socket conntections for speed. Is there > something for UCP as well? iperf supports UDP as well. Again, check out the -P flag. 
--
MST

From chetm at us.ibm.com Mon Oct 23 09:56:06 2006
From: chetm at us.ibm.com (Chet Mehta)
Date: Mon, 23 Oct 2006 11:56:06 -0500
Subject: [openib-general] GPL only files in OFA repository
Message-ID:

A cursory scan of the OFA code repository showed the following files to
contain a GPL-only license. Since the OFA Bylaws require code contributions
to include BSD & GPL licenses, can the code owners/contributors of these
files either update the files to include the appropriate licenses or
provide details on why the files cannot be licensed under BSD?

./include/linux/mutex-backport.h
./include/linux/.svn/text-base/mutex-backport.h.svn-base
./mpi/mvapich-gen2/examples/perftest/config/confdb/aclangf90.m4
./mpi/mvapich-gen2/examples/perftest/config/confdb/.svn/text-base/fortran90.m4.svn-base
./mpi/mvapich-gen2/examples/perftest/config/confdb/.svn/text-base/aclangf90.m4.svn-base
./mpi/mvapich-gen2/examples/perftest/config/confdb/fortran90.m4
./mpi/mvapich-gen2/doc/.svn/text-base/mpichman-chshmem.pdf.svn-base
./mpi/mvapich-gen2/doc/mpichman-chshmem.pdf
./mpi/mvapich2-gen2/confdb/aclangf90.m4
./mpi/mvapich2-gen2/confdb/.svn/text-base/fortran90.m4.svn-base
./mpi/mvapich2-gen2/confdb/.svn/text-base/aclangf90.m4.svn-base
./mpi/mvapich2-gen2/confdb/fortran90.m4

Thank you.
:Chet.

From sweitzen at cisco.com Mon Oct 23 10:11:03 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 23 Oct 2006 10:11:03 -0700
Subject: [openib-general] IPoIB Question
Message-ID:

Nothing today in OF to accelerate UDP sockets.

Scott

> Thanks for the reply again. The third party api that we use leverages a
> combination of UDP and TCP socket connections for speed. Is there
> something for UDP as well?
>
> Sean
>
> Scott Weitzenkamp (sweitzen) wrote:
> > If you are using TCP, you can use SDP transparently via libsdp to get
> > improved latency and throughput.
> > > > Scott From shubbell at dbresearch.net Mon Oct 23 09:59:49 2006 From: shubbell at dbresearch.net (Sean Hubbell) Date: Mon, 23 Oct 2006 11:59:49 -0500 Subject: [openib-general] IPoIB Question In-Reply-To: <20061023163844.GB9248@mellanox.co.il> References: <453CEB85.7090102@dbresearch.net> <20061023163844.GB9248@mellanox.co.il> Message-ID: <453CF505.8050906@dbresearch.net> Thanks Michael I looked at iperf and that looks like a very nice tool. I will be using that when I evaluate and check performance of my applications. I am also interested in getting more bandwidth out of my applications leveraging a current or planned capability for IPoIB. This way, I will not have to modify my source code and I can just actually change out the interfaces that my applications send and receive on. So, I am looking at libsdp for the TCP funcationality and wanted to know if libsdp supports UDP as well or is there another library that I can use to maximize the bandwidth when transmitting and sending over infiniband? Sean Michael S. Tsirkin wrote: > Quoting r. Sean Hubbell : > >> Subject: Re: IPoIB Question >> >> Scott, >> >> Thanks for the reply again. The third party api that we use leverages a >> combination of UDP and TCP socket conntections for speed. Is there >> something for UCP as well? >> > > iperf supports UDP as well. Again, check out the -P flag. > > From rdreier at cisco.com Mon Oct 23 10:17:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 10:17:29 -0700 Subject: [openib-general] 2.6.20 window In-Reply-To: <1161620452.9729.39.camel@stevo-desktop> (Steve Wise's message of "Mon, 23 Oct 2006 11:20:52 -0500") References: <1161620452.9729.39.camel@stevo-desktop> Message-ID: We're still at 2.6.19-rc2 so you probably have a month or two before 2.6.19 comes out and the 2.6.20 window opens. However it's probably better to start reviewing sooner rather than later even if the code isn't done yet. 
(If you don't want to release it, then you can send it to me privately for comments too) BTW I _will_ get your changes to libibverbs merged in soon -- I've been spending my libibverbs time debugging other stuff lately. From mst at mellanox.co.il Mon Oct 23 10:19:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 23 Oct 2006 19:19:44 +0200 Subject: [openib-general] IPoIB Question In-Reply-To: <453CF505.8050906@dbresearch.net> References: <453CF505.8050906@dbresearch.net> Message-ID: <20061023171944.GH4568@mellanox.co.il> Quoting r. Sean Hubbell : > I am looking at libsdp for the TCP funcationality and wanted to know if > libsdp supports UDP as well AFAIK, SDP can only emulate TCP sockets. -- MST From swise at opengridcomputing.com Mon Oct 23 10:19:08 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 23 Oct 2006 12:19:08 -0500 Subject: [openib-general] 2.6.20 window In-Reply-To: References: <1161620452.9729.39.camel@stevo-desktop> Message-ID: <1161623948.9729.52.camel@stevo-desktop> On Mon, 2006-10-23 at 10:17 -0700, Roland Dreier wrote: > We're still at 2.6.19-rc2 so you probably have a month or two before > 2.6.19 comes out and the 2.6.20 window opens. However it's probably > better to start reviewing sooner rather than later even if the code > isn't done yet. (If you don't want to release it, then you can send > it to me privately for comments too) > > BTW I _will_ get your changes to libibverbs merged in soon -- I've > been spending my libibverbs time debugging other stuff lately. Ok thanks. Stevo. From krause at cup.hp.com Mon Oct 23 11:04:40 2006 From: krause at cup.hp.com (Michael Krause) Date: Mon, 23 Oct 2006 11:04:40 -0700 Subject: [openib-general] IPoIB Question In-Reply-To: <20061023171944.GH4568@mellanox.co.il> References: <453CF505.8050906@dbresearch.net> <20061023171944.GH4568@mellanox.co.il> Message-ID: <6.2.0.14.2.20061023110117.02a38b98@esmail.cup.hp.com> At 10:19 AM 10/23/2006, Michael S. Tsirkin wrote: >Quoting r. 
Sean Hubbell : > > I am looking at libsdp for the TCP funcationality and wanted to know if > > libsdp supports UDP as well > >AFAIK, SDP can only emulate TCP sockets. SDP is defined to work with AF_INET applications. If using a shared library approach / pre-load, one can transparently enable any AF_INET application to utilize SDP without a recompile, etc. The SDP Port Mapper specification for iWARP / service id for IB enable the connection management or whatever service it is implemented within to application-transparent discover the real target listen port and establish a SDP session nominally during connection establishment. Implementations may vary in the robustness or policies used to determine what to off-load, number of off-load sessions, etc. - in other words, a lot of opportunity and flexibility is provided to use SDP. Note: WinSocks Direct on Windows provides an equivalent service though uses a proprietary protocol. Vista will have SDP as defined in the specifications. There are currently no plans to develop an equivalent for datagram applications. Any datagram application (user or kernel) can already access the hardware directly and given RDMA is not defined for datagram, it was felt such a specification would provide minimal value. Mike From mshefty at ichips.intel.com Mon Oct 23 11:37:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 23 Oct 2006 11:37:18 -0700 Subject: [openib-general] [PATCH] [cma] qp_access_flags was changed to zero In-Reply-To: <1161591004.22381.1.camel@mtls05.yok.mtl.com> References: <1161591004.22381.1.camel@mtls05.yok.mtl.com> Message-ID: <453D0BDE.80200@ichips.intel.com> Dotan Barak wrote: > qp_attr.qp_state = IB_QPS_INIT; > - qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; > + qp_attr.qp_access_flags = 0; > qp_attr.port_num = id_priv->id.port_num; > return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS | > IB_QP_PKEY_INDEX | IB_QP_PORT); > Does this cause a problem that you run into? 
(trying to gauge the severity here - I'm guessing that this just sets a bit that's ignored by the lower level driver) I think there's a related issue in the ib_cm, which also sets IB_ACCESS_LOCAL_WRITE. The above code is executed when the user calls rdma_create_qp(). The ib_cm routine is executed when connecting the QP, which will overwrite these settings. I think we'll want to change both places to get the desired result. - Sean From swise at opengridcomputing.com Mon Oct 23 12:02:13 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 23 Oct 2006 14:02:13 -0500 Subject: [openib-general] ucma into for-2.6.20 branch Message-ID: <1161630133.15506.14.camel@stevo-desktop> Hey guys, Just wondering when you think the UCMA code will be merged into Roland's 2.6.20 branch. I'd like to begin review/submission of the chelsio RDMA driver, but I'd like the UCMA code in too so we can test on the git tree directly... Stevo. From shubbell at qube3.dbresearch.net Mon Oct 23 11:55:00 2006 From: shubbell at qube3.dbresearch.net (Sean Hubbell) Date: Mon, 23 Oct 2006 13:55:00 -0500 Subject: [openib-general] IPoIB Question Message-ID: <200610231855.k9NIt0c01038@qube3.dbresearch.net> Perfect, I'll check with my vendor to see if this is possible. If so, this rocks! Thanks! Sean ---------- Original message ---------- Date: Mon, 23 Oct 2006 11:04:40 -0700 From: Michael Krause Reply-To: Michael Krause To: Michael S. Tsirkin , Sean Hubbell CC: openib-general at openib.org Subject: Re: [openib-general] IPoIB Question At 10:19 AM 10/23/2006, Michael S. Tsirkin wrote: >Quoting r. Sean Hubbell : > > I am looking at libsdp for the TCP funcationality and wanted to know if > > libsdp supports UDP as well > >AFAIK, SDP can only emulate TCP sockets. SDP is defined to work with AF_INET applications. If using a shared library approach / pre-load, one can transparently enable any AF_INET application to utilize SDP without a recompile, etc. 
The SDP Port Mapper specification for iWARP / service id for IB enable the connection management or whatever service it is implemented within to application-transparent discover the real target listen port and establish a SDP session nominally during connection establishment. Implementations may vary in the robustness or policies used to determine what to off-load, number of off-load sessions, etc. - in other words, a lot of opportunity and flexibility is provided to use SDP. Note: WinSocks Direct on Windows provides an equivalent service though uses a proprietary protocol. Vista will have SDP as defined in the specifications. There are currently no plans to develop an equivalent for datagram applications. Any datagram application (user or kernel) can already access the hardware directly and given RDMA is not defined for datagram, it was felt such a specification would provide minimal value. Mike From twbowman at gmail.com Mon Oct 23 12:48:03 2006 From: twbowman at gmail.com (Todd Bowman) Date: Mon, 23 Oct 2006 13:48:03 -0600 Subject: [openib-general] IPoIB odd loopback packet from arp Message-ID: Using the OFED 1.0 and OFED 1.1 stack I have notice some rcvswrelay errors. I have tracked it down to the arp request. 
I can reproduce the problem with the following steps: ( I have used both 2.6.14.14 and 2.6.18.1 kernels) ib109> arp -d ib110 ib109> ping ib110 -c 2 # ib_ipoib module debug 13:15:46 ib109 kernel: ib0: sending packet, length=60 address=f6187200 qpn=0xffffff 13:15:46 ib109 kernel: ib0: called: id 34, op 0, status: 0 13:15:46 ib109 kernel: ib0: send complete, wrid 34 13:15:46 ib109 kernel: ib0: called: id -2147483623, op 128, status: 0 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x0369 13:15:46 ib109 kernel: ib0: dropping loopback packet 13:15:46 ib109 kernel: ib0: called: id -2147483622, op 128, status: 0 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x016d 13:15:46 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 qpn=0x000404 13:15:46 ib109 kernel: ib0: called: id 35, op 0, status: 0 13:15:46 ib109 kernel: ib0: send complete, wrid 35 13:15:46 ib109 kernel: ib0: called: id -2147483621, op 128, status: 0 13:15:46 ib109 kernel: ib0: received 128 bytes, SLID 0x016d 13:15:47 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 qpn=0x000404 13:15:47 ib109 kernel: ib0: called: id 36, op 0, status: 0 13:15:47 ib109 kernel: ib0: send complete, wrid 36 13:15:47 ib109 kernel: ib0: called: id -2147483620, op 128, status: 0 13:15:47 ib109 kernel: ib0: received 128 bytes, SLID 0x016d 13:15:51 ib109 kernel: ib0: called: id -2147483619, op 128, status: 0 13:15:51 ib109 kernel: ib0: received 100 bytes, SLID 0x016d 13:15:51 ib109 kernel: ib0: sending packet, length=60 address=f6e57520 qpn=0x000404 13:15:51 ib109 kernel: ib0: called: id 37, op 0, status: 0 13:15:51 ib109 kernel: ib0: send complete, wrid 37 # tcpdump -i ib0 13:15:46.977578 arp who-has ib110 tell ib109 hardware #32 13:15:46.977682 arp reply ib110 is-at 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:11:59 hardware #32 13:15:46.977710 IP ib109 > ib110: icmp 64: echo request seq 0 13:15:46.977790 IP ib110 > ib109: icmp 64: echo reply seq 0 13:15:47.977772 IP ib109 > ib110: icmp 
64: echo request seq 1 13:15:47.977892 IP ib110 > ib109: icmp 64: echo reply seq 1 13:15:51.977076 arp who-has ib109 tell ib110 hardware #32 13:15:51.977094 arp reply ib109 is-at 00:02:00:14:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:3b:31 hardware #32 # error dump rcvswrelayerrors:1 MT47396 Infiniscale-III 0x2c9010b022090[1] <--------> ib109 HCA-1 0x2c90200003b30[1] 1) The ping is successful and the arp table is populated, so is this really a problem or a false positive? 2) The second arp does not generate an error (the error dump reports all new errors in switches). Why? Any ideas? Thanks in advance. Todd From mshefty at ichips.intel.com Mon Oct 23 13:42:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 23 Oct 2006 13:42:33 -0700 Subject: [openib-general] ucma into for-2.6.20 branch In-Reply-To: <1161630133.15506.14.camel@stevo-desktop> References: <1161630133.15506.14.camel@stevo-desktop> Message-ID: <453D2939.5040201@ichips.intel.com> Steve Wise wrote: > Just wondering when you think the UCMA code will be merged into Roland's > 2.6.20 branch. I'd like to begin review/submission of the chelsio RDMA > driver, but I'd like the UCMA code in too so we can test on the git tree > directly... I should have my patches re-worked this week for submission. It's Roland's call from there whether to merge everything into 2.6.20, or split the patches into separate branches (like -mm, multicast, etc.). - Sean From swise at opengridcomputing.com Mon Oct 23 14:17:52 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 23 Oct 2006 16:17:52 -0500 Subject: [openib-general] ucma into for-2.6.20 branch In-Reply-To: <453D2939.5040201@ichips.intel.com> References: <1161630133.15506.14.camel@stevo-desktop> <453D2939.5040201@ichips.intel.com> Message-ID: <1161638272.15506.29.camel@stevo-desktop> Ok, thanks for the update. Stevo.
On Mon, 2006-10-23 at 13:42 -0700, Sean Hefty wrote: > Steve Wise wrote: > > Just wondering when you think the UCMA code will be merged into Roland's > > 2.6.20 branch. I'd like to begin review/submission of the chelsio RDMA > > driver, but I'd like the UCMA code in too so we can test on the git tree > > directly... > > I should have my patches re-worked this week for submission. It's Roland's call > from there whether to merge everything into 2.6.20, or split the patches into > separate branches (like -mm, multicast, etc.). > > - Sean From kliteyn at dev.mellanox.co.il Mon Oct 23 14:30:16 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 23 Oct 2006 23:30:16 +0200 Subject: [openib-general] [PATCH] OpenSM/osm_ucast_updn.h: Eliminate superfluous header file In-Reply-To: <1161617591.20151.18.camel@kliteynik.yok.mtl.com> References: <1161617591.20151.18.camel@kliteynik.yok.mtl.com> Message-ID: <453D3468.5070600@dev.mellanox.co.il> Looks good, thanks. -- Yevgeny > > OpenSM/osm_ucast_updn.h: Eliminate superfluous header file > > Signed-off-by: Hal Rosenstock > > > Index: include/opensm/osm_ucast_updn.h > =================================================================== > --- include/opensm/osm_ucast_updn.h (revision 9945) > +++ include/opensm/osm_ucast_updn.h (working copy) > @@ -1,84 +0,0 @@ > -/* > - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > - * > - * This software is available to you under a choice of one of two > - * licenses.
You may choose to be licensed under the terms of the GNU > - * General Public License (GPL) Version 2, available from the file > - * COPYING in the main directory of this source tree, or the > - * OpenIB.org BSD license below: > - * > - * Redistribution and use in source and binary forms, with or > - * without modification, are permitted provided that the following > - * conditions are met: > - * > - * - Redistributions of source code must retain the above > - * copyright notice, this list of conditions and the following > - * disclaimer. > - * > - * - Redistributions in binary form must reproduce the above > - * copyright notice, this list of conditions and the following > - * disclaimer in the documentation and/or other materials > - * provided with the distribution. > - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > - * SOFTWARE. 
> - * > - * $Id$ > - */ > - > -#ifndef _OSM_UCAST_UPDN_H_ > -#define _OSM_UCAST_UPDN_H_ > - > -/* > - * Abstract: > - * Implementation of Up Down Algorithm using ranking & Min Hop > - * Calculation functions > - * > - * Environment: > - * Linux User Mode > - * > - * $Revision: 1.0 $ > - */ > -/* LS : This code is useless since we integrate it with opensm */ > -/* > -#include > -#include > -#include > -#include > -#include > -#include > -*/ > - > -#ifdef __cplusplus > -# define BEGIN_C_DECLS extern "C" { > -# define END_C_DECLS } > -#else /* !__cplusplus */ > -# define BEGIN_C_DECLS > -# define END_C_DECLS > -#endif /* __cplusplus */ > - > -BEGIN_C_DECLS > - > -/* //////////////////////////// */ > -/* ENUM TypeDefs */ > -/* /////////////////////////// */ > - > -/* ////////////////////////////////// */ > -/* Struct TypeDefs */ > -/* ///////////////////////////////// */ > - > -/* ////////////////////////////// */ > -/* Function */ > -/* ////////////////////////////// */ > - > -END_C_DECLS > - > -#endif /* _OSM_UCAST_UPDN_H_ */ > Index: include/Makefile.am > =================================================================== > --- include/Makefile.am (revision 9918) > +++ include/Makefile.am (working copy) > @@ -90,7 +90,6 @@ EXTRA_DIST = \ > $(srcdir)/opensm/osm_node_info_rcv_ctrl.h \ > $(srcdir)/opensm/osm_link_mgr.h \ > $(srcdir)/opensm/osm_mcast_fwd_rcv_ctrl.h \ > - $(srcdir)/opensm/osm_ucast_updn.h \ > $(srcdir)/opensm/osm_msgdef.h \ > $(srcdir)/opensm/osm_sa_node_record.h \ > $(srcdir)/opensm/st.h \ From sashak at voltaire.com Mon Oct 23 14:40:11 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 23 Oct 2006 23:40:11 +0200 Subject: [openib-general] [PATCH] opensm: autogen.sh: tools version
verification fixes Message-ID: <20061023214011.GA6311@sashak.voltaire.com> This fixes a couple of things related to tool version verification in autogen.sh. Originally autogen.sh claimed that automake-1.10 is older than automake-1.6.3 and failed with a zero exit status, so: - regular expression fix - proper version string separation - numeric comparison for extracted version elements - non-zero exit status when old tools are detected - slightly improved condition statements Signed-off-by: Sasha Khapyorsky --- osm/autogen.sh | 47 +++++++++++++++++++++-------------------------- 1 files changed, 21 insertions(+), 26 deletions(-) diff --git a/osm/autogen.sh b/osm/autogen.sh index 658d377..6570426 100755 --- a/osm/autogen.sh +++ b/osm/autogen.sh @@ -1,4 +1,4 @@ -#!/bin/bash +#!/bin/bash # We change dir since the later utilities assume to work in the project dir cd ${0%*/*} @@ -7,49 +7,44 @@ # make sure autoconf is up-to-date ac_ver=`autoconf --version | head -n 1 | awk '{print $NF}'` ac_maj=`echo $ac_ver|sed 's/\..*//'` ac_min=`echo $ac_ver|sed 's/.*\.//'` -if [[ $ac_maj < 2 ]]; then +if [[ $ac_maj -lt 2 ]]; then echo Min autoconf version is 2.57 - exit -fi -if [[ $ac_maj = 2 && $ac_min < 57 ]]; then + exit 1 +elif [[ $ac_maj -eq 2 && $ac_min -lt 57 ]]; then echo Min autoconf version is 2.57 - exit + exit 1 fi # make sure automake is up-to-date am_ver=`automake --version | head -n 1 | awk '{print $NF}'` am_maj=`echo $am_ver|sed 's/\..*//'` -am_min=`echo $am_ver|sed 's/.*\.\([^\.]*\)\..*/\1/'` -am_sub=`echo $am_ver|sed 's/.*\.//'` -if [[ $am_maj < 1 ]]; then +am_min=`echo $am_ver|sed 's/[^\.]*\.\([^\.]*\)\.*.*/\1/'` +am_sub=`echo $am_ver|sed 's/[^\.]*\.[^\.]*\.*//'` +if [[ $am_maj -lt 1 ]]; then echo Min automake version is 1.6.3 - exit -fi -if [[ $am_maj = 1 && $am_min < 6 ]]; then + exit 1 +elif [[ $am_maj -eq 1 && $am_min -lt 6 ]]; then echo "automake version is too old:$am_maj.$am_min.$am_sub < required 1.6.3" - exit -fi -if [[ $am_maj = 1 && $am_min = 6 &&
$am_sub < 3 ]]; then + exit 1 +elif [[ $am_maj -eq 1 && $am_min -eq 6 && $am_sub -lt 3 ]]; then echo "automake version is too old:$am_maj.$am_min.$am_sub < required 1.6.3" - exit + exit 1 fi # make sure libtool is up-to-date lt_ver=`libtool --version | head -n 1 | awk '{print $4}'` lt_maj=`echo $lt_ver|sed 's/\..*//'` -lt_min=`echo $lt_ver|sed 's/.*\.\([^\.]*\)\..*/\1/'` -lt_sub=`echo $lt_ver|sed 's/.*\.//'` -if [[ $lt_maj < 1 ]]; then +lt_min=`echo $lt_ver|sed 's/[^\.]*\.\([^\.]*\)\.*.*/\1/'` +lt_sub=`echo $lt_ver|sed 's/[^\.]*\.[^\.]*\.*//'` +if [[ $lt_maj -lt 1 ]]; then echo Min libtool version is 1.4.2 - exit -fi -if [[ $lt_maj = 1 && $lt_min < 4 ]]; then + exit 1 +elif [[ $lt_maj -eq 1 && $lt_min -lt 4 ]]; then echo "automake version is too old:$lt_maj.$lt_min.$lt_sub < required 1.4.2" - exit -fi -if [[ $lt_maj = 1 && $lt_min = 4 && $lt_sub < 2 ]]; then + exit 1 +elif [[ $lt_maj -eq 1 && $lt_min -eq 4 && $lt_sub -lt 2 ]]; then echo "automake version is too old:$lt_maj.$lt_min.$lt_sub < required 1.4.2" - exit + exit 1 fi # cleanup -- 1.4.3.g7768 From sashak at voltaire.com Mon Oct 23 14:53:03 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 23 Oct 2006 23:53:03 +0200 Subject: [openib-general] [PATCH] diags: fix compilation warning with gcc-4.1.1 Message-ID: <20061023215303.GB6311@sashak.voltaire.com> This fixes 'differ in signedness pointer' compilation warnings with gcc-4.1.1 . Signed-off-by: Sasha Khapyorsky --- diags/src/ibportstate.c | 8 ++++---- diags/src/smpquery.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/diags/src/ibportstate.c b/diags/src/ibportstate.c index bf180f1..1af87c7 100644 --- a/diags/src/ibportstate.c +++ b/diags/src/ibportstate.c @@ -84,7 +84,7 @@ iberror(const char *fn, char *msg, ...) 
/*******************************************/ static int -get_node_info(ib_portid_t *dest, char *data) +get_node_info(ib_portid_t *dest, uint8_t *data) { int node_type; @@ -99,7 +99,7 @@ get_node_info(ib_portid_t *dest, char *d } static int -get_port_info(ib_portid_t *dest, char *data, int portnum, int port_op) +get_port_info(ib_portid_t *dest, uint8_t *data, int portnum, int port_op) { char buf[2048]; char val[64]; @@ -120,7 +120,7 @@ get_port_info(ib_portid_t *dest, char *d } static int -set_port_info(ib_portid_t *dest, char *data, int portnum, int port_op) +set_port_info(ib_portid_t *dest, uint8_t *data, int portnum, int port_op) { char buf[2048]; char val[64]; @@ -230,7 +230,7 @@ main(int argc, char **argv) int state, physstate, lwe, lws, lwa, lse, lss, lsa; int peerlocalportnum, peerlwe, peerlws, peerlwa, peerlse, peerlss, peerlsa; int width, peerwidth, peerspeed; - char data[IB_SMP_DATA_SIZE]; + uint8_t data[IB_SMP_DATA_SIZE]; ib_portid_t peerportid = {0}; int portnum = 0; ib_portid_t selfportid = {0}; diff --git a/diags/src/smpquery.c b/diags/src/smpquery.c index 88ad86a..68f9258 100644 --- a/diags/src/smpquery.c +++ b/diags/src/smpquery.c @@ -238,7 +238,7 @@ static char *sl2vl_dump_table_entry(ib_p static char * sl2vl_table(ib_portid_t *dest, char **argv, int argc) { - char data[IB_SMP_DATA_SIZE]; + uint8_t data[IB_SMP_DATA_SIZE]; int type, num_ports, portnum = 0; int i; char *ret; @@ -300,7 +300,7 @@ static char *vlarb_dump_table(ib_portid_ static char * vlarb_table(ib_portid_t *dest, char **argv, int argc) { - char data[IB_SMP_DATA_SIZE]; + uint8_t data[IB_SMP_DATA_SIZE]; int portnum = 0; int type, enhsp0, lowcap, highcap; char *ret = 0; -- 1.4.3.g7768 From parks at lanl.gov Mon Oct 23 15:40:31 2006 From: parks at lanl.gov (Parks Fields) Date: Mon, 23 Oct 2006 16:40:31 -0600 Subject: [openib-general] IPoIB Question In-Reply-To: <453CF505.8050906@dbresearch.net> References: <453CEB85.7090102@dbresearch.net> <20061023163844.GB9248@mellanox.co.il> 
<453CF505.8050906@dbresearch.net> Message-ID: <7.0.1.0.2.20061023163917.025c51d0@lanl.gov> At 10:59 AM 10/23/2006, Sean Hubbell wrote: >Thanks Michael I looked at iperf and that looks like a very nice tool. Something else about iperf is that it supports multiple streams, which may be closer to the way some apps operate. From vishal at endace.com Mon Oct 23 16:02:27 2006 From: vishal at endace.com (vishal) Date: Tue, 24 Oct 2006 12:02:27 +1300 Subject: [openib-general] configure: error: C compiler cannot create executables Message-ID: <1161644547.5074.33.camel@julia.et.endace.com> Hi, I got the following error when trying to install OFED 1.1 on SUSE 10.1 Enterprise x86_64:- checking for C compiler default output file name... configure: error: C compiler cannot create executables See `config.log' for more details. Failed to execute: ./configure --cache-file=/var/tmp/OFEDRPM/BUILD/openib-1.1/configure.cache --disable-libcheck --prefix /usr/local/ofed --libdir /usr/local/ofed/lib CPPFLAGS="-I../libibverbs/include" error: Bad exit status from /var/tmp/rpm-tmp.30870 (%install) From config.log:- configure:2466: $? = 0 configure:2468: gcc -v &5 Using built-in specs. Target: x86_64-suse-linux Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.1.0 --enable-ssp --disable-libssp --enable-java-awt=gtk --enable-gtk-cairo --disable-libjava-multilib --with-slibdir=/lib64 --with-system-zlib --enable-shared --enable-__cxa_atexit --enable-libstdcxx-allocator=new --without-system-libunwind --with-cpu=generic --host=x86_64-suse-linux Thread model: posix gcc version 4.1.0 (SUSE Linux) configure:2471: $?
= 0 configure:2473: gcc -V &5 gcc: '-V' option must have argument configure:2476: $? = 1 configure:2499: checking for C compiler default output file name configure:2502: gcc -m32 -g -O2 -I../libibverbs/include -m32 -g -O2 -L/usr/lib conftest.c >&5 /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../libc.so when searching for -lc /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../libc.a when searching for -lc /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/libc.so when searching for -lc /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/libc.a when searching for -lc /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: cannot find -lc collect2: ld returned 1 exit status configure:2505: $? = 1 configure: failed program was: | /* confdefs.h. */ | | #define PACKAGE_NAME "libibverbs" | #define PACKAGE_TARNAME "libibverbs" | #define PACKAGE_VERSION "1.0.4" | #define PACKAGE_STRING "libibverbs 1.0.4" | #define PACKAGE_BUGREPORT "openib-general at openib.org" | #define PACKAGE "libibverbs" | #define VERSION "1.0.4" | /* end confdefs.h. */ | | int | main () | { | | ; | return 0; | } configure:2544: error: C compiler cannot create executables ## ---------------- ## ## Cache variables. 
## ## ---------------- ## ac_cv_build=x86_64-unknown-linux-gnu ac_cv_build_alias=x86_64-unknown-linux-gnu ac_cv_env_CC_set= ac_cv_env_CC_value= ac_cv_env_CFLAGS_set=set ac_cv_env_CFLAGS_value='-m32 -g -O2' ac_cv_env_CPPFLAGS_set=set ac_cv_env_CPPFLAGS_value=-I../libibverbs/include ac_cv_env_CPP_set= ac_cv_env_CPP_value= ac_cv_env_CXXCPP_set= ac_cv_env_CXXCPP_value= ac_cv_env_CXXFLAGS_set=set ac_cv_env_CXXFLAGS_value='-m32 -g -O2' ac_cv_env_CXX_set= ac_cv_env_CXX_value= ac_cv_env_F77_set= ac_cv_env_F77_value= ac_cv_env_FFLAGS_set=set ac_cv_env_FFLAGS_value='-m32 -g -O2' ac_cv_env_LDFLAGS_set=set ac_cv_env_LDFLAGS_value='-m32 -g -O2 -L/usr/lib' ac_cv_env_build_alias_set= ac_cv_env_build_alias_value= ac_cv_env_host_alias_set= ac_cv_env_host_alias_value= ac_cv_env_target_alias_set= ac_cv_env_target_alias_value= ac_cv_host=x86_64-unknown-linux-gnu ac_cv_host_alias=x86_64-unknown-linux-gnu ac_cv_path_install='/usr/bin/install -c' ac_cv_prog_AWK=gawk ac_cv_prog_ac_ct_CC=gcc ac_cv_prog_make_make_set=yes ## ----------------- ## ## Output variables. 
## ## ----------------- ## ACLOCAL='${SHELL} /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/missing --run aclocal-1.9' AMDEPBACKSLASH='\' AMDEP_FALSE='#' AMDEP_TRUE='' AMTAR='${SHELL} /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/missing --run tar' AR='' AUTOCONF='${SHELL} /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/missing --run autoconf' AUTOHEADER='${SHELL} /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/missing --run autoheader' AUTOMAKE='${SHELL} /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/missing --run automake-1.9' AWK='gawk' CC='gcc' CCDEPMODE='' CFLAGS='-m32 -g -O2' CPP='' CPPFLAGS='-I../libibverbs/include' CXX='' CXXCPP='' CXXDEPMODE='' CXXFLAGS='-m32 -g -O2' CYGPATH_W='echo' DEFS='' DEPDIR='.deps' ECHO='echo' ECHO_C='' ECHO_N='-n' ECHO_T='' EGREP='' EXEEXT='' F77='' FFLAGS='-m32 -g -O2' HAVE_LD_VERSION_SCRIPT_FALSE='' HAVE_LD_VERSION_SCRIPT_TRUE='' INSTALL_DATA='${INSTALL} -m 644' INSTALL_PROGRAM='${INSTALL}' INSTALL_SCRIPT='${INSTALL}' INSTALL_STRIP_PROGRAM='${SHELL} $(install_sh) -c -s' LDFLAGS='-m32 -g -O2 -L/usr/lib' LIBOBJS='' LIBS='' LIBTOOL='' LN_S='' LTLIBOBJS='' MAKEINFO='${SHELL} /var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/missing --run makeinfo' OBJEXT='' PACKAGE='libibverbs' PACKAGE_BUGREPORT='openib-general at openib.org' PACKAGE_NAME='libibverbs' PACKAGE_STRING='libibverbs 1.0.4' PACKAGE_TARNAME='libibverbs' PACKAGE_VERSION='1.0.4' PATH_SEPARATOR=':' RANLIB='' SET_MAKE='' SHELL='/bin/sh' STRIP='' VERSION='1.0.4' ac_ct_AR='' ac_ct_CC='gcc' ac_ct_CXX='' ac_ct_F77='' ac_ct_RANLIB='' ac_ct_STRIP='' am__fastdepCC_FALSE='' am__fastdepCC_TRUE='' am__fastdepCXX_FALSE='' am__fastdepCXX_TRUE='' am__include='include' am__leading_dot='.' 
am__quote='' am__tar='${AMTAR} chof - "$$tardir"' am__untar='${AMTAR} xf -' bindir='${exec_prefix}/bin' build='x86_64-unknown-linux-gnu' build_alias='' build_cpu='x86_64' build_os='linux-gnu' build_vendor='unknown' datadir='${prefix}/share' exec_prefix='NONE' host='x86_64-unknown-linux-gnu' host_alias='' host_cpu='x86_64' host_os='linux-gnu' host_vendor='unknown' includedir='${prefix}/include' infodir='${prefix}/info' install_sh='/var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/libibverbs/config/install-sh' libdir='/usr/local/ofed/lib' libexecdir='${exec_prefix}/libexec' localstatedir='${prefix}/var' mandir='${prefix}/man' mkdir_p='mkdir -p --' oldincludedir='/usr/include' prefix='/usr/local/ofed' program_transform_name='s,x,x,' sbindir='${exec_prefix}/sbin' sharedstatedir='${prefix}/com' sysconfdir='${prefix}/etc' target_alias='' ## ----------- ## ## confdefs.h. ## ## ----------- ## #define PACKAGE "libibverbs" #define PACKAGE_BUGREPORT "openib-general at openib.org" #define PACKAGE_NAME "libibverbs" #define PACKAGE_STRING "libibverbs 1.0.4" #define PACKAGE_TARNAME "libibverbs" #define PACKAGE_VERSION "1.0.4" #define VERSION "1.0.4" configure: exit 77 I tried to compile a simple program, and gcc does create the executable. Not sure whats missing! Any help would be appreciated. Thanks! Vishal From pw at osc.edu Mon Oct 23 16:08:25 2006 From: pw at osc.edu (Pete Wyckoff) Date: Mon, 23 Oct 2006 19:08:25 -0400 Subject: [openib-general] client-server small message performance issues In-Reply-To: References: <20061017185330.GA2450@quasar.osc.edu> Message-ID: <20061023230825.GA27928@osc.edu> rdreier at cisco.com wrote on Tue, 17 Oct 2006 14:24 -0700: > > Basic ping pong is 25 us. That's fine as this is not a particularly > > optimal way to communicate. Each additional server adds 6 us. That > > seems like a lot of overhead just to do another pair of posts and > > polls, but not my major complaint. Look at the jump from 6 to 7 > > servers, 41 us. Beyond that, too. 
And the standard deviation > > becomes huge. A plot of the individual values shows a large spread, > > not just a few outliers. > > > The hardware is all Mellanox MT25204 > > I would guess you are seeing the effect of exceeding the size of some > internal HCA cache, maybe the QP state cache. But I don't know enough > details of the HCA internals to know if this is true and if so which > limit you're hitting. Mellanox picked up on my email and sent me new firmware that contains some optimizations for that particular silicon. The pre-release firmware image makes the numbers look much more reasonable. -- Pete From johann.george at qlogic.com Mon Oct 23 16:22:50 2006 From: johann.george at qlogic.com (Johann George) Date: Mon, 23 Oct 2006 16:22:50 -0700 Subject: [openib-general] new server up and running Message-ID: <20061023232250.GA3118@cuprite.pathscale.com> I was asked at today's EWG meeting to remind the group that we do have the new OpenFabrics server up and running. If you are interested in installing and administering a particular package (git, mailman, twiki, etc.), please email me or Matt Leininger and let us know your interest. We can create an account for you to log on and allow you to get started. We are in the process of creating a DNS entry, staging.openfabrics.org, that will reference the new server, and we will let you know as soon as it is available. Johann From mshefty at ichips.intel.com Mon Oct 23 17:00:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 23 Oct 2006 17:00:18 -0700 Subject: [openib-general] [PATCH] [cma] qp_access_flags was changed to zero In-Reply-To: <453D0BDE.80200@ichips.intel.com> References: <1161591004.22381.1.camel@mtls05.yok.mtl.com> <453D0BDE.80200@ichips.intel.com> Message-ID: <453D5792.4070800@ichips.intel.com> Sean Hefty wrote: > I think there's a related issue in the ib_cm, which also sets > IB_ACCESS_LOCAL_WRITE. The above code is executed when the user calls > rdma_create_qp().
The ib_cm routine is executed when connecting the QP, which > will overwrite these settings. I think we'll want to change both places to get > the desired result. I've pulled in your patch, along with a similar patch to the ib_cm. I'll run through some tests tomorrow to verify that nothing broke. Unless this is causing a serious issue, though, I will request this fix for 2.6.20. - Sean From troy at scl.ameslab.gov Mon Oct 23 17:08:16 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Mon, 23 Oct 2006 19:08:16 -0500 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? In-Reply-To: References: Message-ID: On Oct 23, 2006, at 8:42 AM, Hoang-Nam Nguyen wrote: > Hello Troy! >> The netpipe code is available with mercurial by: >> hg clone http://source.scl.ameslab.gov/hg/netpipe3-pvfs-dev >> Once you have pvfs2-1.5.1 installed, you should be able to do 'make >> pvfs' in the netpipe3-pvfs-dev directory and build NPpvfs. >> The command line arguments I used to reproduce this were: >> ./NPpvfs -d $PVFS_FILE_PATH -l 32768 -u 268435456 -n 100 -o >> $NETPIPE_OUTPUT_FILE > Did you compile pvfs and NPpvfs as 32-bit or 64-bit libs/execs? > I did compile pvfs and NPpvfs as is and realized that pvfs is built > by default as 32-bit and NPpvfs as 64-bit. Hence NPpvfs complained > to find incompatible pvfs libs. > Regards > Nam > I wasn't able to get reliable backtraces out of a 64-bit NPpvfs and pvfs libs, so I rebuilt as 32-bit, and now I get much more interesting errors and kernel logs. If I start 4 netpipe processes on the same node with: ./NPpvfs -l 32768 -u 268435456 -n 100 -o results/proc2.w.out -I -d /pvfs2/6node/proc2 I get errors like: 27: 786429 bytes 100 times --> 2249.96 Mbps in 2666.70 usec 28: 786432 bytes 100 times --> [E 18:47:20.394586] Error: ib_check_cq: entry id 0x100ac7f0 opcode RDMA WRITE error IBV_WC_LOC_PROT_ERR.
[E 18:47:20.395051] [bt] ./NPpvfs(error+0x9c) [0x1005858c] [E 18:47:20.395087] [bt] ./NPpvfs [0x10056a00] [E 18:47:20.395118] [bt] ./NPpvfs [0x1005726c] And kernel logs like this: Oct 23 18:48:37 p5l8 kernel: PU0007 00060066:print_error_data HCAD_ERROR QP 0xdfe (resource=2000000000000dfe) has errors. Oct 23 18:48:37 p5l8 kernel: PU0007 00060077:print_error_data HCAD_ERROR Error data is available: 2000000000000dfe. Oct 23 18:48:37 p5l8 kernel: PU0007 00060079:print_error_data HCAD_ERROR EHCA ----- error data begin --------------------------------------------------- Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f000 ofs=0000 00000000000004d0 2000000000000dfe Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f010 ofs=0010 0100000000000310 8000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f020 ofs=0020 a000000500000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f030 ofs=0030 0000000001000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f040 ofs=0040 0000000000000001 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f050 ofs=0050 0000000000000014 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f060 ofs=0060 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f070 ofs=0070 000000000000ffff 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f080 ofs=0080 008000000000262b 0000000000ffffff Oct 23 18:48:37 p5l8 kernel: PU0007 
0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f090 ofs=0090 0000000000ffffff 0000000009f49900 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f0a0 ofs=00a0 00000000000e0492 000000000000000a Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f0b0 ofs=00b0 0000000000000001 000000000000002b Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f0c0 ofs=00c0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f0d0 ofs=00d0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f0e0 ofs=00e0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f0f0 ofs=00f0 0000000000000000 0000000000000003 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f100 ofs=0100 000000000000001a 0000000000000004 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f110 ofs=0110 0000000000000004 0000000000000032 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f120 ofs=0120 00000000dc9d4600 0000000003c32f28 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f130 ofs=0130 000000000009f4aa 000000000009f4aa Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f140 ofs=0140 0a00000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f150 ofs=0150 0000000000000002 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 
0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f160 ofs=0160 0000000000002633 000000000000262c Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f170 ofs=0170 0000000000000001 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f180 ofs=0180 0000000000000006 0000000000000004 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f190 ofs=0190 0000000000000004 00000001da05023d Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f1a0 ofs=01a0 000000000000001f 000000000000262b Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f1b0 ofs=01b0 00000000dc9d4600 0000000003c32f28 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f1c0 ofs=01c0 0000000000000001 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f1d0 ofs=01d0 00000000dc9e5600 0000000003c33328 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f1e0 ofs=01e0 0000000000000006 0000000000000001 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f1f0 ofs=01f0 0000000000000003 000000000000262c Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f200 ofs=0200 000000000009f499 0000000000000004 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f210 ofs=0210 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f220 ofs=0220 0000000000000000 0000003000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 
0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f230 ofs=0230 0000000000000002 000000000000262b Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f240 ofs=0240 0000000000000000 000000000009f4a9 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f250 ofs=0250 00000000e3e9f820 0000000000000106 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f260 ofs=0260 0000000000000106 0000000000000003 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f270 ofs=0270 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f280 ofs=0280 008000000000262b 0000000000ffffff Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f290 ofs=0290 0000000000ffffff 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f2a0 ofs=02a0 000000000000262c 8000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f2b0 ofs=02b0 09f22a0000000000 3808000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f2c0 ofs=02c0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f2d0 ofs=02d0 0000000000000000 2000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f2e0 ofs=02e0 8000000000000000 3808000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f2f0 ofs=02f0 0000000000000000 6800000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 
0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f300 ofs=0300 a800000000000000 0000003000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f310 ofs=0310 4000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f320 ofs=0320 0000000000000000 02000000000000c8 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f330 ofs=0330 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f340 ofs=0340 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f350 ofs=0350 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f360 ofs=0360 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f370 ofs=0370 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f380 ofs=0380 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f390 ofs=0390 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f3a0 ofs=03a0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f3b0 ofs=03b0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f3c0 ofs=03c0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 
0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f3d0 ofs=03d0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f3e0 ofs=03e0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f3f0 ofs=03f0 0000000000000000 0400000000000060 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f400 ofs=0400 8000000000000000 c000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f410 ofs=0410 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f420 ofs=0420 0000000003c2a383 00000000d7bc8280 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f430 ofs=0430 000000000000043f 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f440 ofs=0440 0000000000000000 0003000000000004 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f450 ofs=0450 0000000000000004 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f460 ofs=0460 0300000000000068 8040000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f470 ofs=0470 c000c00000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f480 ofs=0480 0000000000000000 0000000003c4ae81 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f490 ofs=0490 00000000fbe4f960 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 
0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f4a0 ofs=04a0 0000000000000000 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f4b0 ofs=04b0 0000000000000000 0000000000000004 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007a:print_error_data resource=2000000000000dfe adr=c00000012ec3f4c0 ofs=04c0 0000000000000004 0000000000000000 Oct 23 18:48:37 p5l8 kernel: PU0007 0006007c:print_error_data HCAD_ERROR EHCA ----- error data end ---------------------------------------------------- From rdreier at cisco.com Mon Oct 23 20:00:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 20:00:45 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <20061020214921.GB4054@obsidianresearch.com> (Jason Gunthorpe's message of "Fri, 20 Oct 2006 15:49:21 -0600") References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> <20061020214921.GB4054@obsidianresearch.com> Message-ID: > I just took a quick look at asm-ia64/io.h and there are __ia64_mf_a > barriers after all non-posted IO operations (inb/outb). config write and > config read transactions have identical rules to IO transactions at > the PCI bus level. > > I'm going to go out on a limb here and say that if Linux code assumes > strong ordering of IO operations then it makes sense to also assume > strong ordering on config writes. So, instead of patching mthca with > this barrier it should go in the Altix config access mechanism. I don't really know what mf.a does on ia64, but it seems likely that even if the CPU issues and retires the reads and writes in order (which is all a CPU barrier is likely to do), we would still have a problem where the PCI-X host bridge allows the MMIO read to pass the config write, because the config write is still pending a split completion on the bus. 
And a quick web search finds "interesting" stuff like this: > Platform-acceptance is a tricky business, as it's, well, platform > dependent (note that "mf.a" doesn't really guarantee to do anything). So I'm inclined to think this patch is correct. However, I'll check with linux-pci and linux-ia64 before asking Linus to merge it. Thanks, Roland From rdreier at cisco.com Mon Oct 23 20:37:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 23 Oct 2006 20:37:48 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <45394C86.2030708@sgi.com> (John Partridge's message of "Fri, 20 Oct 2006 17:24:06 -0500") References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> <45394C86.2030708@sgi.com> Message-ID: > So, do you need anything else from me like an attached copy of the patch? I thought I had everything I needed -- but actually could you send a new copy of the patch along with a Signed-off-by: line? Thanks, Roland From panda at cse.ohio-state.edu Mon Oct 23 20:36:56 2006 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Mon, 23 Oct 2006 23:36:56 -0400 (EDT) Subject: [openib-general] GPL only files in OFA repository In-Reply-To: from "Chet Mehta" at Oct 23, 2006 11:56:06 AM Message-ID: <200610240336.k9O3auap026717@xi.cse.ohio-state.edu> Hi Chet, Thanks for your note. Some of the files do not belong to mvapich/mvapich2 directories. Some of the other files come with the standard MPICH and MPICH2 distributions by Argonne. We are taking a closer look at these files. Best Regards, DK > A cursory scan of the OFA code repository showed the following files to > contain a GPL-only license. Since OFA Bylaws require code contributions to > include BSD & GPL licenses, can the code owners/contributors of these > files either update the files to include the appropriate licenses or > provide details on why the files cannot be licensed under BSD? 
> > ./include/linux/mutex-backport.h > ./include/linux/.svn/text-base/mutex-backport.h.svn-base > ./mpi/mvapich-gen2/examples/perftest/config/confdb/aclangf90.m4 > ./mpi/mvapich-gen2/examples/perftest/config/confdb/.svn/text-base/fortran90.m4.svn-base > ./mpi/mvapich-gen2/examples/perftest/config/confdb/.svn/text-base/aclangf90.m4.svn-base > ./mpi/mvapich-gen2/examples/perftest/config/confdb/fortran90.m4 > ./mpi/mvapich-gen2/doc/.svn/text-base/mpichman-chshmem.pdf.svn-base > ./mpi/mvapich-gen2/doc/mpichman-chshmem.pdf > ./mpi/mvapich2-gen2/confdb/aclangf90.m4 > ./mpi/mvapich2-gen2/confdb/.svn/text-base/fortran90.m4.svn-base > ./mpi/mvapich2-gen2/confdb/.svn/text-base/aclangf90.m4.svn-base > ./mpi/mvapich2-gen2/confdb/fortran90.m4 > > Thank you. > :Chet. >
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From greg.lindahl at qlogic.com Mon Oct 23 22:00:24 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Mon, 23 Oct 2006 22:00:24 -0700 Subject: [openib-general] IPoIB Question In-Reply-To: References: Message-ID: <20061024050023.GA1841@greglaptop.hsd1.ca.comcast.net> On Mon, Oct 23, 2006 at 07:53:06AM -0500, Hubbell, Sean C Contractor/Decibel wrote: > I currently have several applications that use a legacy IPv4 protocol > and I use IPoIB to utilize my InfiniBand network, which works great. I > have completed some timing and throughput analysis and noticed that I do > not get very much more if I use an InfiniBand network interface than > using my GigE network interface. You might want to note that different InfiniBand implementations have quite different IPoIB performance, especially for UDP. Another issue is that IPoIB performance differs considerably between Linux kernels. This is especially evident for TCP, although you can use SDP to accelerate TCP sockets and avoid this issue. > My question is, am I using IPoIB correctly or are these the typical > numbers that everyone is seeing? It is certainly the case that there are some message patterns and situations for which InfiniBand is not much of an improvement over gigE. 
-- greg From jgunthorpe at obsidianresearch.com Mon Oct 23 22:12:20 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 23 Oct 2006 23:12:20 -0600 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> <20061020214921.GB4054@obsidianresearch.com> Message-ID: <20061024051220.GC25360@obsidianresearch.com> On Mon, Oct 23, 2006 at 08:00:45PM -0700, Roland Dreier wrote: > > I just took a quick look at asm-ia64/io.h and there are __ia64_mf_a > > barriers after all non-posted IO operations (inb/outb). config write and > > config read transactions have identical rules to IO transactions at > > the PCI bus level. > > > > I'm going to go out on a limb here and say that if Linux code assumes > > strong ordering of IO operations then it makes sense to also assume > > strong ordering on config writes. So, instead of patching mthca with > > this barrier it should go in the Altix config access mechanism. > I don't really know what mf.a does on ia64, but it seems likely that > even if the CPU issues and retires the reads and writes in order > (which is all a CPU barrier is likely to do), we would still have a > problem where the PCI-X host bridge allows the MMIO read to pass the > config write, because the config write is still pending a split > completion on the bus. Well, I'm in the same boat as you as far as Altix goes, but in general, on multi-processor systems some barrier instructions can produce fencing operations on the CPU bus that a PCI-X bridge or chipset might observe. Further along in the discussion thread you found there is this: > Well, mf.a *does* do something on the 460 chipset. So I think it's > mostly a platform issue. Are you saying that on SGI's IA-64 platforms > mf.a doesn't do the equivalent of the MIPS sync? (Just curious.) So on some Itaniums mf.a comes out on the CPU bus in some manner. 
After thinking about it some more, there are more cases in the kernel than just the one in mthca. For instance there are paths in the PCI core that manipulate BAR registers without a barrier. It would be bad to change a BAR via config write and issue an MMIO operation to the new address without waiting for the split to return. I also know of a fibre channel chip that has a similar strong requirement of order in the reset sequence like mthca. Regards, Jason From HNGUYEN at de.ibm.com Tue Oct 24 00:21:38 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 24 Oct 2006 09:21:38 +0200 Subject: [openib-general] ibv_reg_mr failure with pvfs on ehca? In-Reply-To: <453CCEDC.5010908@scl.ameslab.gov> Message-ID: Hi Kyle! > And, setting the debug_level flag definitely caused the server to not > respond... I rebooted and tried it again, same thing, setting the > debug_level flag causes the server to crash. (I can still login, but > cannot execute anything, e.g. 'ls', it seems all the cpu's are spinning) > p5l5:~# modprobe hcad_mod nr_ports=1 debug_level=99999999 > console output after above command hangs server: > PU0003 000e0252:hipz_h_register_rpage >>> > adapter_handle=1000000203000004 pagesize=0 queue_type=0 > resource_handle=7000000100018600 logical_address_of_page=e6741000 count=200 > PU0003 000e0078:ehca_hcall_7arg_7ret >>> opcode=1ac > arg1=1000000203000004 arg2=0 arg3=7000000100018600 arg4=e6741000 > arg5=200 arg6=0 arg7=0 > PU0003 000e0096:ehca_hcall_7arg_7ret <<< opcode=1ac ret=f out1=50 > out2=50 out3=50 out4=50 out5=50 out6=50 out7=50 > PU0003 000e0263:hipz_h_register_rpage <<< ret=f > PU0003 000e04ad:hipz_h_register_rpage_mr <<< ret=f > PU0003 0009076c:ehca_set_pagebuf >>> pginfo=c0000000eb7b75e0 type=1 > num_pages=1d4000 num_4k=1d4000 next_buf=0 next_4k=30600 number=200 > kpage=c0000000e6741000 page_cnt=30600 page_4k_cnt=30600 next_listelem=0 > region=0000000000000000 next_chunk=0000000000000000 next_nmap=0 > PU0003 00090807:ehca_set_pagebuf <<< ret=0 
e_mr=c0000000e1ac2e80 > pginfo=c0000000eb7b75e0 type=1 num_pages=1d4000 num_4k=1d4000 next_buf=0 > next_4k=30800 number=200 kpage=c0000000e6742000 page_cnt=30800 > page_4k_cnt=30800 i=200 next_listelem=0 region=0000000000000000 > next_chunk=0000000000000000 next_nmap=0 > PU0003 000e049e:hipz_h_register_rpage_mr >>> > adapter_handle=1000000203000004 mr=c0000000e1ac2e80 > mr_handle=7000000100018600 pagesize=0 queue_type=0 > logical_address_of_page=e6741000 count=200 > PU0003 000e0252:hipz_h_register_rpage >>> > adapter_handle=1000000203000004 pagesize=0 queue_type=0 > resource_handle=7000000100018600 logical_address_of_page=e6741000 count=200 > PU0003 000e0078:ehca_hcall_7arg_7ret >>> opcode=1ac > arg1=1000000203000004 arg2=0 arg3=7000000100018600 arg4=e6741000 > arg5=200 arg6=0 arg7=0 > PU0003 000e0096:ehca_hcall_7arg_7ret <<< opcode=1ac ret=f out1=50 > out2=50 out3=50 out4=50 out5=50 out6=50 out7=50 > PU0003 000e0263:hipz_h_register_rpage <<< ret=f > We looked at the traces above and saw an MR registration with 0x1d4000 pages, which is about 7.3 GB. This part of the trace shows pages 0x30600-0x307FF being registered. So our guess is that the system is busy flushing out the remaining traces and only appears to hang, while you can still log in or ping it. Fortunately you have an "old" version of ehca that allows selecting debug traces for certain components. In this case I would enable debug traces for mrmw only, and the command for that looks like this: echo 66666666696666666666 > /sys/bus/ibmebus/drivers/ehca/debug_level ^this should turn on debug traces for mrmw only Or you pass the option debug_level to modprobe: modprobe hcad_mod debug_level=66666666696666666666 Then you should see only mrmw traces in dmesg, which is still a lot, because we register the whole memory space at module load time. If that still seems to hang, I can provide you with a debug patch later. For now please give us a little time to set up test environments and recreate your problem. Thanks! 
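The MR size mentioned above can be checked with a little arithmetic. This is only a sketch: it assumes 4 KiB pages (suggested by the num_4k field in the trace, but still an assumption) and that the trace prints its counters in hex. The "remaining calls" figure further assumes that registration keeps going in chunks of 0x200 pages until the end of the region.

```python
# Sanity check for the MR size in the trace above.
# Assumptions: 4 KiB pages, and all trace counters are hexadecimal.
PAGE_SIZE = 4096

num_pages = 0x1d4000   # num_pages / num_4k reported by ehca_set_pagebuf
chunk = 0x200          # "count=200" / "number=200": pages per register call
first_page = 0x30600   # next_4k on entry to ehca_set_pagebuf

total_bytes = num_pages * PAGE_SIZE
last_page = first_page + chunk - 1
# Chunks still to be registered after this one, if the loop runs to the end.
remaining_calls = (num_pages - (first_page + chunk)) // chunk

print(f"MR size: {total_bytes / 2**30:.2f} GiB")
print(f"pages in this chunk: {first_page:#x}-{last_page:#x}")
print(f"register calls still to come: {remaining_calls}")
```

Under those assumptions a few thousand more register calls would follow, each with its own trace lines, which would fit the guess that the machine is busy printing rather than truly hung.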
Nam From mst at mellanox.co.il Tue Oct 24 00:33:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 24 Oct 2006 09:33:46 +0200 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <20061024051220.GC25360@obsidianresearch.com> References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> <20061020214921.GB4054@obsidianresearch.com> <20061024051220.GC25360@obsidianresearch.com> Message-ID: <20061024073346.GD6763@mellanox.co.il> Quoting r. Jason Gunthorpe : > After thinking about it some more, there are more cases in the kernel > than just the one in mthca. For instance there are paths in the PCI > core that manipulate BAR registers without a barrier. It would be bad > to change a BAR via config write and issue an MMIO operation to the new > address without waiting for the split to return. I also know of a > fibre channel chip that has a similar strong requirement of order in > the reset sequence like mthca. Interesting. So maybe pci config access commands should include the barrier instead of sticking these things in mthca? -- MST From eli at dev.mellanox.co.il Tue Oct 24 00:57:40 2006 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 24 Oct 2006 09:57:40 +0200 Subject: [openib-general] IPoIB odd loopback packet from arp In-Reply-To: References: Message-ID: <1161676660.7290.22.camel@localhost> Todd, This does not look like an error. The first arp is a broadcast (qpn=ffffff), so it is received back at the sending interface and is dropped. The second one is a unicast arp (qpn=0x000404), so it is not received at the local interface. On Mon, 2006-10-23 at 13:48 -0600, Todd Bowman wrote: > Using the OFED 1.0 and OFED 1.1 stack I have noticed some rcvswrelay > errors. I have tracked it down to the arp request. 
I can reproduce > the problem with the following steps: > > ( I have used both 2.6.14.14 and 2.6.18.1 kernels) > > ib109> arp -d ib110 > ib109> ping ib110 -c 2 > > # ib_ipoib module debug > 13:15:46 ib109 kernel: ib0: sending packet, length=60 address=f6187200 > qpn=0xffffff > 13:15:46 ib109 kernel: ib0: called: id 34, op 0, status: 0 > 13:15:46 ib109 kernel: ib0: send complete, wrid 34 > 13:15:46 ib109 kernel: ib0: called: id -2147483623, op 128, status: 0 > 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x0369 > 13:15:46 ib109 kernel: ib0: dropping loopback packet > 13:15:46 ib109 kernel: ib0: called: id -2147483622, op 128, status: 0 > 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x016d > 13:15:46 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 > qpn=0x000404 > 13:15:46 ib109 kernel: ib0: called: id 35, op 0, status: 0 > 13:15:46 ib109 kernel: ib0: send complete, wrid 35 > 13:15:46 ib109 kernel: ib0: called: id -2147483621, op 128, status: 0 > 13:15:46 ib109 kernel: ib0: received 128 bytes, SLID 0x016d > 13:15:47 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 > qpn=0x000404 > 13:15:47 ib109 kernel: ib0: called: id 36, op 0, status: 0 > 13:15:47 ib109 kernel: ib0: send complete, wrid 36 > 13:15:47 ib109 kernel: ib0: called: id -2147483620, op 128, status: 0 > 13:15:47 ib109 kernel: ib0: received 128 bytes, SLID 0x016d > 13:15:51 ib109 kernel: ib0: called: id -2147483619, op 128, status: 0 > 13:15:51 ib109 kernel: ib0: received 100 bytes, SLID 0x016d > 13:15:51 ib109 kernel: ib0: sending packet, length=60 address=f6e57520 > qpn=0x000404 > 13:15:51 ib109 kernel: ib0: called: id 37, op 0, status: 0 > 13:15:51 ib109 kernel: ib0: send complete, wrid 37 > > # tcpdump -i ib0 > 13:15:46.977578 arp who-has ib110 tell ib109 hardware #32 > 13:15:46.977682 arp reply ib110 is-at > 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:11:59 hardware > #32 > 13:15:46.977710 IP ib109 > ib110: icmp 64: echo request seq 0 > 
13:15:46.977790 IP ib110 > ib109: icmp 64: echo reply seq 0 > 13:15:47.977772 IP ib109 > ib110: icmp 64: echo request seq 1 > 13:15:47.977892 IP ib110 > ib109: icmp 64: echo reply seq 1 > 13:15:51.977076 arp who-has ib109 tell ib110 hardware #32 > 13:15:51.977094 arp reply ib109 is-at > 00:02:00:14:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:3b:31 hardware > #32 > > # error dump > rcvswrelayerrors:1 MT47396 Infiniscale-III 0x2c9010b022090[1] > <--------> ib109 HCA-1 0x2c90200003b30[1] > > 1) The ping is successful and the arp table is populated, so is this > really a problem or a false positive? > 2) The second arp does not generate an error (the error dump reports > all new errors in switches). Why? > > Any ideas? > > Thanks in advance. > > Todd From ogerlitz at voltaire.com Tue Oct 24 01:36:23 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 24 Oct 2006 10:36:23 +0200 (IST) Subject: [openib-general] problems using the ipoib ha package of RH4 U3 Message-ID: Hi Vlad, I am trying to test the ipoib ha package on OFED 1.1 over RH4 U3 and have not managed to get it working properly. I am running the following commands: ifup ib0 ifup ib1 ipoib_ha_log="/var/log/ipoib_ha.log" sleep 2 ./ipoib_ha.pl -p ib0 -s ib1 --with-arping --with-multicast -v > ${ipoib_ha_log} 2>&1 & and have ib0 working fine. Then I cause the ib port associated with ib0 (port 1 of the hca) to get into the initialized state, but I don't see any failover happening. After the port state is changed, the HA script prints the following: Got CARRIER-ON event on ib0. Got CARRIER-ON event on ib1. Got CARRIER-ON event on ib1. 
but it does not seem to do any fail-over (i.e. it does not configure ib1 with the address of ib0, etc.) Below is the full output of the ha script (/var/log/ipoib_ha.log) and my ifcfg config scripts. I think there might be an issue with the version of the iproute package. I see that my RH4 U3 system uses iproute-2.6.9-3, is this the issue? Or. Date:Tue Oct 24 13:25:02 2006 ib0: ====================================== NETMASK = 255.255.255.0 BOOTPROTO = static IPADDR = 192.168.3.61 status = ONBOOT = yes HA = 0 DEVICE = ib0 Date:Tue Oct 24 13:25:02 2006 Bond: ====================================== NETMASK = 255.255.255.0 BOOTPROTO = static IPADDR = 192.168.3.61 status = ONBOOT = yes HA = 0 DEVICE = ib0 Got CARRIER-ON event on ib0. Got CARRIER-ON event on ib1. Got CARRIER-ON event on ib1. [root at excell02 ipoibtools]# cat /etc/sysconfig/network-scripts/ifcfg-ib0 # Static settings; all values provided by this file DEVICE=ib0 BOOTPROTO=static ONBOOT=yes IPADDR=192.168.3.61 NETMASK=255.255.255.0 [root at excell02 ipoibtools]# cat /etc/sysconfig/network-scripts/ifcfg-ib1 # Static settings; all values provided by this file DEVICE=ib1 BOOTPROTO=static ONBOOT=yes IPADDR=192.168.3.61 NETMASK=255.255.255.0 From vlad at mellanox.co.il Tue Oct 24 01:46:12 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 24 Oct 2006 10:46:12 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 Message-ID: <6C2C79E72C305246B504CBA17B5500C934806B@mtlexch01.mtl.com> Hi Or, Try disconnecting port 1 of the HCA from the IB subnet and then check whether ib1 becomes active. Regards, Vladimir > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Tuesday, October 24, 2006 10:36 AM > To: Vladimir Sokolovsky > Cc: openib-general at openib.org > Subject: problems using the ipoib ha package of RH4 U3 > > Hi Vlad, > > I am trying to test the ipoib ha package on OFED 1.1 over RH4 > U3 and have not managed to get it working properly. 
I am running > the following commands: > > ifup ib0 > ifup ib1 > ipoib_ha_log="/var/log/ipoib_ha.log" > sleep 2 > ./ipoib_ha.pl -p ib0 -s ib1 --with-arping --with-multicast -v > > ${ipoib_ha_log} 2>&1 & > > and have ib0 working fine. Then I cause the ib port > associated with ib0 (port 1 of the hca) to get into the > initialized state, but I don't see any failover happening. > > After the port state is changed, the HA script prints the following: > > Got CARRIER-ON event on ib0. > Got CARRIER-ON event on ib1. > Got CARRIER-ON event on ib1. > > but it does not seem to do any fail-over (i.e. it does not configure > ib1 with the address of ib0, etc.) > > Below is the full output of the ha script > (/var/log/ipoib_ha.log) and my ifcfg config scripts. > > I think there might be an issue with the version of the > iproute package. I see that my RH4 U3 system uses > iproute-2.6.9-3, is this the issue? > > Or. > > Date:Tue Oct 24 13:25:02 2006 > ib0: > ====================================== > NETMASK = 255.255.255.0 > BOOTPROTO = static > IPADDR = 192.168.3.61 > status = > ONBOOT = yes > HA = 0 > DEVICE = ib0 > > Date:Tue Oct 24 13:25:02 2006 > Bond: > ====================================== > NETMASK = 255.255.255.0 > BOOTPROTO = static > IPADDR = 192.168.3.61 > status = > ONBOOT = yes > HA = 0 > DEVICE = ib0 > Got CARRIER-ON event on ib0. > Got CARRIER-ON event on ib1. > Got CARRIER-ON event on ib1. 
> > > > > [root at excell02 ipoibtools]# cat > /etc/sysconfig/network-scripts/ifcfg-ib0 > # Static settings; all values provided by this file > DEVICE=ib0 BOOTPROTO=static ONBOOT=yes > IPADDR=192.168.3.61 > NETMASK=255.255.255.0 > > [root at excell02 ipoibtools]# cat > /etc/sysconfig/network-scripts/ifcfg-ib1 > # Static settings; all values provided by this file > DEVICE=ib1 > BOOTPROTO=static > ONBOOT=yes > IPADDR=192.168.3.61 > NETMASK=255.255.255.0 > From ogerlitz at voltaire.com Tue Oct 24 01:52:47 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 24 Oct 2006 10:52:47 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 In-Reply-To: <6C2C79E72C305246B504CBA17B5500C934806B@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C934806B@mtlexch01.mtl.com> Message-ID: <453DD45F.7060308@voltaire.com> Vladimir Sokolovsky wrote: > Try to disconnect the port 1 of HCA from IB subnet and then check if ib1 > became active. OK, i will be able to do it later today, anyway, something bothers me here: the script is based on parsing the output of $ /sbin/ip monitor link and this output should be the same for both the cases (ie unplug the ib cable or make the port state down) since it is in the **ip** and not **ib** level. Or. From mst at mellanox.co.il Tue Oct 24 02:00:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 24 Oct 2006 11:00:58 +0200 Subject: [openib-general] configure: error: C compiler cannot create executables In-Reply-To: <1161644547.5074.33.camel@julia.et.endace.com> References: <1161644547.5074.33.camel@julia.et.endace.com> Message-ID: <20061024090058.GB7154@mellanox.co.il> Quoting vishal : > Thread model: posix > gcc version 4.1.0 (SUSE Linux) > configure:2471: $? = 0 > configure:2473: gcc -V &5 > gcc: '-V' option must have argument > configure:2476: $? 
= 1 > configure:2499: checking for C compiler default output file name > configure:2502: gcc -m32 -g -O2 -I../libibverbs/include -m32 -g -O2 > -L/usr/lib conftest.c >&5 > /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../libc.so when searching for -lc > /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../libc.a when searching for -lc > /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/libc.so when searching for -lc > /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/libc.a when searching for -lc > /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: cannot find -lc Look at the compiler command line - it tries to create a 32-bit executable and does not find the C library for that. You need to have the 32-bit devel packages installed. Vlad, is this in RN somewhere? -- MST From vlad at mellanox.co.il Tue Oct 24 02:09:29 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 24 Oct 2006 11:09:29 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 Message-ID: <6C2C79E72C305246B504CBA17B5500C934808A@mtlexch01.mtl.com> The HA service (ipoib_ha.pl) is looking for "NO-CARRIER" in the "ip" utility output. ipoib_ha.pl uses an updated "ip" utility, which is part of the ipoibtools RPM and located under the 'prefix'/utils directory. The fail-over flow is the following: On every event received from 'ip monitor link', check whether the primary IPoIB interface has NO-CARRIER using the 'ip link show' command. If "yes", then check the status of the secondary interface using the same command. If the secondary interface does not have NO-CARRIER, then migrate the IPoIB configuration from the primary to the secondary interface. 
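The fail-over decision described above can be sketched as a pair of pure functions. This is not the ipoib_ha.pl code (that script is Perl and its source is not shown here); the function names and the sample `ip link show`-style strings are made up for illustration, and a real script would obtain the text by running the ip utility on each event.

```python
# Sketch of the fail-over decision described above (not the ipoib_ha.pl code).
# The sample strings are illustrative `ip link show`-style output, not
# captured from a real system; function names are hypothetical.

def has_no_carrier(link_show_output: str) -> bool:
    """Return True if the interface flags include NO-CARRIER."""
    first_line = link_show_output.splitlines()[0] if link_show_output else ""
    # Flags sit between '<' and '>' on the first line, comma-separated.
    start, end = first_line.find("<"), first_line.find(">")
    if start == -1 or end == -1 or end < start:
        return False
    return "NO-CARRIER" in first_line[start + 1:end].split(",")


def should_fail_over(primary_output: str, secondary_output: str) -> bool:
    """Migrate only if the primary lost carrier and the secondary has it."""
    return has_no_carrier(primary_output) and not has_no_carrier(secondary_output)


IB0_DOWN = "4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128"
IB0_UP = "4: ib0: <BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128"
IB1_UP = "5: ib1: <BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128"
IB1_DOWN = "5: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128"

print(should_fail_over(IB0_DOWN, IB1_UP))    # primary down, secondary up
print(should_fail_over(IB0_UP, IB1_UP))      # nothing to do
print(should_fail_over(IB0_DOWN, IB1_DOWN))  # nowhere to migrate to
```

Note that a check like this depends entirely on the ip binary in use actually printing the NO-CARRIER flag; if it never does, the primary will always look healthy and no migration is triggered.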
Regards, Vladimir > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Tuesday, October 24, 2006 10:53 AM > To: Vladimir Sokolovsky > Cc: openib-general at openib.org > Subject: Re: problems using the ipoib ha package of RH4 U3 > > Vladimir Sokolovsky wrote: > > Try to disconnect the port 1 of HCA from IB subnet and then > check if > > ib1 became active. > > OK, i will be able to do it later today, anyway, something bothers me > here: the script is based on parsing the output of > > $ /sbin/ip monitor link > > and this output should be the same for both the cases (ie > unplug the ib cable or make the port state down) since it is > in the **ip** and not > **ib** level. > > Or. > From ogerlitz at voltaire.com Tue Oct 24 05:00:45 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 24 Oct 2006 14:00:45 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 In-Reply-To: <6C2C79E72C305246B504CBA17B5500C934808A@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C934808A@mtlexch01.mtl.com> Message-ID: <453E006D.9080405@voltaire.com> Vladimir Sokolovsky wrote: > The HA service (ipoib_ha.pl) is looking for "NO-CARRIER" in the "ip" > utility output. > ipoib_ha.pl uses an updated "ip" utility which is a part of ipoibtools > RPM and located under 'prefix'/utils directory. OK, when using the ipoib_ha.pl in the ipoibtools rpm framework things are working quite fine. My understanding is that the difference is the use of the OFED-provided ip utility (/usr/local/ofed/bin/ip) vs the ip utility that comes with the distro (/sbin/ip, which on RH4 U3 comes with iproute-2.6.9-3). So this iproute package lacks the reporting of "NO-CARRIER" and you provide an ip util which does report that, correct? From my experience on other systems (i.e. SLES9 SP1, where /sbin/ip comes with iproute2-2.4.7-866.8) there are versions of the iproute package which do support reporting a no-carrier change. 
Have you tried finding such one for RH4, ie iproute-2.X.Y-Z? Doug - maybe you can help here? Or. From dotanb at dev.mellanox.co.il Tue Oct 24 05:00:29 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 24 Oct 2006 14:00:29 +0200 Subject: [openib-general] [PATCH] [cma] qp_access_flags was changed to zero In-Reply-To: <453D5792.4070800@ichips.intel.com> References: <1161591004.22381.1.camel@mtls05.yok.mtl.com> <453D0BDE.80200@ichips.intel.com> <453D5792.4070800@ichips.intel.com> Message-ID: <453E005D.5000704@dev.mellanox.co.il> Sean Hefty wrote: > Sean Hefty wrote: >> I think there's a related issue in the ib_cm, which also sets >> IB_ACCESS_LOCAL_WRITE. The above code is executed when the user >> calls rdma_create_qp(). The ib_cm routine is executed when >> connecting the QP, which will overwrite these settings. I think >> we'll want to change both places to get the desired result. > > I've pulled in your patch, along with a similar patch to the ib_cm. > I'll run through some tests tomorrow to verify that nothing broke. > Unless this is causing a serious issue, though, I will request this > fix for 2.6.20. > > - Sean Great, thanks. There were two reasons for me to send this patch in the first place: 1) if the low level driver will add a check to the remote access permission, this code may fail. 2) the core is being used as a reference code for new programmers, so it should be correct. 
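To illustrate the point about qp_access_flags carrying remote access permissions only, here is a small sketch in Python. The constants are stand-ins modelled on the usual bit layout of the kernel's ib_access_flags enum; treat the numeric values as an assumption rather than a copy of ib_verbs.h. The helper names are hypothetical and do not exist in the kernel.

```python
# Stand-in bit values (assumed layout, for illustration only).
IB_ACCESS_LOCAL_WRITE = 1 << 0
IB_ACCESS_REMOTE_WRITE = 1 << 1
IB_ACCESS_REMOTE_READ = 1 << 2
IB_ACCESS_REMOTE_ATOMIC = 1 << 3

# Only the remote permissions are treated as meaningful for a QP here.
REMOTE_ACCESS_MASK = (IB_ACCESS_REMOTE_WRITE
                      | IB_ACCESS_REMOTE_READ
                      | IB_ACCESS_REMOTE_ATOMIC)


def is_valid_qp_access(flags: int) -> bool:
    """True if flags contain nothing outside the remote-access bits."""
    return (flags & ~REMOTE_ACCESS_MASK) == 0


def strip_local_bits(flags: int) -> int:
    """Drop any non-remote bits, such as IB_ACCESS_LOCAL_WRITE."""
    return flags & REMOTE_ACCESS_MASK
```

A low-level driver that validated the flags along the lines of is_valid_qp_access() would reject a QP modify that passes IB_ACCESS_LOCAL_WRITE, which is the failure mode described in reason 1 above.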
thanks Dotan From vlad at mellanox.co.il Tue Oct 24 05:16:02 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 24 Oct 2006 14:16:02 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 Message-ID: <6C2C79E72C305246B504CBA17B5500C934810B@mtlexch01.mtl.com> Please see below, Regards, Vladimir > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Tuesday, October 24, 2006 2:01 PM > To: Vladimir Sokolovsky > Cc: openib-general at openib.org; dledford at redhat.com > Subject: Re: problems using the ipoib ha package of RH4 U3 > > Vladimir Sokolovsky wrote: > > The HA service (ipoib_ha.pl) is looking for "NO-CARRIER" in the "ip" > > utility output. > > Ipoib_ha.pl use an updated "ip" utility which is a part of > ipoibtools > > RPM and located under 'prefix'/utils directory. > > OK, when using the ipoib_ha.pl in the ipoibtools rpm > framework things are working quite fine. > > My understanding is that the difference being usage of the > ofed provided ip utility (/usr/local/ofed/bin/ip) vs the ip > utility that comes with the distro (/sbin/ip which on RH4 U3 > comes with iproute-2.6.9-3). > > So this iproute package lacks the reporting of "NO-CARRIER" > and you provide an ip util which does report that, correct? Correct. > > From my experience on other systems "(ie SLES9 SP1 whre > /sbin/ip comes with iproute2-2.4.7-866.8) there are version > of the iproute package which does support reporting no carrier change. > > Have you tried finding such one for RH4, ie iproute-2.X.Y-Z? I didn't find an updated iproute2 package for RH, so I included the latest iproute2 from sourceforge in the ipoibtools package. > > Doug - maybe you can help here? > > Or. 
> > > From ogerlitz at voltaire.com Tue Oct 24 05:44:29 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 24 Oct 2006 14:44:29 +0200 Subject: [openib-general] configure: error: C compiler cannot create executables In-Reply-To: <20061024090058.GB7154@mellanox.co.il> References: <1161644547.5074.33.camel@julia.et.endace.com> <20061024090058.GB7154@mellanox.co.il> Message-ID: <453E0AAD.3010709@voltaire.com> Michael S. Tsirkin wrote: > Quoting vishal : >> Thread model: posix >> gcc version 4.1.0 (SUSE Linux) >> configure:2471: $? = 0 >> configure:2473: gcc -V &5 >> gcc: '-V' option must have argument >> configure:2476: $? = 1 >> configure:2499: checking for C compiler default output file name >> configure:2502: gcc -m32 -g -O2 -I../libibverbs/include -m32 -g -O2 >> -L/usr/lib conftest.c >&5 >> /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../libc.so when searching for -lc >> /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../libc.a when searching for -lc >> /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/libc.so when searching for -lc >> /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib64/libc.a when searching for -lc >> /usr/lib64/gcc/x86_64-suse-linux/4.1.0/../../../../x86_64-suse-linux/bin/ld: cannot find -lc > > Look at the compiler command line - it tries to create a 32 bit executable > and does not find the C library for that. > You should have 32 bit devel packages installed. Michael, Cool, thanks for pointing this out, for me it (installing the glibc 32bit devel rpm) solved this problem. Or. 
From ogerlitz at voltaire.com Tue Oct 24 06:43:28 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 24 Oct 2006 15:43:28 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 In-Reply-To: <6C2C79E72C305246B504CBA17B5500C934808A@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C934808A@mtlexch01.mtl.com> Message-ID: <453E1880.6090309@voltaire.com> Vlad, On a related issue, i think it was mentioned at some point that the ipoib ha package does not support devices which use the non standard pkey, is it correct? why? My thinking was that if the pkey 8001 is configured for both port 1 and port2 of the hca, the user hacks /etc/init.d/openibd to create ib0:8001 and ib1:8001 child devices **before** the ha service is started and then tells the ha service to use ib0:8001 as primary and ib1:8001 as secondary, it should work, what do you think? Or. From shubbell at dbresearch.net Tue Oct 24 06:35:18 2006 From: shubbell at dbresearch.net (Sean Hubbell) Date: Tue, 24 Oct 2006 08:35:18 -0500 Subject: [openib-general] IPoIB Question In-Reply-To: <20061024050023.GA1841@greglaptop.hsd1.ca.comcast.net> References: <20061024050023.GA1841@greglaptop.hsd1.ca.comcast.net> Message-ID: <453E1696.9090708@dbresearch.net> Greg Lindahl wrote: > On Mon, Oct 23, 2006 at 07:53:06AM -0500, Hubbell, Sean C Contractor/Decibel wrote: > > >> I currently have several applications that uses a legacy IPv4 protocol >> and I use IPoIB to utilize my infiniband network which works great. I >> have completed some timing and throughput analysis and noticed that I do >> not get very much more if I use an infiniband network interface than >> using my GigE network interface. >> > > You might want to note that different InfinBand implementations have > quite different performance of IPoIB, especially for UDP. > > Another issue is that IPoIB has quite different performance with > different Linux kernels. 
This is especially evident for TCP, although > you can use SDP to accelerate TCP sockets and avoid this issue. > > We are currently looking at the new tickless kernel. Do you have one that you recommend? Sean From twbowman at gmail.com Tue Oct 24 06:51:25 2006 From: twbowman at gmail.com (Todd Bowman) Date: Tue, 24 Oct 2006 07:51:25 -0600 Subject: [openib-general] IPoIB odd loopback packet from arp In-Reply-To: <1161676660.7290.22.camel@localhost> References: <1161676660.7290.22.camel@localhost> Message-ID: Thanks Eli. So the switch is incrementing the rcvswrelay counter when it sends the broadcast back through the original port. This doesn't seem to be correct behavior, it makes that counter unreliable. On 10/24/06, Eli Cohen wrote: > > Todd, > This does not look like an error. The first arp is a broadcast > (qpn=ffffff) so it is received in at the sending interface and is > dropped. The second on is a unicast arp (qpn=0x000404) so it is not > received at the local interface. > > > On Mon, 2006-10-23 at 13:48 -0600, Todd Bowman wrote: > > Using the OFED 1.0 and OFED 1.1 stack I have notice some rcvswrelay > > errors. I have tracked it down to the arp request. 
I can reproduce > > the problem with the following steps: > > > > ( I have used both 2.6.14.14 and 2.6.18.1 kernels) > > > > ib109> arp -d ib110 > > ib109> ping ib110 -c 2 > > > > # ib_ipoib module debug > > 13:15:46 ib109 kernel: ib0: sending packet, length=60 address=f6187200 > > qpn=0xffffff > > 13:15:46 ib109 kernel: ib0: called: id 34, op 0, status: 0 > > 13:15:46 ib109 kernel: ib0: send complete, wrid 34 > > 13:15:46 ib109 kernel: ib0: called: id -2147483623, op 128, status: 0 > > 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x0369 > > 13:15:46 ib109 kernel: ib0: dropping loopback packet > > 13:15:46 ib109 kernel: ib0: called: id -2147483622, op 128, status: 0 > > 13:15:46 ib109 kernel: ib0: received 100 bytes, SLID 0x016d > > 13:15:46 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 > > qpn=0x000404 > > 13:15:46 ib109 kernel: ib0: called: id 35, op 0, status: 0 > > 13:15:46 ib109 kernel: ib0: send complete, wrid 35 > > 13:15:46 ib109 kernel: ib0: called: id -2147483621, op 128, status: 0 > > 13:15:46 ib109 kernel: ib0: received 128 bytes, SLID 0x016d > > 13:15:47 ib109 kernel: ib0: sending packet, length=88 address=f6e57520 > > qpn=0x000404 > > 13:15:47 ib109 kernel: ib0: called: id 36, op 0, status: 0 > > 13:15:47 ib109 kernel: ib0: send complete, wrid 36 > > 13:15:47 ib109 kernel: ib0: called: id -2147483620, op 128, status: 0 > > 13:15:47 ib109 kernel: ib0: received 128 bytes, SLID 0x016d > > 13:15:51 ib109 kernel: ib0: called: id -2147483619, op 128, status: 0 > > 13:15:51 ib109 kernel: ib0: received 100 bytes, SLID 0x016d > > 13:15:51 ib109 kernel: ib0: sending packet, length=60 address=f6e57520 > > qpn=0x000404 > > 13:15:51 ib109 kernel: ib0: called: id 37, op 0, status: 0 > > 13:15:51 ib109 kernel: ib0: send complete, wrid 37 > > > > # tcpdump -i ib0 > > 13:15:46.977578 arp who-has ib110 tell ib109 hardware #32 > > 13:15:46.977682 arp reply ib110 is-at > > 00:00:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:11:59 hardware > > 
#32 > > 13:15:46.977710 IP ib109 > ib110: icmp 64: echo request seq 0 > > 13:15:46.977790 IP ib110 > ib109: icmp 64: echo reply seq 0 > > 13:15:47.977772 IP ib109 > ib110: icmp 64: echo request seq 1 > > 13:15:47.977892 IP ib110 > ib109: icmp 64: echo reply seq 1 > > 13:15:51.977076 arp who-has ib109 tell ib110 hardware #32 > > 13:15:51.977094 arp reply ib109 is-at > > 00:02:00:14:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:3b:31 hardware > > #32 > > > > # error dump > > rcvswrelayerrors:1 MT47396 Infiniscale-III 0x2c9010b022090[1] > > <--------> ib109 HCA-1 0x2c90200003b30[1] > > > > 1) The ping is successful and the arp table is populated so Is this > > really a problem or a false positive? > > 2) The second arp does not generate an error (the error dump reports > > all new errors in switches). Why? > > > > Any ideas? > > > > Thanks in advance. > > > > Todd > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vlad at mellanox.co.il Tue Oct 24 07:05:41 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 24 Oct 2006 16:05:41 +0200 Subject: [openib-general] problems using the ipoib ha package of RH4 U3 Message-ID: <6C2C79E72C305246B504CBA17B5500C93481A7@mtlexch01.mtl.com> > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Tuesday, October 24, 2006 3:43 PM > To: Vladimir Sokolovsky > Cc: openib-general at openib.org > Subject: Re: problems using the ipoib ha package of RH4 U3 > > Vlad, > > On a related issue, i think it was mentioned at some point > that the ipoib ha package does not support devices which use > the non standard pkey, is it correct? why? 
> > My thinking was that if the pkey 8001 is configured for both > port 1 and > port2 of the hca, the user hacks /etc/init.d/openibd to > create ib0:8001 and ib1:8001 child devices **before** the ha > service is started and then tells the ha service to use > ib0:8001 as primary and ib1:8001 as secondary, it should > work, what do you think? > Probably you are right, but it should be tested. In any case pkey support should be added to the openibd script. > Or. > > > Regards, Vladimir From ogerlitz at voltaire.com Tue Oct 24 07:36:02 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 24 Oct 2006 16:36:02 +0200 (IST) Subject: [openib-general] [PATCH] fix two typos in ipoib HA documentation Message-ID: diff -up SOURCES/openib-1.1/src/userspace/ipoibtools-orig/README SOURCES/openib-1.1/src/userspace/ipoibtools/README --- SOURCES/openib-1.1/src/userspace/ipoibtools-orig/README 2006-10-19 16:21:08.000000000 +0200 +++ SOURCES/openib-1.1/src/userspace/ipoibtools/README 2006-10-24 17:35:29.000000000 +0200 @@ -28,7 +28,7 @@ perform the following steps: The HA service may also be activated manually, via the following command: - ipoib_ha.pl -p ${PRIMARY_IPOIB_DEV} -b ${SECONDARY_IPOIB_DEV} \ + ipoib_ha.pl -p ${PRIMARY_IPOIB_DEV} -s ${SECONDARY_IPOIB_DEV} \ --with-arping --with-multicast diff -up docs-orig/ipoib_release_notes.txt docs/ipoib_release_notes.txt --- docs-orig/ipoib_release_notes.txt 2006-10-19 11:22:25.000000000 +0200 +++ docs/ipoib_release_notes.txt 2006-10-24 17:33:34.000000000 +0200 @@ -112,7 +112,7 @@ DHCP Notes =============================================================================== -5. High Availability (HA) Servicey +5. High Availability (HA) Service =============================================================================== High Availability (HA) service for IPoIB interfaces is provided via the ipoibtools package. 
Ipoibtools currently includes a perl script, ipoib_ha.pl, Only in docs: ipoib_release_notes.txt.bak From vlad at mellanox.co.il Tue Oct 24 07:44:36 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 24 Oct 2006 16:44:36 +0200 Subject: [openib-general] [PATCH] fix two typos in ipoib HA documentation Message-ID: <6C2C79E72C305246B504CBA17B5500C93481DE@mtlexch01.mtl.com> Thanks, Applied. Regards, Vladimir > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Tuesday, October 24, 2006 4:36 PM > To: Vladimir Sokolovsky > Cc: openib-general at openib.org > Subject: [PATCH] fix two typos in ipoib HA documentation > > diff -up > SOURCES/openib-1.1/src/userspace/ipoibtools-orig/README > SOURCES/openib-1.1/src/userspace/ipoibtools/README > --- SOURCES/openib-1.1/src/userspace/ipoibtools-orig/README > 2006-10-19 16:21:08.000000000 +0200 > +++ SOURCES/openib-1.1/src/userspace/ipoibtools/README > 2006-10-24 17:35:29.000000000 +0200 > @@ -28,7 +28,7 @@ perform the following steps: > > The HA service may also be activated manually, via the > following command: > > - ipoib_ha.pl -p ${PRIMARY_IPOIB_DEV} -b ${SECONDARY_IPOIB_DEV} \ > + ipoib_ha.pl -p ${PRIMARY_IPOIB_DEV} -s ${SECONDARY_IPOIB_DEV} \ > --with-arping --with-multicast > > > diff -up docs-orig/ipoib_release_notes.txt > docs/ipoib_release_notes.txt > --- docs-orig/ipoib_release_notes.txt 2006-10-19 > 11:22:25.000000000 +0200 > +++ docs/ipoib_release_notes.txt 2006-10-24 > 17:33:34.000000000 +0200 > @@ -112,7 +112,7 @@ DHCP Notes > > > > ============================================================== > ================= > -5. High Availability (HA) Servicey > +5. High Availability (HA) Service > > ============================================================== > ================= > High Availability (HA) service for IPoIB interfaces is > provided via the ipoibtools package. 
Ipoibtools currently > includes a perl script, ipoib_ha.pl, Only in docs: > ipoib_release_notes.txt.bak > > > From johnip at sgi.com Tue Oct 24 08:50:07 2006 From: johnip at sgi.com (John Partridge) Date: Tue, 24 Oct 2006 10:50:07 -0500 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> <45394C86.2030708@sgi.com> Message-ID: <453E362F.7040800@sgi.com> Roland Dreier wrote: > I thought I had everything I needed -- but actually could you send a > new copy of the patch along with a Signed-off-by: line? OK here it is. A file copy is attached too in case my mailer messed up the format :- mthca_reset.c: Fix a non-blocking Config Write that needs flushing in mthca_reset() Signed-off-by: John Partridge johnip at sgi.com --- --- openib-1.1-buildable-ORIG/drivers/infiniband/hw/mthca/mthca_reset.c 2006-09-20 07:19:24.000000000 -0500 +++ openib-1.1/drivers/infiniband/hw/mthca/mthca_reset.c 2006-10-19 13:55:04.292275707 -0500 @@ -281,6 +281,17 @@ goto out; } + /* + * Perform a "flush" of the pci_write_config_dword() for PCI_COMMAND. + * The PCI_COMMAND to the HCA must complete before we exit mthca_reset() + * or any PIO Memory Reads via the BAR will fail at this point. + */ + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { + err = -ENODEV; + mthca_err(mdev, "Couldn't access HCA memory after restoring, " + "aborting.\n"); + } + out: if (bridge) pci_dev_put(bridge); Many Thanks for your help with this. Regards John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: pv954262.patch URL: From kliteyn at dev.mellanox.co.il Tue Oct 24 08:57:35 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 24 Oct 2006 17:57:35 +0200 Subject: [openib-general] [PATCH] osm: Trivial fix in osmtest Message-ID: <453E37EF.8070900@dev.mellanox.co.il> Fixing signed/unsigned data types problem (discovered on Windows) -- Yevgeny Signed-off-by: Yevgeny Kliteynik Index: osmtest/osmt_multicast.c =================================================================== --- osmtest/osmt_multicast.c (revision 9948) +++ osmtest/osmt_multicast.c (working copy) @@ -61,7 +61,7 @@ static void __osmt_print_all_multicast_records( IN osmtest_t * const p_osmt ) { - int i; + uint32_t i; ib_api_status_t status; osmv_query_req_t req; osmv_user_query_t user; From krause at cup.hp.com Tue Oct 24 10:51:39 2006 From: krause at cup.hp.com (Michael Krause) Date: Tue, 24 Oct 2006 10:51:39 -0700 Subject: [openib-general] IPoIB Question In-Reply-To: <20061024050023.GA1841@greglaptop.hsd1.ca.comcast.net> References: <20061024050023.GA1841@greglaptop.hsd1.ca.comcast.net> Message-ID: <6.2.0.14.2.20061024104729.02a02f58@esmail.cup.hp.com> At 10:00 PM 10/23/2006, Greg Lindahl wrote: >On Mon, Oct 23, 2006 at 07:53:06AM -0500, Hubbell, Sean C >Contractor/Decibel wrote: > > > I currently have several applications that uses a legacy IPv4 protocol > > and I use IPoIB to utilize my infiniband network which works great. I > > have completed some timing and throughput analysis and noticed that I do > > not get very much more if I use an infiniband network interface than > > using my GigE network interface. > >You might want to note that different InfinBand implementations have >quite different performance of IPoIB, especially for UDP. > >Another issue is that IPoIB has quite different performance with >different Linux kernels. This is especially evident for TCP, although >you can use SDP to accelerate TCP sockets and avoid this issue. 
> > > My question is, am I using IPoIB correctly or are these the typical > > numbers that everyone is seeing? > >It is certainly the case that there are some message patterns and >situations for which InfiniBand is not much of an improvement over >gigE. Unfortunately, comparisons of IB to GbE are often apple-to-orange comparisons, even for IP over IB. Until an HCA supplies the same level of functional off-load enabled by the IP network stack that is used with Ethernet, it really isn't a fair comparison. The same is also true for many of the marketroids and their comparisons of IB to Ethernet based solutions. Fortunately, most customers are getting a bit smarter and not falling for the marketing drivel these days - certainly the OEMs don't fall for it, though the marketroids continue to come in and try to convince people it isn't an apple-to-orange comparison. The fact is both technologies have their pros / cons and it is really the workload or production environment that determines which is the best fit instead of the force fit. In any case, this is not really a development issue, so I will drop further discussion. Mike From parks at lanl.gov Tue Oct 24 11:17:09 2006 From: parks at lanl.gov (Parks Fields) Date: Tue, 24 Oct 2006 12:17:09 -0600 Subject: [openib-general] Kernel for IPoIB Question In-Reply-To: <453CF505.8050906@dbresearch.net> References: <453CEB85.7090102@dbresearch.net> <20061023163844.GB9248@mellanox.co.il> <453CF505.8050906@dbresearch.net> Message-ID: <7.0.1.0.2.20061024095701.0259b4f0@lanl.gov> Since the topic of the kernel affecting performance has come up: which kernel is now giving the best performance? We are trying to choose the next kernel for our clusters and wanted to choose one that does well with IPoIB. I know 2.6.16 was not doing well, even with 10G myrinet cards. What is recommended: 2.6.17, 18, 19, or 20? 
thanks ***** Correspondence ***** This email contains no programmatic content that requires independent ADC review From rdreier at cisco.com Tue Oct 24 11:22:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 24 Oct 2006 11:22:33 -0700 Subject: [openib-general] Race in mthca_cmd_post() In-Reply-To: <453E362F.7040800@sgi.com> (John Partridge's message of "Tue, 24 Oct 2006 10:50:07 -0500") References: <200610141747.k9EHl9GV009351@cmf.nrl.navy.mil> <4537D68C.4040409@sgi.com> <20061020182445.GA4054@obsidianresearch.com> <45394C86.2030708@sgi.com> <453E362F.7040800@sgi.com> Message-ID: Thanks -- just for the future: > Signed-off-by: John Partridge johnip at sgi.com The email address should really be in angle brackets like: Signed-off-by: John Partridge <johnip at sgi.com> No need to resend this time, but please fix it next time. thanks, Roland From rdreier at cisco.com Tue Oct 24 12:13:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 24 Oct 2006 12:13:19 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? Message-ID: John Partridge found an interesting bug involving mthca (Mellanox InfiniBand HCA driver) on IA64/Altix systems. Basically, during initialization, mthca does: - do some config writes, including enabling BARs - then start a firmware command - read an MMIO register from a BAR (to check if FW is busy) However, John found that the Altix PCI-X bridge was allowing the MMIO read to start before the config write was done (which is allowed by the PCI spec). The PCI trace looked like: 23454: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 23462: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2 23470: Split compl. Lower A = 00 Req = (0,0,0) Tag = 0 Comp = (0,2,0) WAIT = 1 (Error completion) 23476: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE) and that "Error completion" leads to a crash. 
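A toy model of the ordering seen in the trace, written in Python, may make the sequence easier to follow. It is a deliberately simplified sketch: the class and method names are invented, and it models only the one behaviour described above (an MMIO read reaching the device before an earlier config write has completed), not real PCI-X split-transaction rules. It also assumes that a config read issued to the same device completes only after the earlier config write has.

```python
class ToyBridge:
    """Toy host bridge: config writes complete late, MMIO reads do not wait."""

    def __init__(self):
        self.bar_enabled = False      # device-side state
        self.write_pending = False    # config write issued but not yet completed

    def config_write_enable_bar(self):
        # The write has been issued; its completion has not arrived yet.
        self.write_pending = True

    def config_read(self):
        # Assumption of this model: a config read is ordered behind the
        # earlier config write, so the write has completed by the time
        # the read returns.
        if self.write_pending:
            self.bar_enabled = True
            self.write_pending = False
        return self.bar_enabled

    def mmio_read(self):
        # The MMIO read does not wait for a pending config write.
        if not self.bar_enabled:
            raise RuntimeError("error completion: BAR not enabled yet")
        return 0
```

In this model, calling mmio_read() straight after config_write_enable_bar() raises, which corresponds to the "Error completion" at 23470, while calling config_read() in between lets the MMIO read succeed.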
John proposed the following patch to fix this, which looks good to me. However, I have a couple of questions about this situation: 1) Is this something that should be fixed in the driver? The PCI spec allows MMIO cycles to start before an earlier config cycle completed, but do we want to expose this fact to drivers? Would it be better for ia64 to use some sort of barrier to make sure pci_write_config_xxx() is strongly ordered with MMIO? 2) Is this issue lurking in other drivers? Thanks, Roland commit 424b50b6360b325ce642ece687756a600c25d28a Author: John Partridge Date: Tue Oct 24 11:54:16 2006 -0700 IB/mthca: Make sure all PCI config writes reach device before doing MMIO During initialization, mthca writes some PCI config space registers and then does an MMIO read from one of the BARs it just enabled. This MMIO read sometimes failed and caused a crash on SGI Altix machines, because the PCI-X host bridge (legitimately, according to the PCI spec) allowed the MMIO read to start before the config write completed. To fix this, add a config read after all config writes to make sure they are all done before starting the MMIO read. Signed-off-by: John Partridge Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c index 91934f2..578dc7c 100644 --- a/drivers/infiniband/hw/mthca/mthca_reset.c +++ b/drivers/infiniband/hw/mthca/mthca_reset.c @@ -281,6 +281,20 @@ good: goto out; } + /* + * Perform a "flush" of the PCI config writes here by reading + * the PCI_COMMAND register. This is needed to make sure that + * we don't try to touch other PCI BARs before the config + * writes are done -- otherwise an MMIO cycle could start + * before the config writes are done and reach the HCA before + * the BAR is actually enabled. 
+ */ + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { + err = -ENODEV; + mthca_err(mdev, "Couldn't access HCA memory after restoring, " + "aborting.\n"); + } + out: if (bridge) pci_dev_put(bridge); From jeff at garzik.org Tue Oct 24 12:22:10 2006 From: jeff at garzik.org (Jeff Garzik) Date: Tue, 24 Oct 2006 15:22:10 -0400 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: Message-ID: <20061024192210.GE2043@havoc.gtf.org> On Tue, Oct 24, 2006 at 12:13:19PM -0700, Roland Dreier wrote: > 1) Is this something that should be fixed in the driver? The PCI > spec allows MMIO cycles to start before an earlier config cycle > completed, but do we want to expose this fact to drivers? Would > it be better for ia64 to use some sort of barrier to make sure > pci_write_config_xxx() is strongly ordered with MMIO? The PCI config APIs have traditionally enforced very strong ordering. Heck, the PCI config APIs often take a spinlock on each read or write; so they are definitely not intended to be as fast as MMIO. Jeff From mst at mellanox.co.il Tue Oct 24 12:24:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 24 Oct 2006 21:24:12 +0200 Subject: [openib-general] Kernel for IPoIB Question In-Reply-To: <7.0.1.0.2.20061024095701.0259b4f0@lanl.gov> References: <453CEB85.7090102@dbresearch.net> <20061023163844.GB9248@mellanox.co.il> <453CF505.8050906@dbresearch.net> <7.0.1.0.2.20061024095701.0259b4f0@lanl.gov> Message-ID: <20061024192412.GB11328@mellanox.co.il> Quoting r. Parks Fields : > Subject: Kernel for IPoIB Question > > > Sincethe to[pic of kernel affecting the performance, which kernel is > now giving the best performance. We are trying to choose the next > kernel for our clusters and wanted to choose one that does well with > IPoIB. I know 2.6.16 was not doing well, even with 10G myrinet cards. > > What is recommended 2.6. 17,18,19,20 ??????? 
> > thanks I don't think I saw anyone report speed problems in IPoIB in 2.6.16 - and with TCP performance really depends on the setup/software, so why don't you just install several kernels and try them all out? I recall some people saw performance degradation on some network topologies after Linux fixed the ack stretch violation in 2.6.11 or so. If you go 2.6.18.1, you have the added bonus of upstream IPoIB code basically identical to that of OFED 1.1. -- MST From greg.lindahl at qlogic.com Tue Oct 24 13:15:58 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Tue, 24 Oct 2006 13:15:58 -0700 Subject: [openib-general] IPoIB Question In-Reply-To: <453E1696.9090708@dbresearch.net> References: <20061024050023.GA1841@greglaptop.hsd1.ca.comcast.net> <453E1696.9090708@dbresearch.net> Message-ID: <20061024201558.GC1730@greglaptop.internal.keyresearch.com> On Tue, Oct 24, 2006 at 08:35:18AM -0500, Sean Hubbell wrote: > We are currently looking at the new tickless kernel. Do you have one > that you recommend? The main one to less-recommend is 2.6.9-based kernels, those are the slowest at TCP. Modern kernels, like the ones you see in Fedora 4 and up and SLES 10, seem to all be good and about equal in this area. I don't think we've tried a tickless kernel. We do most of our testing on the various kernels that ship with distros, plus the tip-of-tree kernel.org kernel. -- greg From sean.hefty at intel.com Tue Oct 24 13:22:28 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 13:22:28 -0700 Subject: [openib-general] [PATCH] for 2.6.19 RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count Message-ID: <000001c6f7aa$20aac000$6ccc180a@amr.corp.intel.com> From: Krishna Kumar rdma_bind_addr() leaks a cma_dev reference count in failure case. Signed-off-by: Krishna Kumar Signed-off-by: Sean Hefty --- Modified from Krishna's patch to drop use of did_acquire_dev flag. 
Because this bug is in error handling only, I don't believe that anyone is hitting it in practice. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9ae4f3a..d8ca3c1 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1762,22 +1762,29 @@ int rdma_bind_addr(struct rdma_cm_id *id if (!cma_any_addr(addr)) { ret = rdma_translate_ip(addr, &id->route.addr.dev_addr); - if (!ret) { - mutex_lock(&lock); - ret = cma_acquire_dev(id_priv); - mutex_unlock(&lock); - } if (ret) - goto err; + goto err1; + + mutex_lock(&lock); + ret = cma_acquire_dev(id_priv); + mutex_unlock(&lock); + if (ret) + goto err1; } memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); ret = cma_get_port(id_priv); if (ret) - goto err; + goto err2; return 0; -err: +err2: + if (!cma_any_addr(addr)) { + mutex_lock(&lock); + cma_detach_from_dev(id_priv); + mutex_unlock(&lock); + } +err1: cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); return ret; } From sweitzen at cisco.com Tue Oct 24 13:23:07 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 24 Oct 2006 13:23:07 -0700 Subject: [openib-general] IPoIB Question Message-ID: We see 3.6 Gb/sec with IPoIB using RHEL4U4 2.6.9-42 x86_64 kernel on Dell PE1950 Woodcrest systems. In my testing, faster hardware is more important than newer kernels, but I don't try newer kernels much. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Greg Lindahl > Sent: Tuesday, October 24, 2006 1:16 PM > To: Sean Hubbell > Cc: openib-general at openib.org > Subject: Re: [openib-general] IPoIB Question > > On Tue, Oct 24, 2006 at 08:35:18AM -0500, Sean Hubbell wrote: > > > We are currently looking at the new tickless kernel. Do you > have one > > that you recommend? 
> > The main one to less-recommend is 2.6.9-based kernels, those are the > slowest at TCP. Modern kernels, like the ones you see in Fedora 4 and > up and SLES 10, seem to all be good and about equal in this area. > > I don't think we've tried a tickless kernel. We do most of our testing > on the various kernels that ship with distros, plus the tip-of-tree > kernel.org kernel. > > -- greg > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sean.hefty at intel.com Tue Oct 24 13:35:27 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 13:35:27 -0700 Subject: [openib-general] [PATCH] for 2.6.20 rdma/cm: remove setting local write as part of QP access flags Message-ID: <000101c6f7ab$f0b37980$6ccc180a@amr.corp.intel.com> From: Dotan Barak The qp_access_flags are for remote access permissions only, so local write is an invalid value. Signed-off-by: Dotan Barak Signed-off-by: Sean Hefty --- Current drivers appear to ignore this flag being set when the QP is modified to INIT, but for correctness, it should not be there. 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 25b1018..1cf0d42 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3173,8 +3173,7 @@ static int cm_init_qp_init_attr(struct c case IB_CM_ESTABLISHED: *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT; - qp_attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE | - IB_ACCESS_REMOTE_WRITE; + qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE; if (cm_id_priv->responder_resources) qp_attr->qp_access_flags |= IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_ATOMIC; diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d8ca3c1..2b4748e 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -343,7 +343,7 @@ static int cma_init_ib_qp(struct rdma_id return ret; qp_attr.qp_state = IB_QPS_INIT; - qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + qp_attr.qp_access_flags = 0; qp_attr.port_num = id_priv->id.port_num; return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT); From bugzilla-daemon at openib.org Tue Oct 24 13:47:56 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 24 Oct 2006 13:47:56 -0700 (PDT) Subject: [openib-general] [Bug 286] New: "ifconfig ib# down" hangs telnet connection-- NETDEV WATCHDOG: ib0: transmit timed out Message-ID: <20061024204756.A35622283D4@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=286 Summary: "ifconfig ib# down" hangs telnet connection-- NETDEV WATCHDOG: ib0: transmit timed out Product: OpenFabrics Linux Version: 1.0rc6 Platform: X86 OS/Version: SLES 9 Status: NEW Severity: major Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: amir.vetry at sun.com CC: David.Brean at Sun.COM, amir.vetry at sun.com "ifconfig ib# down" command hangs ethernet telnet connection, and even if any other subsequence telnet connection opened to the system and execute 
"ifconfig -a" will hang the Galaxy 4F system. Moreover, "Control ^C" will not retrieve telnet connection. System does not allow to kill any ifconfig process either. Note: This system telnet connection is via ethernet port (not IB). Easiest to reproduce this issue is when various IPoIB traffic is ran across a IB-HCA PCI-E ports (from Mellanox) in Galaxy4F (Sun x4600), the IB link drops and the following error messages are seen in /var/log/messages. "NETDEV WATCHDOG: ib0: transmit timed out" When IB links dropped, no traffic can pass through the IB ports, and all the IPoIB traffics stop and the ping also fails. After the above has been experienced, type "ifconfig ib# down", this should hang ethernet telnet connection, even it hangs console connection. System information ============= - Galaxy 4 F (sun x4600) - IB-HCA PCI-E (Mellanox) - OFED-1.0.1 (or 1.0) driver - OS: Suse 9- U3 (or Redhat4-u3) - Linux 2.6.5-7.244-smp #1 SMP x86_64 x86_64 x86_64 GNU/Linux IB-HCA PCI-E information ======================== - fw_ver: 4.6.2 - vendor_id: 0x02c9 - vendor_part_id: 25208 - hw_ver: 0xA0 Some type of IB switch: (e.g. Sun Sleipner switch (or Topspin 360 switch) =========================== - 9 Port IB - Bootable Image: 2.1.2 (Apr 28 06) OR - Topspin 360 switch FW: 2.8 (52) - 12 IB ports Topology ======== G4F(ib#)-----(ib#)IBswitch(ib#)----TrafficGenerator Steps to reproduce ================== 1. Telnet to a Galaxy4f system (via ethernet port) 2. Run two (or more) stream IPoIB traffic simultaneously (e.g. sync_netperf, NFS-corrupt) 3. While traffic is running, monitor messages log 4. Look for the error messages like: NETDEV WATCHDOG: ib0: transmit timed out 5. Once this message (above message) is observed, the IB link should go down 6. Do "ifconfig -a" to view which ib port is available 7. Do "ifconfig down" This should hang connection!! 
# ifconfig -a ib4 Link encap:UNSPEC HWaddr 00-00-04-04-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:192.9.11.176 Bcast:192.9.11.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:20:2f2d/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:3229 errors:0 dropped:0 overruns:0 frame:0 TX packets:29 errors:0 dropped:3060 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:184632 (180.3 Kb) TX bytes:3000 (2.9 Kb) # ifconfig ib4 down up System will hang here !!! =============================================== #cat /etc/*release SUSE LINUX Enterprise Server 9 (x86_64) VERSION = 9 PATCHLEVEL = 3 LSB_VERSION="core-2.0-noarch:core-3.0-noarch:core-2.0-x86_64:core-3.0-x86_64" nspgqa176b:~ # hca_id: mthca2 fw_ver: 4.6.2 node_guid: 0002:c902:0020:2f2c sys_image_guid: 0002:c902:0020:2f2f vendor_id: 0x02c9 vendor_part_id: 25208 hw_ver: 0xA0 board_id: MT_00B0000001 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 1 port_lid: 3 port_lmc: 0x00 port: 2 state: PORT_ACTIVE (4) max_mtu: 2048 (4) active_mtu: 2048 (4) sm_lid: 1 port_lid: 4 port_lmc: 0x00 nspgqa176b:~ # nspgqa176b:~ # nspgqa176b:~ # ps -ef | grep ifconfig root 27495 1 0 10:57 ? 00:00:00 ifconfig ib4 down up root 27497 1 0 10:57 ? 00:00:00 /sbin/ifconfig -a root 27634 1 0 11:26 ? 00:00:00 ifconfig -a root 27711 1 0 11:40 ? 00:00:00 ifconfig -a root 27754 1 0 11:44 pts/4 00:00:00 ifconfig -a root 27781 1 0 11:56 pts/3 00:00:00 ifconfig -a root 27787 27638 0 11:57 pts/7 00:00:00 grep ifconfig nspgqa176b:~ # nspgqa176b:~ # kill 27495 27497 27634 27711 27754 27781 27787 27638 -bash: kill: (27787) - No such process nspgqa176b:~ # nspgqa176b:~ # ps -ef | grep ifconfig root 27495 1 0 10:57 ? 00:00:00 ifconfig ib4 down up root 27497 1 0 10:57 ? 00:00:00 /sbin/ifconfig -a root 27634 1 0 11:26 ? 00:00:00 ifconfig -a root 27711 1 0 11:40 ? 
00:00:00 ifconfig -a root 27754 1 0 11:44 pts/4 00:00:00 ifconfig -a root 27781 1 0 11:56 pts/3 00:00:00 ifconfig -a root 27789 27638 0 11:58 pts/7 00:00:00 grep ifconfig nspgqa176b:~ # nspgqa176b:~ # pkill -9 27495 nspgqa176b:~ # ps -ef | grep ifconfig root 27495 1 0 10:57 ? 00:00:00 ifconfig ib4 down up root 27497 1 0 10:57 ? 00:00:00 /sbin/ifconfig -a root 27634 1 0 11:26 ? 00:00:00 ifconfig -a root 27711 1 0 11:40 ? 00:00:00 ifconfig -a root 27754 1 0 11:44 pts/4 00:00:00 ifconfig -a root 27781 1 0 11:56 pts/3 00:00:00 ifconfig -a root 27792 27638 0 11:58 pts/7 00:00:00 grep ifconfig nspgqa176b:~ # ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rdreier at cisco.com Tue Oct 24 13:51:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 24 Oct 2006 13:51:32 -0700 Subject: [openib-general] [PATCH] for 2.6.20 rdma/cm: remove setting local write as part of QP access flags In-Reply-To: <000101c6f7ab$f0b37980$6ccc180a@amr.corp.intel.com> (Sean Hefty's message of "Tue, 24 Oct 2006 13:35:27 -0700") References: <000101c6f7ab$f0b37980$6ccc180a@amr.corp.intel.com> Message-ID: Thanks, queued for 2.6.20 From rdreier at cisco.com Tue Oct 24 13:51:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 24 Oct 2006 13:51:38 -0700 Subject: [openib-general] [PATCH] for 2.6.19 RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count In-Reply-To: <000001c6f7aa$20aac000$6ccc180a@amr.corp.intel.com> (Sean Hefty's message of "Tue, 24 Oct 2006 13:22:28 -0700") References: <000001c6f7aa$20aac000$6ccc180a@amr.corp.intel.com> Message-ID: Thanks, queued for 2.6.19 From jwm at systemfabricworks.com Tue Oct 24 14:01:25 2006 From: jwm at systemfabricworks.com (JWM) Date: Tue, 24 Oct 2006 16:01:25 -0500 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? References: Message-ID: <00df01c6f7af$91af9aa0$7401a8c0@Maelstrom> On IA64 there are two memory barriers mf and mf.a. 
To protect against the scenario below mf.a (slower of course) would be required. ....JW ----- Original Message ----- From: "Roland Dreier" To: ; Cc: ; Sent: Tuesday, October 24, 2006 2:13 PM Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? > John Partridge found an interesting bug involving mthca (Mellanox > InfiniBand HCA driver) on IA64/Altix systems. Basically, during > initialization, mthca does: > > - do some config writes, including enabling BARs > - then start a firmware command > - read an MMIO register from a BAR (to check if FW is busy) > > However, John found that the Altix PCI-X bridge was allowing the MMIO > read to start before the config write was done (which is allowed by > the PCI spec). The PCI trace looked like: > > 23454: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) > Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 > 23462: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 > WAIT = 2 > 23470: Split compl. Lower A = 00 Req = (0,0,0) Tag = 0 Comp = > (0,2,0) WAIT = 1 (Error completion) > 23476: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = > (0,2,0) WAIT = 1 (Normal completion of WRITE) > > and that "Error completion" leads to a crash. > > John proposed the following patch to fix this, which looks good to > me. However, I have a couple of questions about this situation: > > 1) Is this something that should be fixed in the driver? The PCI > spec allows MMIO cycles to start before an earlier config cycle > completed, but do we want to expose this fact to drivers? Would > it be better for ia64 to use some sort of barrier to make sure > pci_write_config_xxx() is strongly ordered with MMIO? > > 2) Is this issue lurking in other drivers? 
> > Thanks, > Roland > > commit 424b50b6360b325ce642ece687756a600c25d28a > Author: John Partridge > Date: Tue Oct 24 11:54:16 2006 -0700 > > IB/mthca: Make sure all PCI config writes reach device before doing > MMIO > > During initialization, mthca writes some PCI config space registers > and then does an MMIO read from one of the BARs it just enabled. This > MMIO read sometimes failed and caused a crash on SGI Altix machines, > because the PCI-X host bridge (legitimately, according to the PCI > spec) allowed the MMIO read to start before the config write completed. > > To fix this, add a config read after all config writes to make sure > they are all done before starting the MMIO read. > > Signed-off-by: John Partridge > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c > b/drivers/infiniband/hw/mthca/mthca_reset.c > index 91934f2..578dc7c 100644 > --- a/drivers/infiniband/hw/mthca/mthca_reset.c > +++ b/drivers/infiniband/hw/mthca/mthca_reset.c > @@ -281,6 +281,20 @@ good: > goto out; > } > > + /* > + * Perform a "flush" of the PCI config writes here by reading > + * the PCI_COMMAND register. This is needed to make sure that > + * we don't try to touch other PCI BARs before the config > + * writes are done -- otherwise an MMIO cycle could start > + * before the config writes are done and reach the HCA before > + * the BAR is actually enabled. 
> + */ > + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { > + err = -ENODEV; > + mthca_err(mdev, "Couldn't access HCA memory after restoring, " > + "aborting.\n"); > + } > + > out: > if (bridge) > pci_dev_put(bridge); > > From alan at lxorguk.ukuu.org.uk Tue Oct 24 14:24:23 2006 From: alan at lxorguk.ukuu.org.uk (Alan Cox) Date: Tue, 24 Oct 2006 22:24:23 +0100 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: Message-ID: <1161725063.22348.39.camel@localhost.localdomain> On Tue, 2006-10-24 at 12:13 -0700, Roland Dreier wrote: > 1) Is this something that should be fixed in the driver? The PCI > spec allows MMIO cycles to start before an earlier config cycle > completed, but do we want to expose this fact to drivers? Would > it be better for ia64 to use some sort of barrier to make sure > pci_write_config_xxx() is strongly ordered with MMIO? It is good to be conservative in this area. Some AMD chipsets at least had ordering problems with some configurations in the K7 era. From rdreier at cisco.com Tue Oct 24 14:29:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 24 Oct 2006 14:29:47 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <1161725063.22348.39.camel@localhost.localdomain> (Alan Cox's message of "Tue, 24 Oct 2006 22:24:23 +0100") References: <1161725063.22348.39.camel@localhost.localdomain> Message-ID: > It is good to be conservative in this area. Some AMD chipsets at least > had ordering problems with some configurations in the K7 era. Could you expand a little? 
Do you mean that the arch implementation of pci_write_config_xxx() should have extra barriers, or that drivers should do belt-and-suspenders flushes to make sure config writes are really done properly? - R. From jeff at garzik.org Tue Oct 24 14:37:44 2006 From: jeff at garzik.org (Jeff Garzik) Date: Tue, 24 Oct 2006 17:37:44 -0400 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: <1161725063.22348.39.camel@localhost.localdomain> Message-ID: <20061024213744.GH2043@havoc.gtf.org> On Tue, Oct 24, 2006 at 02:29:47PM -0700, Roland Dreier wrote: > > It is good to be conservative in this area. Some AMD chipsets at least > > had ordering problems with some configurations in the K7 era. > > Could you expand a little? Do you mean that the arch implementation > of pci_write_config_xxx() should have extra barriers, or that drivers > should do belt-and-suspenders flushes to make sure config writes are > really done properly? Drivers are -already- written to assume the pci_write_config_xxx() has the requisite barriers. The fix doesn't belong in the drivers. Jeff From matthew at wil.cx Tue Oct 24 14:47:24 2006 From: matthew at wil.cx (Matthew Wilcox) Date: Tue, 24 Oct 2006 15:47:24 -0600 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024192210.GE2043@havoc.gtf.org> References: <20061024192210.GE2043@havoc.gtf.org> Message-ID: <20061024214724.GS25210@parisc-linux.org> On Tue, Oct 24, 2006 at 03:22:10PM -0400, Jeff Garzik wrote: > The PCI config APIs have traditionally enforced very strong ordering. > Heck, the PCI config APIs often take a spinlock on each read or write; > so they are definitely not intended to be as fast as MMIO. s/often/always/. It's implemented in drivers/pci/access.c. I think the right way to fix this is to ensure mmio write ordering in the pci_write_config_*() implementations. Like this. 
Signed-off-by: Matthew Wilcox diff --git a/drivers/pci/access.c b/drivers/pci/access.c index ea16805..c80f1ba 100644 --- a/drivers/pci/access.c +++ b/drivers/pci/access.c @@ -1,6 +1,6 @@ #include #include -#include +#include #include "pci.h" @@ -45,6 +45,7 @@ int pci_bus_write_config_##size \ if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ spin_lock_irqsave(&pci_lock, flags); \ res = bus->ops->write(bus, devfn, pos, len, value); \ + mmiowb(); \ spin_unlock_irqrestore(&pci_lock, flags); \ return res; \ } @@ -102,6 +103,7 @@ int pci_user_write_config_##size \ if (likely(!dev->block_ucfg_access)) \ ret = dev->bus->ops->write(dev->bus, dev->devfn, \ pos, sizeof(type), val); \ + mmiowb(); \ spin_unlock_irqrestore(&pci_lock, flags); \ return ret; \ } From rdreier at cisco.com Tue Oct 24 14:51:30 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 24 Oct 2006 14:51:30 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024214724.GS25210@parisc-linux.org> (Matthew Wilcox's message of "Tue, 24 Oct 2006 15:47:24 -0600") References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> Message-ID: > I think the right way to fix this is to ensure mmio write ordering in > the pci_write_config_*() implementations. Like this. I'm happy to fix this in the PCI core and not force drivers to worry about this. John, can you confirm that this patch fixes the issue for you? Thanks, Roland From johnip at sgi.com Tue Oct 24 15:12:07 2006 From: johnip at sgi.com (John Partridge) Date: Tue, 24 Oct 2006 17:12:07 -0500 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> Message-ID: <453E8FB7.9090001@sgi.com> Roland Dreier wrote: > > I think the right way to fix this is to ensure mmio write ordering in > > the pci_write_config_*() implementations. Like this. 
> > I'm happy to fix this in the PCI core and not force drivers to worry > about this. > > John, can you confirm that this patch fixes the issue for you? > > Thanks, > Roland I'll give it a try and get back to you. John -- John Partridge Silicon Graphics Inc Tel: 651-683-3428 Vnet: 233-3428 E-Mail: johnip at sgi.com From sean.hefty at intel.com Tue Oct 24 15:25:48 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:25:48 -0700 Subject: [openib-general] [PATCH 0/7 v2] for 2.6.20 rdma/cma: add userspace support Message-ID: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> The following set of patches expands the rdma_cm support to include UD and multicast, and exposes the rdma_cm to userspace. I would like to target the 2.6.20 kernel, but at least getting them into one or more branches would be helpful for other developers to test against these changes. As mentioned in the RFC, the patches borrow heavily from the code checked into openfabrics svn, but there are some notable differences. The main difference from the patches submitted for the RFC is the integration of the ib_multicast module with the ib_sa module. The two modules are loosely coupled, with minimal changes made to the existing sa_query code. Signed-off-by: Sean Hefty From sean.hefty at intel.com Tue Oct 24 15:35:47 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:35:47 -0700 Subject: [openib-general] [PATCH 1/7 v2] for 2.6.20 ib/ib_sa: add tracking of multicast join / leave requests In-Reply-To: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> Message-ID: <000101c6f7bc$c0a46770$a6d4180a@amr.corp.intel.com> The IB SA tracks multicast join / leave requests on a per-port basis. In order to support multiple users of the same multicast group from the same port, we need to perform local reference counting on each of the nodes. Modify the ib_sa module to perform reference counting of multicast join / leave requests. 
Add new interfaces to track join requests and allow matching leave requests with the corresponding join. Modify ib_ipoib to use the multicast interfaces. Signed-off-by: Sean Hefty --- Changes from v1: * Merged multicast handling into ib_sa module. * Add support for non-equal MTU, rate, and packet lifetime selectors. * Updated documentation. * Removed retry handling, and changed timeout value from a module parameter to a constant. diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 163d991..8873b63 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -10,7 +10,7 @@ ib_core-y := packer.o ud_header.o verb ib_mad-y := mad.o smi.o agent.o mad_rmpp.o -ib_sa-y := sa_query.o +ib_sa-y := sa_query.o multicast.o ib_cm-y := cm.o diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c new file mode 100644 index 0000000..88a9edf --- /dev/null +++ b/drivers/infiniband/core/multicast.c @@ -0,0 +1,842 @@ +/* + * Copyright (c) 2006 Intel Corporation.  All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include "sa.h" + +static void mcast_add_one(struct ib_device *device); +static void mcast_remove_one(struct ib_device *device); + +static struct ib_client mcast_client = { + .name = "ib_multicast", + .add = mcast_add_one, + .remove = mcast_remove_one +}; + +static struct ib_sa_client sa_client; +static struct ib_event_handler event_handler; +static struct workqueue_struct *mcast_wq; +static union ib_gid mgid0; + +struct mcast_device; + +struct mcast_port { + struct mcast_device *dev; + spinlock_t lock; + struct rb_root table; + atomic_t refcount; + struct completion comp; + u8 port_num; +}; + +struct mcast_device { + struct ib_device *device; + int start_port; + int end_port; + struct mcast_port port[0]; +}; + +enum mcast_state { + MCAST_IDLE, + MCAST_JOINING, + MCAST_MEMBER, + MCAST_BUSY, + MCAST_ERROR +}; + +struct mcast_member; + +struct mcast_group { + struct ib_sa_mcmember_rec rec; + struct rb_node node; + struct mcast_port *port; + spinlock_t lock; + struct work_struct work; + struct list_head pending_list; + struct list_head active_list; + struct mcast_member *last_join; + int members[3]; + atomic_t refcount; + enum mcast_state state; + struct ib_sa_query *query; + int query_id; +}; + +struct mcast_member { + struct ib_sa_multicast multicast; + struct ib_sa_client *client; + struct mcast_group *group; + struct list_head list; + enum mcast_state state; + atomic_t refcount; + struct 
completion comp; +}; + +static void join_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context); +static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context); + +static struct mcast_group *mcast_find(struct mcast_port *port, + union ib_gid *mgid) +{ + struct rb_node *node = port->table.rb_node; + struct mcast_group *group; + int ret; + + while (node) { + group = rb_entry(node, struct mcast_group, node); + ret = memcmp(mgid->raw, group->rec.mgid.raw, sizeof *mgid); + if (!ret) + return group; + + if (ret < 0) + node = node->rb_left; + else + node = node->rb_right; + } + return NULL; +} + +static struct mcast_group *mcast_insert(struct mcast_port *port, + struct mcast_group *group, + int allow_duplicates) +{ + struct rb_node **link = &port->table.rb_node; + struct rb_node *parent = NULL; + struct mcast_group *cur_group; + int ret; + + while (*link) { + parent = *link; + cur_group = rb_entry(parent, struct mcast_group, node); + + ret = memcmp(group->rec.mgid.raw, cur_group->rec.mgid.raw, + sizeof group->rec.mgid); + if (ret < 0) + link = &(*link)->rb_left; + else if (ret > 0) + link = &(*link)->rb_right; + else if (allow_duplicates) + link = &(*link)->rb_left; + else + return cur_group; + } + rb_link_node(&group->node, parent, link); + rb_insert_color(&group->node, &port->table); + return NULL; +} + +static void deref_port(struct mcast_port *port) +{ + if (atomic_dec_and_test(&port->refcount)) + complete(&port->comp); +} + +static void release_group(struct mcast_group *group) +{ + struct mcast_port *port = group->port; + unsigned long flags; + + spin_lock_irqsave(&port->lock, flags); + if (atomic_dec_and_test(&group->refcount)) { + rb_erase(&group->node, &port->table); + spin_unlock_irqrestore(&port->lock, flags); + kfree(group); + deref_port(port); + } else + spin_unlock_irqrestore(&port->lock, flags); +} + +static void deref_member(struct mcast_member *member) +{ + if (atomic_dec_and_test(&member->refcount)) + 
complete(&member->comp); +} + +static void queue_join(struct mcast_member *member) +{ + struct mcast_group *group = member->group; + unsigned long flags; + + spin_lock_irqsave(&group->lock, flags); + list_add(&member->list, &group->pending_list); + if (group->state == MCAST_IDLE) { + group->state = MCAST_BUSY; + atomic_inc(&group->refcount); + queue_work(mcast_wq, &group->work); + } + spin_unlock_irqrestore(&group->lock, flags); +} + +/* + * A multicast group has three types of members: full member, non member, and + * send only member. We need to keep track of the number of members of each + * type based on their join state. Adjust the number of members that belong to + * the specified join states. + */ +static void adjust_membership(struct mcast_group *group, u8 join_state, int inc) +{ + int i; + + for (i = 0; i < 3; i++, join_state >>= 1) + if (join_state & 0x1) + group->members[i] += inc; +} + +/* + * If a multicast group has zero members left for a particular join state, but + * the group is still a member with the SA, we need to leave that join state. + * Determine which join states we still belong to, but that do not have any + * active members. 
+ */ +static u8 get_leave_state(struct mcast_group *group) +{ + u8 leave_state = 0; + int i; + + for (i = 0; i < 3; i++) + if (!group->members[i]) + leave_state |= (0x1 << i); + + return leave_state & group->rec.join_state; +} + +static int check_selector(ib_sa_comp_mask comp_mask, + ib_sa_comp_mask selector_mask, + ib_sa_comp_mask value_mask, + u8 selector, u8 src_value, u8 dst_value) +{ + int err; + + if (!(comp_mask & selector_mask) || !(comp_mask & value_mask)) + return 0; + + switch (selector) { + case IB_SA_GT: + err = (src_value <= dst_value); + break; + case IB_SA_LT: + err = (src_value >= dst_value); + break; + case IB_SA_EQ: + err = (src_value != dst_value); + break; + default: + err = 0; + break; + } + + return err; +} + +static int cmp_rec(struct ib_sa_mcmember_rec *src, + struct ib_sa_mcmember_rec *dst, ib_sa_comp_mask comp_mask) +{ + /* MGID must already match */ + + if (comp_mask & IB_SA_MCMEMBER_REC_PORT_GID && + memcmp(&src->port_gid, &dst->port_gid, sizeof src->port_gid)) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_QKEY && src->qkey != dst->qkey) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_MLID && src->mlid != dst->mlid) + return -EINVAL; + if (check_selector(comp_mask, IB_SA_MCMEMBER_REC_MTU_SELECTOR, + IB_SA_MCMEMBER_REC_MTU, dst->mtu_selector, + src->mtu, dst->mtu)) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_TRAFFIC_CLASS && + src->traffic_class != dst->traffic_class) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_PKEY && src->pkey != dst->pkey) + return -EINVAL; + if (check_selector(comp_mask, IB_SA_MCMEMBER_REC_RATE_SELECTOR, + IB_SA_MCMEMBER_REC_RATE, dst->rate_selector, + src->rate, dst->rate)) + return -EINVAL; + if (check_selector(comp_mask, + IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME_SELECTOR, + IB_SA_MCMEMBER_REC_PACKET_LIFE_TIME, + dst->packet_life_time_selector, + src->packet_life_time, dst->packet_life_time)) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_SL && src->sl != dst->sl) + return 
-EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_FLOW_LABEL && + src->flow_label != dst->flow_label) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_HOP_LIMIT && + src->hop_limit != dst->hop_limit) + return -EINVAL; + if (comp_mask & IB_SA_MCMEMBER_REC_SCOPE && src->scope != dst->scope) + return -EINVAL; + + /* join_state checked separately, proxy_join ignored */ + + return 0; +} + +static int send_join(struct mcast_group *group, struct mcast_member *member) +{ + struct mcast_port *port = group->port; + int ret; + + ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device, + port->port_num, IB_MGMT_METHOD_SET, + &member->multicast.rec, + member->multicast.comp_mask, + 3000, GFP_KERNEL, join_handler, group, + &group->query); + if (ret >= 0) { + group->query_id = ret; + ret = 0; + } + return ret; +} + +static int send_leave(struct mcast_group *group, u8 leave_state) +{ + struct mcast_port *port = group->port; + struct ib_sa_mcmember_rec rec; + int ret; + + rec = group->rec; + rec.join_state = leave_state; + + ret = ib_sa_mcmember_rec_query(&sa_client, port->dev->device, + port->port_num, IB_SA_METHOD_DELETE, &rec, + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_JOIN_STATE, + 3000, GFP_KERNEL, leave_handler, + group, &group->query); + if (ret >= 0) { + group->query_id = ret; + ret = 0; + } + return ret; +} + +static void join_group(struct mcast_group *group, struct mcast_member *member, + u8 join_state) +{ + member->state = MCAST_MEMBER; + adjust_membership(group, join_state, 1); + group->rec.join_state |= join_state; + member->multicast.rec = group->rec; + member->multicast.rec.join_state = join_state; + list_del(&member->list); + list_add(&member->list, &group->active_list); +} + +static int fail_join(struct mcast_group *group, struct mcast_member *member, + int status) +{ + spin_lock_irq(&group->lock); + list_del_init(&member->list); + spin_unlock_irq(&group->lock); + return member->multicast.callback(status, 
&member->multicast); +} + +static void process_group_error(struct mcast_group *group) +{ + struct mcast_member *member; + int ret; + + spin_lock_irq(&group->lock); + while (!list_empty(&group->active_list)) { + member = list_entry(group->active_list.next, + struct mcast_member, list); + atomic_inc(&member->refcount); + list_del_init(&member->list); + adjust_membership(group, member->multicast.rec.join_state, -1); + member->state = MCAST_ERROR; + spin_unlock_irq(&group->lock); + + ret = member->multicast.callback(-ENETRESET, + &member->multicast); + deref_member(member); + if (ret) + ib_sa_free_multicast(&member->multicast); + spin_lock_irq(&group->lock); + } + + group->rec.join_state = 0; + group->state = MCAST_BUSY; + spin_unlock_irq(&group->lock); +} + +static void mcast_work_handler(void *data) +{ + struct mcast_group *group = data; + struct mcast_member *member; + struct ib_sa_multicast *multicast; + int status, ret; + u8 join_state; + +retest: + spin_lock_irq(&group->lock); + if (group->state == MCAST_ERROR) { + spin_unlock_irq(&group->lock); + process_group_error(group); + goto retest; + } + + while (!list_empty(&group->pending_list)) { + member = list_entry(group->pending_list.next, + struct mcast_member, list); + multicast = &member->multicast; + join_state = multicast->rec.join_state; + atomic_inc(&member->refcount); + + if (join_state == (group->rec.join_state & join_state)) { + status = cmp_rec(&group->rec, &multicast->rec, + multicast->comp_mask); + if (!status) + join_group(group, member, join_state); + else + list_del_init(&member->list); + spin_unlock_irq(&group->lock); + ret = multicast->callback(status, multicast); + } else { + spin_unlock_irq(&group->lock); + status = send_join(group, member); + if (!status) { + deref_member(member); + return; + } + ret = fail_join(group, member, status); + } + + deref_member(member); + if (ret) + ib_sa_free_multicast(&member->multicast); + spin_lock_irq(&group->lock); + } + + join_state = get_leave_state(group); 
+ if (join_state) { + group->rec.join_state &= ~join_state; + spin_unlock_irq(&group->lock); + if (send_leave(group, join_state)) + goto retest; + } else { + group->state = MCAST_IDLE; + spin_unlock_irq(&group->lock); + release_group(group); + } +} + +/* + * Fail a join request if it is still active - at the head of the pending queue. + */ +static void process_join_error(struct mcast_group *group, int status) +{ + struct mcast_member *member; + int ret; + + spin_lock_irq(&group->lock); + member = list_entry(group->pending_list.next, + struct mcast_member, list); + if (group->last_join == member) { + atomic_inc(&member->refcount); + list_del_init(&member->list); + spin_unlock_irq(&group->lock); + ret = member->multicast.callback(status, &member->multicast); + deref_member(member); + if (ret) + ib_sa_free_multicast(&member->multicast); + } else + spin_unlock_irq(&group->lock); +} + +static void join_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context) +{ + struct mcast_group *group = context; + + if (status) + process_join_error(group, status); + else { + spin_lock_irq(&group->port->lock); + group->rec = *rec; + if (!memcmp(&mgid0, &group->rec.mgid, sizeof mgid0)) { + rb_erase(&group->node, &group->port->table); + mcast_insert(group->port, group, 1); + } + spin_unlock_irq(&group->port->lock); + } + mcast_work_handler(group); +} + +static void leave_handler(int status, struct ib_sa_mcmember_rec *rec, + void *context) +{ + mcast_work_handler(context); +} + +static struct mcast_group *acquire_group(struct mcast_port *port, + union ib_gid *mgid, gfp_t gfp_mask) +{ + struct mcast_group *group, *cur_group; + unsigned long flags; + int is_mgid0; + + is_mgid0 = !memcmp(&mgid0, mgid, sizeof mgid0); + if (!is_mgid0) { + spin_lock_irqsave(&port->lock, flags); + group = mcast_find(port, mgid); + if (group) + goto found; + spin_unlock_irqrestore(&port->lock, flags); + } + + group = kzalloc(sizeof *group, gfp_mask); + if (!group) + return NULL; + + group->port = 
port; + group->rec.mgid = *mgid; + INIT_LIST_HEAD(&group->pending_list); + INIT_LIST_HEAD(&group->active_list); + INIT_WORK(&group->work, mcast_work_handler, group); + spin_lock_init(&group->lock); + + spin_lock_irqsave(&port->lock, flags); + cur_group = mcast_insert(port, group, is_mgid0); + if (cur_group) { + kfree(group); + group = cur_group; + } else + atomic_inc(&port->refcount); +found: + atomic_inc(&group->refcount); + spin_unlock_irqrestore(&port->lock, flags); + return group; +} + +/* + * We serialize all join requests to a single group to make our lives much + * easier. Otherwise, two users could try to join the same group + * simultaneously, with different configurations, one could leave while the + * join is in progress, etc., which makes locking around error recovery + * difficult. + */ +struct ib_sa_multicast * +ib_sa_join_multicast(struct ib_sa_client *client, + struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, + int (*callback)(int status, + struct ib_sa_multicast *multicast), + void *context) +{ + struct mcast_device *dev; + struct mcast_member *member; + struct ib_sa_multicast *multicast; + int ret; + + dev = ib_get_client_data(device, &mcast_client); + if (!dev) + return ERR_PTR(-ENODEV); + + member = kzalloc(sizeof *member, gfp_mask); + if (!member) + return ERR_PTR(-ENOMEM); + + ib_sa_client_get(client); + member->client = client; + member->multicast.rec = *rec; + member->multicast.comp_mask = comp_mask; + member->multicast.callback = callback; + member->multicast.context = context; + init_completion(&member->comp); + atomic_set(&member->refcount, 1); + member->state = MCAST_JOINING; + + member->group = acquire_group(&dev->port[port_num - dev->start_port], + &rec->mgid, gfp_mask); + if (!member->group) { + ret = -ENOMEM; + goto err; + } + + /* + * The user will get the multicast structure in their callback. 
They + * could then free the multicast structure before we can return from + * this routine. So we save the pointer to return before queuing + * any callback. + */ + multicast = &member->multicast; + queue_join(member); + return multicast; + +err: + ib_sa_client_put(client); + kfree(member); + return ERR_PTR(ret); +} +EXPORT_SYMBOL(ib_sa_join_multicast); + +void ib_sa_free_multicast(struct ib_sa_multicast *multicast) +{ + struct mcast_member *member; + struct mcast_group *group; + + member = container_of(multicast, struct mcast_member, multicast); + group = member->group; + + spin_lock_irq(&group->lock); + if (member->state == MCAST_MEMBER) + adjust_membership(group, multicast->rec.join_state, -1); + + list_del_init(&member->list); + + if (group->state == MCAST_IDLE) { + group->state = MCAST_BUSY; + spin_unlock_irq(&group->lock); + /* Continue to hold reference on group until callback */ + queue_work(mcast_wq, &group->work); + } else { + spin_unlock_irq(&group->lock); + release_group(group); + } + + deref_member(member); + wait_for_completion(&member->comp); + ib_sa_client_put(member->client); + kfree(member); +} +EXPORT_SYMBOL(ib_sa_free_multicast); + +int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num, + union ib_gid *mgid, struct ib_sa_mcmember_rec *rec) +{ + struct mcast_device *dev; + struct mcast_port *port; + struct mcast_group *group; + unsigned long flags; + int ret = 0; + + dev = ib_get_client_data(device, &mcast_client); + if (!dev) + return -ENODEV; + + port = &dev->port[port_num - dev->start_port]; + if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) { + spin_lock_irqsave(&port->lock, flags); + group = mcast_find(port, mgid); + if (group) + *rec = group->rec; + else + ret = -EADDRNOTAVAIL; + spin_unlock_irqrestore(&port->lock, flags); + } else { + memset(rec, 0, sizeof *rec); + ib_get_cached_gid(device, port_num, 0, &rec->port_gid); + rec->pkey = 0xFFFF; + get_random_bytes(&rec->qkey, sizeof rec->qkey); + rec->join_state = 1; + } + + return 
ret; +} +EXPORT_SYMBOL(ib_sa_get_mcmember_rec); + +int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + struct ib_ah_attr *ah_attr) +{ + int ret; + u16 gid_index; + u8 p; + + ret = ib_find_cached_gid(device, &rec->port_gid, &p, &gid_index); + if (ret) + return ret; + + memset(ah_attr, 0, sizeof *ah_attr); + ah_attr->dlid = be16_to_cpu(rec->mlid); + ah_attr->sl = rec->sl; + ah_attr->port_num = port_num; + ah_attr->static_rate = rec->rate; + + ah_attr->ah_flags = IB_AH_GRH; + ah_attr->grh.dgid = rec->mgid; + + ah_attr->grh.sgid_index = (u8) gid_index; + ah_attr->grh.flow_label = be32_to_cpu(rec->flow_label); + ah_attr->grh.hop_limit = rec->hop_limit; + ah_attr->grh.traffic_class = rec->traffic_class; + + return 0; +} +EXPORT_SYMBOL(ib_init_ah_from_mcmember); + +static void mcast_groups_lost(struct mcast_port *port) +{ + struct mcast_group *group; + struct rb_node *node; + unsigned long flags; + + spin_lock_irqsave(&port->lock, flags); + for (node = rb_first(&port->table); node; node = rb_next(node)) { + group = rb_entry(node, struct mcast_group, node); + spin_lock(&group->lock); + if (group->state == MCAST_IDLE) { + atomic_inc(&group->refcount); + queue_work(mcast_wq, &group->work); + } + group->state = MCAST_ERROR; + spin_unlock(&group->lock); + } + spin_unlock_irqrestore(&port->lock, flags); +} + +static void mcast_event_handler(struct ib_event_handler *handler, + struct ib_event *event) +{ + struct mcast_device *dev; + + dev = ib_get_client_data(event->device, &mcast_client); + if (!dev) + return; + + switch (event->event) { + case IB_EVENT_PORT_ERR: + case IB_EVENT_LID_CHANGE: + case IB_EVENT_SM_CHANGE: + case IB_EVENT_CLIENT_REREGISTER: + mcast_groups_lost(&dev->port[event->element.port_num - + dev->start_port]); + break; + default: + break; + } +} + +static void mcast_add_one(struct ib_device *device) +{ + struct mcast_device *dev; + struct mcast_port *port; + int i; + + if 
(rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port, + GFP_KERNEL); + if (!dev) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) + dev->start_port = dev->end_port = 0; + else { + dev->start_port = 1; + dev->end_port = device->phys_port_cnt; + } + + for (i = 0; i <= dev->end_port - dev->start_port; i++) { + port = &dev->port[i]; + port->dev = dev; + port->port_num = dev->start_port + i; + spin_lock_init(&port->lock); + port->table = RB_ROOT; + init_completion(&port->comp); + atomic_set(&port->refcount, 1); + } + + dev->device = device; + ib_set_client_data(device, &mcast_client, dev); + + INIT_IB_EVENT_HANDLER(&event_handler, device, mcast_event_handler); + ib_register_event_handler(&event_handler); +} + +static void mcast_remove_one(struct ib_device *device) +{ + struct mcast_device *dev; + struct mcast_port *port; + int i; + + dev = ib_get_client_data(device, &mcast_client); + if (!dev) + return; + + ib_unregister_event_handler(&event_handler); + flush_workqueue(mcast_wq); + + for (i = 0; i <= dev->end_port - dev->start_port; i++) { + port = &dev->port[i]; + deref_port(port); + wait_for_completion(&port->comp); + } + + kfree(dev); +} + +int mcast_init(void) +{ + int ret; + + mcast_wq = create_singlethread_workqueue("ib_mcast_wq"); + if (!mcast_wq) + return -ENOMEM; + + ib_sa_register_client(&sa_client); + + ret = ib_register_client(&mcast_client); + if (ret) + goto err; + return 0; + +err: + ib_sa_unregister_client(&sa_client); + destroy_workqueue(mcast_wq); + return ret; +} + +void mcast_cleanup(void) +{ + ib_unregister_client(&mcast_client); + ib_sa_unregister_client(&sa_client); + destroy_workqueue(mcast_wq); +} diff --git a/drivers/infiniband/core/sa.h b/drivers/infiniband/core/sa.h new file mode 100644 index 0000000..24c93fd --- /dev/null +++ b/drivers/infiniband/core/sa.h @@ -0,0 +1,66 @@ +/* + * Copyright (c) 2004 Topspin Communications.
All rights reserved. + * Copyright (c) 2005 Voltaire, Inc.  All rights reserved. + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef SA_H +#define SA_H + +#include + +static inline void ib_sa_client_get(struct ib_sa_client *client) +{ + atomic_inc(&client->users); +} + +static inline void ib_sa_client_put(struct ib_sa_client *client) +{ + if (atomic_dec_and_test(&client->users)) + complete(&client->comp); +} + +int ib_sa_mcmember_rec_query(struct ib_sa_client *client, + struct ib_device *device, u8 port_num, + u8 method, + struct ib_sa_mcmember_rec *rec, + ib_sa_comp_mask comp_mask, + int timeout_ms, gfp_t gfp_mask, + void (*callback)(int status, + struct ib_sa_mcmember_rec *resp, + void *context), + void *context, + struct ib_sa_query **sa_query); + +int mcast_init(void); +void mcast_cleanup(void); + +#endif /* SA_H */ diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 1706d3c..5b4c1c3 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -47,8 +47,8 @@ #include #include #include -#include #include +#include "sa.h" MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("InfiniBand subnet administration query support"); @@ -424,17 +424,6 @@ void ib_sa_register_client(struct ib_sa_ } EXPORT_SYMBOL(ib_sa_register_client); -static inline void ib_sa_client_get(struct ib_sa_client *client) -{ - atomic_inc(&client->users); -} - -static inline void ib_sa_client_put(struct ib_sa_client *client) -{ - if (atomic_dec_and_test(&client->users)) - complete(&client->comp); -} - void ib_sa_unregister_client(struct ib_sa_client *client) { ib_sa_client_put(client); @@ -900,7 +889,6 @@ err1: kfree(query); return ret; } -EXPORT_SYMBOL(ib_sa_mcmember_rec_query); static void send_handler(struct ib_mad_agent *agent, struct ib_mad_send_wc *mad_send_wc) @@ -1052,15 +1040,27 @@ static int __init ib_sa_init(void) get_random_bytes(&tid, sizeof tid); + ret = mcast_init(); + if (ret) { + printk(KERN_ERR "Couldn't initialize multicast handling\n"); + goto err1; + } + ret = ib_register_client(&sa_client); - if (ret) + if (ret) { 
printk(KERN_ERR "Couldn't register ib_sa client\n"); - + goto err2; + } + return 0; +err2: + mcast_cleanup(); +err1: return ret; } static void __exit ib_sa_cleanup(void) { + mcast_cleanup(); ib_unregister_client(&sa_client); idr_destroy(&query_idr); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 3faa182..d90f804 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -60,14 +60,11 @@ static DEFINE_MUTEX(mcast_mutex); /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */ struct ipoib_mcast { struct ib_sa_mcmember_rec mcmember; + struct ib_sa_multicast *mc; struct ipoib_ah *ah; struct rb_node rb_node; struct list_head list; - struct completion done; - - int query_id; - struct ib_sa_query *query; unsigned long created; unsigned long backoff; @@ -299,18 +296,22 @@ static int ipoib_mcast_join_finish(struc return 0; } -static void +static int ipoib_mcast_sendonly_join_complete(int status, - struct ib_sa_mcmember_rec *mcmember, - void *mcast_ptr) + struct ib_sa_multicast *multicast) { - struct ipoib_mcast *mcast = mcast_ptr; + struct ipoib_mcast *mcast = multicast->context; struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); + /* We trap for port events ourselves. 
*/ + if (status == -ENETRESET) + return 0; + if (!status) - ipoib_mcast_join_finish(mcast, mcmember); - else { + status = ipoib_mcast_join_finish(mcast, &multicast->rec); + + if (status) { if (mcast->logcount++ < 20) ipoib_dbg_mcast(netdev_priv(dev), "multicast join failed for " IPOIB_GID_FMT ", status %d\n", @@ -325,11 +326,10 @@ ipoib_mcast_sendonly_join_complete(int s spin_unlock_irq(&priv->tx_lock); /* Clear the busy flag so we try again */ - clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); - mcast->query = NULL; + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, + &mcast->flags); } - - complete(&mcast->done); + return status; } static int ipoib_mcast_sendonly_join(struct ipoib_mcast *mcast) @@ -359,35 +359,33 @@ #endif rec.port_gid = priv->local_gid; rec.pkey = cpu_to_be16(priv->pkey); - init_completion(&mcast->done); - - ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, &rec, - IB_SA_MCMEMBER_REC_MGID | - IB_SA_MCMEMBER_REC_PORT_GID | - IB_SA_MCMEMBER_REC_PKEY | - IB_SA_MCMEMBER_REC_JOIN_STATE, - 1000, GFP_ATOMIC, - ipoib_mcast_sendonly_join_complete, - mcast, &mcast->query); - if (ret < 0) { - ipoib_warn(priv, "ib_sa_mcmember_rec_set failed (ret = %d)\n", + mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, + priv->port, &rec, + IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | + IB_SA_MCMEMBER_REC_JOIN_STATE, + GFP_ATOMIC, + ipoib_mcast_sendonly_join_complete, + mcast); + if (IS_ERR(mcast->mc)) { + ret = PTR_ERR(mcast->mc); + clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + ipoib_warn(priv, "ib_sa_join_multicast failed (ret = %d)\n", ret); } else { ipoib_dbg_mcast(priv, "no multicast record for " IPOIB_GID_FMT ", starting join\n", IPOIB_GID_ARG(mcast->mcmember.mgid)); - - mcast->query_id = ret; } return ret; } -static void ipoib_mcast_join_complete(int status, - struct ib_sa_mcmember_rec *mcmember, - void *mcast_ptr) +static int ipoib_mcast_join_complete(int status, + struct 
ib_sa_multicast *multicast) { - struct ipoib_mcast *mcast = mcast_ptr; + struct ipoib_mcast *mcast = multicast->context; struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -395,23 +393,24 @@ static void ipoib_mcast_join_complete(in " (status %d)\n", IPOIB_GID_ARG(mcast->mcmember.mgid), status); - if (!status && !ipoib_mcast_join_finish(mcast, mcmember)) { + /* We trap for port events ourselves. */ + if (status == -ENETRESET) + return 0; + + if (!status) + status = ipoib_mcast_join_finish(mcast, &multicast->rec); + + if (!status) { mcast->backoff = 1; mutex_lock(&mcast_mutex); if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_work(ipoib_workqueue, &priv->mcast_task); mutex_unlock(&mcast_mutex); - complete(&mcast->done); - return; - } - - if (status == -EINTR) { - complete(&mcast->done); - return; + return 0; } - if (status && mcast->logcount++ < 20) { - if (status == -ETIMEDOUT || status == -EINTR) { + if (mcast->logcount++ < 20) { + if (status == -ETIMEDOUT) { ipoib_dbg_mcast(priv, "multicast join failed for " IPOIB_GID_FMT ", status %d\n", IPOIB_GID_ARG(mcast->mcmember.mgid), @@ -428,23 +427,18 @@ static void ipoib_mcast_join_complete(in if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) mcast->backoff = IPOIB_MAX_BACKOFF_SECONDS; - mutex_lock(&mcast_mutex); + /* Clear the busy flag so we try again */ + status = test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + mutex_lock(&mcast_mutex); spin_lock_irq(&priv->lock); - mcast->query = NULL; - - if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { - if (status == -ETIMEDOUT) - queue_work(ipoib_workqueue, &priv->mcast_task); - else - queue_delayed_work(ipoib_workqueue, &priv->mcast_task, - mcast->backoff * HZ); - } else - complete(&mcast->done); + if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) + queue_delayed_work(ipoib_workqueue, &priv->mcast_task, + mcast->backoff * HZ); spin_unlock_irq(&priv->lock); mutex_unlock(&mcast_mutex); - return; + return status; } static void 
ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast, @@ -493,15 +487,14 @@ static void ipoib_mcast_join(struct net_ rec.hop_limit = priv->broadcast->mcmember.hop_limit; } - init_completion(&mcast->done); - - ret = ib_sa_mcmember_rec_set(&ipoib_sa_client, priv->ca, priv->port, - &rec, comp_mask, mcast->backoff * 1000, - GFP_ATOMIC, ipoib_mcast_join_complete, - mcast, &mcast->query); - - if (ret < 0) { - ipoib_warn(priv, "ib_sa_mcmember_rec_set failed, status %d\n", ret); + set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + mcast->mc = ib_sa_join_multicast(&ipoib_sa_client, priv->ca, priv->port, + &rec, comp_mask, GFP_KERNEL, + ipoib_mcast_join_complete, mcast); + if (IS_ERR(mcast->mc)) { + clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags); + ret = PTR_ERR(mcast->mc); + ipoib_warn(priv, "ib_sa_join_multicast failed, status %d\n", ret); mcast->backoff *= 2; if (mcast->backoff > IPOIB_MAX_BACKOFF_SECONDS) @@ -513,8 +506,7 @@ static void ipoib_mcast_join(struct net_ &priv->mcast_task, mcast->backoff * HZ); mutex_unlock(&mcast_mutex); - } else - mcast->query_id = ret; + } } void ipoib_mcast_join_task(void *dev_ptr) @@ -538,7 +530,7 @@ void ipoib_mcast_join_task(void *dev_ptr priv->local_rate = attr.active_speed * ib_width_enum_to_int(attr.active_width); } else - ipoib_warn(priv, "ib_query_port failed\n"); + ipoib_warn(priv, "ib_query_port failed\n"); } if (!priv->broadcast) { @@ -565,7 +557,8 @@ void ipoib_mcast_join_task(void *dev_ptr } if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { - ipoib_mcast_join(dev, priv->broadcast, 0); + if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) + ipoib_mcast_join(dev, priv->broadcast, 0); return; } @@ -620,26 +613,9 @@ int ipoib_mcast_start_thread(struct net_ return 0; } -static void wait_for_mcast_join(struct ipoib_dev_priv *priv, - struct ipoib_mcast *mcast) -{ - spin_lock_irq(&priv->lock); - if (mcast && mcast->query) { - ib_sa_cancel_query(mcast->query_id, mcast->query); - mcast->query = 
NULL; - spin_unlock_irq(&priv->lock); - ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); - wait_for_completion(&mcast->done); - } - else - spin_unlock_irq(&priv->lock); -} - int ipoib_mcast_stop_thread(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_mcast *mcast; ipoib_dbg_mcast(priv, "stopping multicast thread\n"); @@ -655,52 +631,27 @@ int ipoib_mcast_stop_thread(struct net_d if (flush) flush_workqueue(ipoib_workqueue); - wait_for_mcast_join(priv, priv->broadcast); - - list_for_each_entry(mcast, &priv->multicast_list, list) - wait_for_mcast_join(priv, mcast); - return 0; } static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ib_sa_mcmember_rec rec = { - .join_state = 1 - }; int ret = 0; - if (!test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) - return 0; - - ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", - IPOIB_GID_ARG(mcast->mcmember.mgid)); - - rec.mgid = mcast->mcmember.mgid; - rec.port_gid = priv->local_gid; - rec.pkey = cpu_to_be16(priv->pkey); + if (test_and_clear_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { + ipoib_dbg_mcast(priv, "leaving MGID " IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(mcast->mcmember.mgid)); - /* Remove ourselves from the multicast group */ - ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), - &mcast->mcmember.mgid); - if (ret) - ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); + /* Remove ourselves from the multicast group */ + ret = ipoib_mcast_detach(dev, be16_to_cpu(mcast->mcmember.mlid), + &mcast->mcmember.mgid); + if (ret) + ipoib_warn(priv, "ipoib_mcast_detach failed (result = %d)\n", ret); + } - /* - * Just make one shot at leaving and don't wait for a reply; - * if we fail, too bad. 
- */ - ret = ib_sa_mcmember_rec_delete(&ipoib_sa_client, priv->ca, priv->port, &rec, - IB_SA_MCMEMBER_REC_MGID | - IB_SA_MCMEMBER_REC_PORT_GID | - IB_SA_MCMEMBER_REC_PKEY | - IB_SA_MCMEMBER_REC_JOIN_STATE, - 0, GFP_ATOMIC, NULL, - mcast, &mcast->query); - if (ret < 0) - ipoib_warn(priv, "ib_sa_mcmember_rec_delete failed " - "for leave (result = %d)\n", ret); + if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) + ib_sa_free_multicast(mcast->mc); return 0; } @@ -753,7 +704,7 @@ void ipoib_mcast_send(struct net_device dev_kfree_skb_any(skb); } - if (mcast->query) + if (test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)) ipoib_dbg_mcast(priv, "no address vector, " "but multicast join already started\n"); else if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) @@ -910,7 +861,6 @@ void ipoib_mcast_restart_task(void *dev_ /* We have to cancel outside of the spinlock */ list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { - wait_for_mcast_join(priv, mcast); ipoib_mcast_leave(mcast->dev, mcast); ipoib_mcast_free(mcast); } diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h index 97715b0..3b957e5 100644 --- a/include/rdma/ib_sa.h +++ b/include/rdma/ib_sa.h @@ -285,18 +285,6 @@ int ib_sa_path_rec_get(struct ib_sa_clie void *context, struct ib_sa_query **query); -int ib_sa_mcmember_rec_query(struct ib_sa_client *client, - struct ib_device *device, u8 port_num, - u8 method, - struct ib_sa_mcmember_rec *rec, - ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, - void (*callback)(int status, - struct ib_sa_mcmember_rec *resp, - void *context), - void *context, - struct ib_sa_query **query); - int ib_sa_service_rec_query(struct ib_sa_client *client, struct ib_device *device, u8 port_num, u8 method, @@ -309,93 +297,87 @@ int ib_sa_service_rec_query(struct ib_sa void *context, struct ib_sa_query **sa_query); +struct ib_sa_multicast { + struct ib_sa_mcmember_rec rec; + ib_sa_comp_mask comp_mask; + int (*callback)(int status, + struct 
ib_sa_multicast *multicast); + void *context; +}; + /** - * ib_sa_mcmember_rec_set - Start an MCMember set query - * @client:SA client - * @device:device to send query on - * @port_num: port number to send query on - * @rec:MCMember Record to send in query - * @comp_mask:component mask to send in query - * @timeout_ms:time to wait for response - * @gfp_mask:GFP mask to use for internal allocations - * @callback:function called when query completes, times out or is - * canceled - * @context:opaque user context passed to callback - * @sa_query:query context, used to cancel query + * ib_sa_join_multicast - Initiates a join request to the specified multicast + * group. + * @client: SA client + * @device: Device associated with the multicast group. + * @port_num: Port on the specified device to associate with the multicast + * group. + * @rec: SA multicast member record specifying group attributes. + * @comp_mask: Component mask indicating which group attributes of %rec are + * valid. + * @gfp_mask: GFP mask for memory allocations. + * @callback: User callback invoked once the join operation completes. + * @context: User specified context stored with the ib_sa_multicast structure. * - * Send an MCMember Set query to the SA (eg to join a multicast - * group). The callback function will be called when the query - * completes (or fails); status is 0 for a successful response, -EINTR - * if the query is canceled, -ETIMEDOUT is the query timed out, or - * -EIO if an error occurred sending the query. The resp parameter of - * the callback is only valid if status is 0. + * This call initiates a multicast join request with the SA for the specified + * multicast group. If the join operation is started successfully, it returns + * an ib_sa_multicast structure that is used to track the multicast operation. + * Users must free this structure by calling ib_sa_free_multicast, even if the + * join operation later fails. (The callback status is non-zero.)
* - * If the return value of ib_sa_mcmember_rec_set() is negative, it is - * an error code. Otherwise it is a query ID that can be used to - * cancel the query. + * If the join operation fails, status will be non-zero, with the following + * failures possible: + * -ETIMEDOUT: The request timed out. + * -EIO: An error occurred sending the query. + * -EINVAL: The MCMemberRecord values differed from the existing group's. + * -ENETRESET: Indicates that a fatal error has occurred on the multicast + * group, and the user must rejoin the group to continue using it. */ -static inline int -ib_sa_mcmember_rec_set(struct ib_sa_client *client, - struct ib_device *device, u8 port_num, - struct ib_sa_mcmember_rec *rec, - ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, - void (*callback)(int status, - struct ib_sa_mcmember_rec *resp, - void *context), - void *context, - struct ib_sa_query **query) -{ - return ib_sa_mcmember_rec_query(client, device, port_num, - IB_MGMT_METHOD_SET, - rec, comp_mask, - timeout_ms, gfp_mask, callback, - context, query); -} +struct ib_sa_multicast *ib_sa_join_multicast(struct ib_sa_client *client, + struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + ib_sa_comp_mask comp_mask, gfp_t gfp_mask, + int (*callback)(int status, + struct ib_sa_multicast + *multicast), + void *context); /** - * ib_sa_mcmember_rec_delete - Start an MCMember delete query - * @client:SA client - * @device:device to send query on - * @port_num: port number to send query on - * @rec:MCMember Record to send in query - * @comp_mask:component mask to send in query - * @timeout_ms:time to wait for response - * @gfp_mask:GFP mask to use for internal allocations - * @callback:function called when query completes, times out or is - * canceled - * @context:opaque user context passed to callback - * @sa_query:query context, used to cancel query + * ib_sa_free_multicast - Frees the multicast tracking structure, and releases + * any reference on the
multicast group. + * @multicast: Multicast tracking structure allocated by ib_sa_join_multicast. * - * Send an MCMember Delete query to the SA (eg to leave a multicast - * group). The callback function will be called when the query - * completes (or fails); status is 0 for a successful response, -EINTR - * if the query is canceled, -ETIMEDOUT is the query timed out, or - * -EIO if an error occurred sending the query. The resp parameter of - * the callback is only valid if status is 0. + * This call blocks until the multicast identifier is destroyed. It may + * not be called from within the multicast callback; however, returning a + * non-zero value from the callback will result in destroying the multicast + * tracking structure. + */ +void ib_sa_free_multicast(struct ib_sa_multicast *multicast); + +/** + * ib_sa_get_mcmember_rec - Looks up a multicast member record by its MGID and + * returns it if found. + * @device: Device associated with the multicast group. + * @port_num: Port on the specified device to associate with the multicast + * group. + * @mgid: optional MGID of multicast group. + * @rec: Location to copy SA multicast member record. * - * If the return value of ib_sa_mcmember_rec_delete() is negative, it - * is an error code. Otherwise it is a query ID that can be used to - * cancel the query. + * If an MGID is specified, returns an existing multicast member record if + * one is found for the local port. If no MGID is specified, or the specified + * MGID is 0, returns a multicast member record filled in with default values + * that may be used to create a new multicast group.
*/ -static inline int -ib_sa_mcmember_rec_delete(struct ib_sa_client *client, - struct ib_device *device, u8 port_num, - struct ib_sa_mcmember_rec *rec, - ib_sa_comp_mask comp_mask, - int timeout_ms, gfp_t gfp_mask, - void (*callback)(int status, - struct ib_sa_mcmember_rec *resp, - void *context), - void *context, - struct ib_sa_query **query) -{ - return ib_sa_mcmember_rec_query(client, device, port_num, - IB_SA_METHOD_DELETE, - rec, comp_mask, - timeout_ms, gfp_mask, callback, - context, query); -} +int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num, + union ib_gid *mgid, struct ib_sa_mcmember_rec *rec); + +/** + * ib_init_ah_from_mcmember - Initialize address handle attributes based on + * an SA multicast member record. + */ +int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, + struct ib_sa_mcmember_rec *rec, + struct ib_ah_attr *ah_attr); /** * ib_init_ah_from_path - Initialize address handle attributes based on an SA From matthew at wil.cx Tue Oct 24 15:36:32 2006 From: matthew at wil.cx (Matthew Wilcox) Date: Tue, 24 Oct 2006 16:36:32 -0600 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> Message-ID: <20061024223631.GT25210@parisc-linux.org> On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote: > > I think the right way to fix this is to ensure mmio write ordering in > > the pci_write_config_*() implementations. Like this. > > I'm happy to fix this in the PCI core and not force drivers to worry > about this. > > John, can you confirm that this patch fixes the issue for you? Hang on. I wasn't thinking clearly. mmiowb() only ensures the write has got as far as the shub. There's no way to fix this in the pci core -- any PCI-PCI bridge can reorder the two. This is only really a problem for setup (when we program the BARs), so it seems silly to enforce an ordering at any other time. 
Reluctantly, I must disagree with Jeff -- drivers need to fix this.

From somenath at veritas.com Mon Oct 23 15:42:26 2006
From: somenath at veritas.com (somenath)
Date: Mon, 23 Oct 2006 15:42:26 -0700
Subject: [openib-general] APM support in openib stack
In-Reply-To: <45393194.40707@3leafnetworks.com>
References: <45393194.40707@3leafnetworks.com>
Message-ID: <453D4552.4040304@veritas.com>

Hi Venkatesh:

Two questions:

1. Does re-enabling migration (as defined in vol1 of the IB spec, in 17.2.8.1.4) work for you? (I mean after the 1st path failure, you do the LAP/APR packet transfer.)
2. What applications are you testing with?

Thanks,
som.

Venkatesh Babu wrote:

>
> I have added a couple of patches to the OFED stack as described in
> bug#160, bug#172, and bug#159 and with this successfully tested the
> APM functionality, except for one issue.
>
> Configuration:
> 2 Nodes
> CPU: AMD Opteron(tm) Processor 252 Dual processor
> CA type: MT25208
> Firmware version: 5.1.4
> OS: CentOS release 4.2
> IB: OFED 1.0
>
> 2 Flextronics 24 port switches
>
> Node1 Port1 connected to Switch1
> Node1 Port2 connected to Switch2
> Node2 Port1 connected to Switch1
> Node2 Port2 connected to Switch2
>
> Node1: Active side of the RC QP
> Node2: Passive side of the RC QP
>
> Test1:
> Failover simulation on Node1
> 1. Simulate the port1 failure, RC QP migrates the path to port2
> 2. Simulate the port1 UP to rearm the alternate path from port1
> 3. Simulate the port2 failure, RC QP migrates the path to port1
> 4. Simulate the port2 UP to rearm the alternate path from port2
>
> Test2:
> Real failover by manually pulling the cable
> 1. Simulate the failover/failback by pulling cable of Node1 port1
> 2. Simulate the failover/failback by pulling cable of Node1 port2
> 3. Simulate the failover/failback by pulling cable of Node2 port1
> 4. Simulate the failover/failback by pulling cable of Node2 port2
>
>
> ISSUE:
> If I pull both cables then there are no paths to the
> destination, so the RC QP connection is supposed to tear down. But it is
> not working.
>
> 1. Create an RC QP and load both primary and alternate path
>    (I was setting rnr_retry_count = 6, retry_count = 6,
>    packet_life_time field of struct ib_sa_path_rec to 15 and also tried
>    with 12)
> 2. Send some traffic over the RC QP
> 3. Disconnect the cable belonging to the primary path
> 4. It smoothly fails over to the alternate path, which becomes the primary path.
>    No effect on the traffic on that RC QP
> 5. Remove the second cable belonging to the new primary path.
> 6. Obviously traffic stops since there are no paths to the
>    destination. But for the outstanding WRs in the RC QP I don't get any
>    callback from the verbs layer describing whether it succeeded or
>    failed due to some error like IB_WC_RETRY_EXC_ERR.
>    When I query the RC QP properties it still shows that it is in
>    IB_QPS_RTS state.
>
>
> Without APM functionality it behaves correctly -
> 1. Create an RC QP and load only the primary path
>    (I was setting rnr_retry_count = 6, retry_count = 6,
>    packet_life_time field of struct ib_sa_path_rec to 15 and also tried
>    with 12)
> 2. Send some traffic over the RC QP
> 3. Disconnect the cable belonging to the primary path
> 4. Obviously traffic stops since there are no paths to the
>    destination. For the outstanding WRs in the RC QP I do get a callback
>    from the verbs layer reporting that the first WR failed due to
>    error IB_WC_RETRY_EXC_ERR, and for all other WRs I get IB_WC_WR_FLUSH_ERR.
>    I will close this RC QP.
> > VBabu > > Date: Mon, 16 Oct 2006 14:03:50 -0700 > From: "Sean Hefty" > Subject: Re: [openib-general] APM support in openib stack > To: somenath at veritas.com > Cc: openib-general at openib.org > Message-ID: <4533F3B6.1030509 at ichips.intel.com> > Content-Type: text/plain; charset=iso-8859-1; format=flowed > > somenath wrote: > >>>>> Doesn't ib_cm_init_qp_attr() set this for you? >>>> >>> >>> >>> No, it doesn't. it returns me >>> attr_mask= 0x12d181 >>> port=0x0 alt_port=0x0 >> >> >> > > Okay - there was a fix to the cm.c file (svn rev 8267) that added > setting the alternate port number when initializing the QP > attributes. Apparently that fix did not make it into the release that > you're using. > > - Sean > > > > > From sean.hefty at intel.com Tue Oct 24 15:39:18 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:39:18 -0700 Subject: [openib-general] [PATCH 2/7 v2] for 2.6.20 rdma/cma: remove specifying qp_type when connecting In-Reply-To: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> Message-ID: <000201c6f7bd$3e5f4b30$a6d4180a@amr.corp.intel.com> There is a 1:1 correspondence between the qp_type for a connection and the port space associated with an rdma_cm_id. Remove the qp_type from the rdma_cm interface. Signed-off-by: Sean Hefty --- Removal makes the resulting userspace interface a little cleaner. 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 2b4748e..1c6d8e3 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -132,7 +132,6 @@ struct rdma_id_private { u32 seq_num; u32 qp_num; - enum ib_qp_type qp_type; u8 srq; }; @@ -391,7 +390,6 @@ int rdma_create_qp(struct rdma_cm_id *id id->qp = qp; id_priv->qp_num = qp->qp_num; - id_priv->qp_type = qp->qp_type; id_priv->srq = (qp->srq != NULL); return 0; err: @@ -1861,7 +1859,7 @@ static int cma_connect_ib(struct rdma_id req.service_id = cma_get_service_id(id_priv->id.ps, &route->addr.dst_addr); req.qp_num = id_priv->qp_num; - req.qp_type = id_priv->qp_type; + req.qp_type = IB_QPT_RC; req.starting_psn = id_priv->seq_num; req.responder_resources = conn_param->responder_resources; req.initiator_depth = conn_param->initiator_depth; @@ -1938,7 +1936,6 @@ int rdma_connect(struct rdma_cm_id *id, if (!id->qp) { id_priv->qp_num = conn_param->qp_num; - id_priv->qp_type = conn_param->qp_type; id_priv->srq = conn_param->srq; } @@ -2022,7 +2019,6 @@ int rdma_accept(struct rdma_cm_id *id, s if (!id->qp && conn_param) { id_priv->qp_num = conn_param->qp_num; - id_priv->qp_type = conn_param->qp_type; id_priv->srq = conn_param->srq; } diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index deb5a0a..4c07f96 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -215,7 +215,6 @@ struct rdma_conn_param { /* Fields below ignored if a QP is created on the rdma_cm_id. 
*/ u8 srq; u32 qp_num; - enum ib_qp_type qp_type; }; /** From sean.hefty at intel.com Tue Oct 24 15:41:45 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:41:45 -0700 Subject: [openib-general] [PATCH 3/7 v2] for 2.6.20 rdma/cma: report connection data with event In-Reply-To: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> Message-ID: <000301c6f7bd$95d3e790$a6d4180a@amr.corp.intel.com> When establishing a connection, users of the rdma_cm provide connection parameters during calls to rdma_connect() and rdma_accept(). These parameters are not given to the remote side during connection establishment. The result is that the remote side does not know parameters such as initiator_depth and responder_resources until after a connection is established, and then only by querying the QP attributes. This makes it difficult to optimize resources before connecting or reject a connection if it cannot provide the required resources. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 1c6d8e3..622c8f9 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -591,20 +591,6 @@ static inline int cma_user_data_offset(e } } -static int cma_notify_user(struct rdma_id_private *id_priv, - enum rdma_cm_event_type type, int status, - void *data, u8 data_len) -{ - struct rdma_cm_event event; - - event.event = type; - event.status = status; - event.private_data = data; - event.private_data_len = data_len; - - return id_priv->id.event_handler(&id_priv->id, &event); -} - static void cma_cancel_route(struct rdma_id_private *id_priv) { switch (rdma_node_get_transport(id_priv->id.device->node_type)) { @@ -789,47 +775,62 @@ reject: return ret; } +static void cma_set_rep_event_data(struct rdma_cm_event *event, + struct ib_cm_rep_event_param *rep_data, + void *private_data) +{ + event->param.conn.private_data = private_data; + event->param.conn.private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; 
+ event->param.conn.responder_resources = rep_data->responder_resources; + event->param.conn.initiator_depth = rep_data->initiator_depth; + event->param.conn.flow_control = rep_data->flow_control; + event->param.conn.rnr_retry_count = rep_data->rnr_retry_count; + event->param.conn.srq = rep_data->srq; + event->param.conn.qp_num = rep_data->remote_qpn; +} + static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; - enum rdma_cm_event_type event; - u8 private_data_len = 0; - int ret = 0, status = 0; + struct rdma_cm_event event; + int ret = 0; atomic_inc(&id_priv->dev_remove); if (!cma_comp(id_priv, CMA_CONNECT)) goto out; + memset(&event, 0, sizeof event); switch (ib_event->event) { case IB_CM_REQ_ERROR: case IB_CM_REP_ERROR: - event = RDMA_CM_EVENT_UNREACHABLE; - status = -ETIMEDOUT; + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = -ETIMEDOUT; break; case IB_CM_REP_RECEIVED: - status = cma_verify_rep(id_priv, ib_event->private_data); - if (status) - event = RDMA_CM_EVENT_CONNECT_ERROR; + event.status = cma_verify_rep(id_priv, ib_event->private_data); + if (event.status) + event.event = RDMA_CM_EVENT_CONNECT_ERROR; else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) { - status = cma_rep_recv(id_priv); - event = status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + event.status = cma_rep_recv(id_priv); + event.event = event.status ? RDMA_CM_EVENT_CONNECT_ERROR : + RDMA_CM_EVENT_ESTABLISHED; } else - event = RDMA_CM_EVENT_CONNECT_RESPONSE; - private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; + event.event = RDMA_CM_EVENT_CONNECT_RESPONSE; + cma_set_rep_event_data(&event, &ib_event->param.rep_rcvd, + ib_event->private_data); break; case IB_CM_RTU_RECEIVED: - status = cma_rtu_recv(id_priv); - event = status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + event.status = cma_rtu_recv(id_priv); + event.event = event.status ? 
RDMA_CM_EVENT_CONNECT_ERROR : + RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: - status = -ETIMEDOUT; /* fall through */ + event.status = -ETIMEDOUT; /* fall through */ case IB_CM_DREQ_RECEIVED: case IB_CM_DREP_RECEIVED: if (!cma_comp_exch(id_priv, CMA_CONNECT, CMA_DISCONNECT)) goto out; - event = RDMA_CM_EVENT_DISCONNECTED; + event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: case IB_CM_MRA_RECEIVED: @@ -837,9 +838,10 @@ static int cma_ib_handler(struct ib_cm_i goto out; case IB_CM_REJ_RECEIVED: cma_modify_qp_err(&id_priv->id); - status = ib_event->param.rej_rcvd.reason; - event = RDMA_CM_EVENT_REJECTED; - private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; + event.status = ib_event->param.rej_rcvd.reason; + event.event = RDMA_CM_EVENT_REJECTED; + event.param.conn.private_data = ib_event->private_data; + event.param.conn.private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", @@ -847,8 +849,7 @@ static int cma_ib_handler(struct ib_cm_i goto out; } - ret = cma_notify_user(id_priv, event, status, ib_event->private_data, - private_data_len); + ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.ib = NULL; @@ -910,9 +911,25 @@ err: return NULL; } +static void cma_set_req_event_data(struct rdma_cm_event *event, + struct ib_cm_req_event_param *req_data, + void *private_data, int offset) +{ + event->param.conn.private_data = private_data + offset; + event->param.conn.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - offset; + event->param.conn.responder_resources = req_data->responder_resources; + event->param.conn.initiator_depth = req_data->initiator_depth; + event->param.conn.flow_control = req_data->flow_control; + event->param.conn.retry_count = req_data->retry_count; + event->param.conn.rnr_retry_count = req_data->rnr_retry_count; + event->param.conn.srq = req_data->srq; + event->param.conn.qp_num = req_data->remote_qpn; +} + static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *listen_id, *conn_id; + struct rdma_cm_event event; int offset, ret; listen_id = cm_id->context; @@ -945,9 +962,11 @@ static int cma_req_handler(struct ib_cm_ cm_id->cm_handler = cma_ib_handler; offset = cma_user_data_offset(listen_id->id.ps); - ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, - ib_event->private_data + offset, - IB_CM_REQ_PRIVATE_DATA_SIZE - offset); + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + cma_set_req_event_data(&event, &ib_event->param.req_rcvd, + ib_event->private_data, offset); + ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ conn_id->cm_id.ib = NULL; @@ -1019,15 +1038,16 @@ static void cma_set_compare_data(enum rd static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) { struct rdma_id_private *id_priv = iw_id->context; - enum rdma_cm_event_type event = 0; + struct rdma_cm_event event; struct sockaddr_in *sin; int ret = 0; + memset(&event, 0, sizeof event); atomic_inc(&id_priv->dev_remove); switch (iw_event->event) { case IW_CM_EVENT_CLOSE: - event = RDMA_CM_EVENT_DISCONNECTED; + event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IW_CM_EVENT_CONNECT_REPLY: sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; @@ -1035,20 +1055,21 @@ static int cma_iw_handler(struct iw_cm_i sin = (struct sockaddr_in *) &id_priv->id.route.addr.dst_addr; *sin = iw_event->remote_addr; if (iw_event->status) - event = RDMA_CM_EVENT_REJECTED; + event.event = RDMA_CM_EVENT_REJECTED; else - event = RDMA_CM_EVENT_ESTABLISHED; + event.event = RDMA_CM_EVENT_ESTABLISHED; break; case IW_CM_EVENT_ESTABLISHED: - event = RDMA_CM_EVENT_ESTABLISHED; + event.event = RDMA_CM_EVENT_ESTABLISHED; break; default: BUG_ON(1); } - ret = cma_notify_user(id_priv, event, iw_event->status, - iw_event->private_data, - iw_event->private_data_len); + event.status = iw_event->status; + event.param.conn.private_data = iw_event->private_data; + event.param.conn.private_data_len = iw_event->private_data_len; + ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ id_priv->cm_id.iw = NULL; @@ -1069,6 +1090,7 @@ static int iw_conn_req_handler(struct iw struct rdma_id_private *listen_id, *conn_id; struct sockaddr_in *sin; struct net_device *dev = NULL; + struct rdma_cm_event event; int ret; listen_id = cm_id->context; @@ -1122,9 +1144,11 @@ static int iw_conn_req_handler(struct iw sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; *sin = iw_event->remote_addr; - ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, - iw_event->private_data, - iw_event->private_data_len); + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + event.param.conn.private_data = iw_event->private_data; + event.param.conn.private_data_len = iw_event->private_data_len; + ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* User wants to destroy the CM ID */ conn_id->cm_id.iw = NULL; @@ -1514,8 +1538,9 @@ static void addr_handler(int status, str struct rdma_dev_addr *dev_addr, void *context) { struct rdma_id_private *id_priv = context; - enum rdma_cm_event_type event; + struct rdma_cm_event event; + memset(&event, 0, sizeof event); atomic_inc(&id_priv->dev_remove); /* @@ -1535,14 +1560,15 @@ static void addr_handler(int status, str if (status) { if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ADDR_BOUND)) goto out; - event = RDMA_CM_EVENT_ADDR_ERROR; + event.event = RDMA_CM_EVENT_ADDR_ERROR; + event.status = status; } else { memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); - event = RDMA_CM_EVENT_ADDR_RESOLVED; + event.event = RDMA_CM_EVENT_ADDR_RESOLVED; } - if (cma_notify_user(id_priv, event, status, NULL, 0)) { + if (id_priv->id.event_handler(&id_priv->id, &event)) { cma_exch(id_priv, CMA_DESTROYING); cma_release_remove(id_priv); cma_deref_id(id_priv); @@ -2138,6 +2164,7 @@ err: static int cma_remove_id_dev(struct rdma_id_private *id_priv) { + struct rdma_cm_event event; enum cma_state state; /* Record that we want to remove the device */ @@ -2152,8 
+2179,9 @@ static int cma_remove_id_dev(struct rdma if (!cma_comp(id_priv, CMA_DEVICE_REMOVAL)) return 0; - return cma_notify_user(id_priv, RDMA_CM_EVENT_DEVICE_REMOVAL, - 0, NULL, 0); + memset(&event, 0, sizeof event); + event.event = RDMA_CM_EVENT_DEVICE_REMOVAL; + return id_priv->id.event_handler(&id_priv->id, &event); } static void cma_process_remove(struct cma_device *cma_dev) diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 4c07f96..aa6ce47 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -77,11 +77,25 @@ struct rdma_route { int num_paths; }; +struct rdma_conn_param { + const void *private_data; + u8 private_data_len; + u8 responder_resources; + u8 initiator_depth; + u8 flow_control; + u8 retry_count; /* ignored when accepting */ + u8 rnr_retry_count; + /* Fields below ignored if a QP is created on the rdma_cm_id. */ + u8 srq; + u32 qp_num; +}; + struct rdma_cm_event { enum rdma_cm_event_type event; int status; - void *private_data; - u8 private_data_len; + union { + struct rdma_conn_param conn; + } param; }; struct rdma_cm_id; @@ -204,19 +218,6 @@ void rdma_destroy_qp(struct rdma_cm_id * int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int *qp_attr_mask); -struct rdma_conn_param { - const void *private_data; - u8 private_data_len; - u8 responder_resources; - u8 initiator_depth; - u8 flow_control; - u8 retry_count; /* ignored when accepting */ - u8 rnr_retry_count; - /* Fields below ignored if a QP is created on the rdma_cm_id. */ - u8 srq; - u32 qp_num; -}; - /** * rdma_connect - Initiate an active connection request. * From davem at davemloft.net Tue Oct 24 15:43:47 2006 From: davem at davemloft.net (David Miller) Date: Tue, 24 Oct 2006 15:43:47 -0700 (PDT) Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? 
In-Reply-To: <20061024223631.GT25210@parisc-linux.org> References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> Message-ID: <20061024.154347.77057163.davem@davemloft.net> From: Matthew Wilcox Date: Tue, 24 Oct 2006 16:36:32 -0600 > This is only really a problem for setup (when we program the BARs), so > it seems silly to enforce an ordering at any other time. Reluctantly, I > must disagree with Jeff -- drivers need to fix this. One thing is that we definitely don't want to fix this by, for example, reading back the PCI_COMMAND register or something like that. That causes two problems: 1) Some PCI config writes shut the device down and make it not respond to some kinds of PCI config transactions. One example is putting the device into D3 or similar power state, another is performing a device reset. 2) Several drivers use PCI config space accesses to touch the main registers in order to work around bugs in the PCI-X implementation of their chip or similar (tg3 has a few cases like this). Doing a PCI config space readback will kill performance quite a bit for an already slow situation. In fact, I do recall that one of the x86 PCI config space access implementations did a readback like this, and we had to remove it because it caused problems when doing a reset on tg3 chips when using PCI config space register write to do the reset. From venkatesh.babu at 3leafnetworks.com Tue Oct 24 16:09:05 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Tue, 24 Oct 2006 16:09:05 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453D4552.4040304@veritas.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> Message-ID: <453E9D11.3060304@3leafnetworks.com> 1. Yes, I can rearm the alternate path by sending LAP and APR messages. 2. I was sending some network traffic (netperf) while doing these failovers. VBabu somenath wrote: > hi Venkatesh: > > Two questions: > > 1.
does re-enabling Migration (as defined in vol1 of ib spec in > 17.2.8.1.4) work for you? > (I mean after the 1st path failure, you do lap/apr packet transfer) > > 2. What applications are you testing with? > > thanks, som. > > Venkatesh Babu wrote: > >> >> I have added a couple of patches to the OFED stack as described in >> bug#160, bug#172, and bug#159 and with this successfully tested the >> APM functionality, except one issue. >> >> Configuration: >> 2 Nodes >> CPU: AMD Opteron(tm) Processor 252 Dual processor >> CA type: MT25208 >> Firmware version: 5.1.4 >> OS: CentOS release 4.2 >> IB: OFED 1.0 >> >> 2 Flextronics 24 port switches >> >> Node1 Port1 connected to Switch1 >> Node1 Port2 connected to Switch2 >> Node2 Port1 connected to switch1 >> Node 2 Port 2 connected to Switch2 >> >> Node1 : Active side of the RC QP >> Node 2 : Passive side of the RC QP >> >> Test1: >> Failover simulation on Node1 >> 1. Simulate the port1 failure, RC QP migrates the path to port2 >> 2. Simulate the port1 UP to rearm the alternate path from port1 >> 3. Simulate the port2 failure, RC QP migrates the path to port1 >> 4. Simulate the port2 UP to rearm the alternate path from port2 >> >> Test2: >> Real failover by manually pulling the cable >> 1. Simulate the failover/failback by pulling cable of Node1 port1 >> 2. Simulate the failover/failback by pulling cable of Node1 port2 >> 3. Simulate the failover/failback by pulling cable of Node2 port1 >> 4. Simulate the failover/failback by pulling cable of Node2 port2 >> >> >> ISSUE: >> If I pull both cables then there are no paths to the >> destination, so RC QP connection is supposed to tear down. But it is >> not working. >> >> 1. Create an RC QP and load both primary and alternate path >> (I was setting rnr_retry_count = 6, retry_count = 6, >> packet_life_time field of struct ib_sa_path_rec to 15 and also tried >> with 12) >> 2. Send some traffic over RC QP >> 3. Disconnect the cable belonging to the primary path >> 4.
It smoothly fails over to alternate path and it becomes primary path. >> >> No effect on the traffic on that RC QP >> 5. Remove the second cable belonging to the new primary path. >> 6. Obviously traffic stops since there are no paths to the >> destination. But for the outstanding WRs in the RC QP I don't get any >> callback from the verbs layer describing whether it succeeded or >> failed due to some error like IB_WC_RETRY_EXC_ERR. >> When I query the RC QP properties it still shows that it is in >> IB_QPS_RTS state. >> >> >> Without APM functionality it behaves correctly - >> 1. Create an RC QP and load only primary path >> (I was setting rnr_retry_count = 6, retry_count = 6, >> packet_life_time field of struct ib_sa_path_rec to 15 and also tried >> with 12) >> 2. Send some traffic over RC QP >> 3. Disconnect the cable belonging to the primary path >> 4. Obviously traffic stops since there are no paths to the >> destination. For the outstanding WRs in the RC QP I do get a callback >> from the verbs layer describing the first WR that it failed due to >> error IB_WC_RETRY_EXC_ERR and for all other WRs I get >> IB_WC_WR_FLUSH_ERR. >> I will close this RC QP. >> >> VBabu >> >> Date: Mon, 16 Oct 2006 14:03:50 -0700 >> From: "Sean Hefty" >> Subject: Re: [openib-general] APM support in openib stack >> To: somenath at veritas.com >> Cc: openib-general at openib.org >> Message-ID: <4533F3B6.1030509 at ichips.intel.com> >> Content-Type: text/plain; charset=iso-8859-1; format=flowed >> >> somenath wrote: >> >>>>>> Doesn't ib_cm_init_qp_attr() set this for you? >>>>> >>>>> >>>> >>>> No, it doesn't. it returns me >>>> attr_mask= 0x12d181 >>>> port=0x0 alt_port=0x0 >>> >>> >>> >>> >> >> Okay - there was a fix to the cm.c file (svn rev 8267) that added >> setting the alternate port number when initializing the QP >> attributes. Apparently that fix did not make it into the release >> that you're using.
>> >> - Sean >> >> >> >> >> > From sean.hefty at intel.com Tue Oct 24 15:45:51 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:45:51 -0700 Subject: [openib-general] [PATCH 4/7 v2] for 2.6.20 rdma/cma: add rdma_establish to force connection if RTU is lost In-Reply-To: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> Message-ID: <000401c6f7be$287fd130$a6d4180a@amr.corp.intel.com> Allow ULPs to transition to RTS before sending a REP. This allows the ULP to respond to a received message if it arrives before the RTU or communication established event. Modify the RDMA CM to transition to RTS when sending a REP over IB, and expose a new rdma_establish interface that a user can invoke to force a connection into the established state if it polls a receive completion before an RTU arrives. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 1cf0d42..492d4ce 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3242,6 +3242,10 @@ static int cm_init_qp_rts_attr(struct cm spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { + /* Allow transition to RTS before sending REP */ + case IB_CM_REQ_RCVD: + case IB_CM_MRA_REQ_SENT: + case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: case IB_CM_REP_SENT: diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 622c8f9..416fee8 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -759,22 +759,6 @@ static int cma_verify_rep(struct rdma_id return 0; } -static int cma_rtu_recv(struct rdma_id_private *id_priv) -{ - int ret; - - ret = cma_modify_qp_rts(&id_priv->id); - if (ret) - goto reject; - - return 0; -reject: - cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, - NULL, 0, NULL, 0); - return ret; -} - static void cma_set_rep_event_data(struct rdma_cm_event *event, struct ib_cm_rep_event_param *rep_data, void 
*private_data) @@ -820,9 +804,8 @@ static int cma_ib_handler(struct ib_cm_i ib_event->private_data); break; case IB_CM_RTU_RECEIVED: - event.status = cma_rtu_recv(id_priv); - event.event = event.status ? RDMA_CM_EVENT_CONNECT_ERROR : - RDMA_CM_EVENT_ESTABLISHED; + case IB_CM_USER_ESTABLISHED: + event.event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: event.status = -ETIMEDOUT; /* fall through */ @@ -1990,11 +1973,25 @@ static int cma_accept_ib(struct rdma_id_ struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; - int ret; + struct ib_qp_attr qp_attr; + int qp_attr_mask, ret; - ret = cma_modify_qp_rtr(&id_priv->id); - if (ret) - return ret; + if (id_priv->id.qp) { + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + goto out; + + qp_attr.qp_state = IB_QPS_RTS; + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, + &qp_attr_mask); + if (ret) + goto out; + + qp_attr.max_rd_atomic = conn_param->initiator_depth; + ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); + if (ret) + goto out; + } memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; @@ -2009,7 +2006,9 @@ static int cma_accept_ib(struct rdma_id_ rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = id_priv->srq ? 
1 : 0; - return ib_send_cm_rep(id_priv->cm_id.ib, &rep); + ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep); +out: + return ret; } static int cma_accept_iw(struct rdma_id_private *id_priv, @@ -2074,6 +2073,27 @@ reject: } EXPORT_SYMBOL(rdma_accept); +int rdma_establish(struct rdma_cm_id *id) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_CONNECT)) + return -EINVAL; + + switch (id->device->node_type) { + case RDMA_NODE_IB_CA: + ret = ib_cm_establish(id_priv->cm_id.ib); + break; + default: + ret = 0; + break; + } + return ret; +} +EXPORT_SYMBOL(rdma_establish); + int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len) { diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index aa6ce47..dbc7c56 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -253,6 +253,16 @@ int rdma_listen(struct rdma_cm_id *id, i int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** + * rdma_establish - Forces a connection state to established. + * @id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * QP before being notified that the connection has been established by the + * RDMA CM. + */ +int rdma_establish(struct rdma_cm_id *id); + +/** * rdma_reject - Called to reject a connection request or response. */ int rdma_reject(struct rdma_cm_id *id, const void *private_data, From sean.hefty at intel.com Tue Oct 24 15:50:19 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:50:19 -0700 Subject: [openib-general] [PATCH 5/7 v2] for 2.6.20 rdma/cma: add missing support for RDMA_PS_UDP In-Reply-To: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> Message-ID: <000501c6f7be$c8169d50$a6d4180a@amr.corp.intel.com> Add missing support for RDMA_PS_UDP. 
This allows the use of UD QPs through the rdma_cm, which provides address translation services over IB. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 416fee8..9bfd427 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -69,6 +69,7 @@ static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); +static DEFINE_IDR(udp_ps); struct cma_device { struct list_head list; @@ -507,9 +508,17 @@ static inline int cma_any_addr(struct so return cma_zero_addr(addr) || cma_loopback_addr(addr); } +static inline __be16 cma_port(struct sockaddr *addr) +{ + if (addr->sa_family == AF_INET) + return ((struct sockaddr_in *) addr)->sin_port; + else + return ((struct sockaddr_in6 *) addr)->sin6_port; +} + static inline int cma_any_port(struct sockaddr *addr) { - return !((struct sockaddr_in *) addr)->sin_port; + return !cma_port(addr); } static int cma_get_net_info(void *hdr, enum rdma_port_space ps, @@ -846,8 +855,8 @@ out: return ret; } -static struct rdma_id_private *cma_new_id(struct rdma_cm_id *listen_id, - struct ib_cm_event *ib_event) +static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; @@ -894,6 +903,42 @@ err: return NULL; } +static struct rdma_id_private* cma_new_udp_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv; + struct rdma_cm_id *id; + union cma_ip_addr *src, *dst; + __u16 port; + u8 ip_ver; + int ret; + + id = rdma_create_id(listen_id->event_handler, listen_id->context, + listen_id->ps); + if (IS_ERR(id)) + return NULL; + + + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); + + ret = 
rdma_translate_ip(&id->route.addr.src_addr, + &id->route.addr.dev_addr); + if (ret) + goto err; + + id_priv = container_of(id, struct rdma_id_private, id); + id_priv->state = CMA_CONNECT; + return id_priv; +err: + rdma_destroy_id(id); + return NULL; +} + static void cma_set_req_event_data(struct rdma_cm_event *event, struct ib_cm_req_event_param *req_data, void *private_data, int offset) @@ -922,7 +967,19 @@ static int cma_req_handler(struct ib_cm_ goto out; } - conn_id = cma_new_id(&listen_id->id, ib_event); + memset(&event, 0, sizeof event); + offset = cma_user_data_offset(listen_id->id.ps); + event.event = RDMA_CM_EVENT_CONNECT_REQUEST; + if (listen_id->id.ps == RDMA_PS_UDP) { + conn_id = cma_new_udp_id(&listen_id->id, ib_event); + event.param.ud.private_data = ib_event->private_data + offset; + event.param.ud.private_data_len = + IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset; + } else { + conn_id = cma_new_conn_id(&listen_id->id, ib_event); + cma_set_req_event_data(&event, &ib_event->param.req_rcvd, + ib_event->private_data, offset); + } if (!conn_id) { ret = -ENOMEM; goto out; @@ -944,11 +1001,6 @@ static int cma_req_handler(struct ib_cm_ cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; - offset = cma_user_data_offset(listen_id->id.ps); - memset(&event, 0, sizeof event); - event.event = RDMA_CM_EVENT_CONNECT_REQUEST; - cma_set_req_event_data(&event, &ib_event->param.req_rcvd, - ib_event->private_data, offset); ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
*/ @@ -964,8 +1016,7 @@ out: static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) { - return cpu_to_be64(((u64)ps << 16) + - be16_to_cpu(((struct sockaddr_in *) addr)->sin_port)); + return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); } static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, @@ -1741,6 +1792,9 @@ static int cma_get_port(struct rdma_id_p case RDMA_PS_TCP: ps = &tcp_ps; break; + case RDMA_PS_UDP: + ps = &udp_ps; + break; default: return -EPROTONOSUPPORT; } @@ -1829,6 +1883,110 @@ static int cma_format_hdr(void *hdr, enu return 0; } +static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv = cm_id->context; + struct rdma_cm_event event; + struct ib_cm_sidr_rep_event_param *rep = &ib_event->param.sidr_rep_rcvd; + int ret = 0; + + memset(&event, 0, sizeof event); + atomic_inc(&id_priv->dev_remove); + if (!cma_comp(id_priv, CMA_CONNECT)) + goto out; + + switch (ib_event->event) { + case IB_CM_SIDR_REQ_ERROR: + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = -ETIMEDOUT; + break; + case IB_CM_SIDR_REP_RECEIVED: + event.param.ud.private_data = ib_event->private_data; + event.param.ud.private_data_len = IB_CM_SIDR_REP_PRIVATE_DATA_SIZE; + if (rep->status != IB_SIDR_SUCCESS) { + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = ib_event->param.sidr_rep_rcvd.status; + break; + } + if (rep->qkey != RDMA_UD_QKEY) { + event.event = RDMA_CM_EVENT_UNREACHABLE; + event.status = -EINVAL; + break; + } + ib_init_ah_from_path(id_priv->id.device, id_priv->id.port_num, + id_priv->id.route.path_rec, + &event.param.ud.ah_attr); + event.param.ud.qp_num = rep->qpn; + event.param.ud.qkey = rep->qkey; + event.event = RDMA_CM_EVENT_ESTABLISHED; + event.status = 0; + break; + default: + printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", + ib_event->event); + goto out; + } + + ret = id_priv->id.event_handler(&id_priv->id, 
&event); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. */ + id_priv->cm_id.ib = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } +out: + cma_release_remove(id_priv); + return ret; +} + +static int cma_resolve_ib_udp(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct ib_cm_sidr_req_param req; + struct rdma_route *route; + int ret; + + req.private_data_len = sizeof(struct cma_hdr) + + conn_param->private_data_len; + req.private_data = kzalloc(req.private_data_len, GFP_ATOMIC); + if (!req.private_data) + return -ENOMEM; + + if (conn_param->private_data && conn_param->private_data_len) + memcpy((void *) req.private_data + sizeof(struct cma_hdr), + conn_param->private_data, conn_param->private_data_len); + + route = &id_priv->id.route; + ret = cma_format_hdr((void *) req.private_data, id_priv->id.ps, route); + if (ret) + goto out; + + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, + cma_sidr_rep_handler, id_priv); + if (IS_ERR(id_priv->cm_id.ib)) { + ret = PTR_ERR(id_priv->cm_id.ib); + goto out; + } + + req.path = route->path_rec; + req.service_id = cma_get_service_id(id_priv->id.ps, + &route->addr.dst_addr); + req.timeout_ms = 1 << (CMA_CM_RESPONSE_TIMEOUT - 8); + req.max_cm_retries = CMA_MAX_CM_RETRIES; + + ret = ib_send_cm_sidr_req(id_priv->cm_id.ib, &req); + if (ret) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } +out: + kfree(req.private_data); + return ret; +} + static int cma_connect_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { @@ -1950,7 +2108,10 @@ int rdma_connect(struct rdma_cm_id *id, switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - ret = cma_connect_ib(id_priv, conn_param); + if (id->ps == RDMA_PS_UDP) + ret = cma_resolve_ib_udp(id_priv, conn_param); + else + ret = cma_connect_ib(id_priv, conn_param); break; case 
RDMA_TRANSPORT_IWARP: ret = cma_connect_iw(id_priv, conn_param); @@ -2033,6 +2194,24 @@ static int cma_accept_iw(struct rdma_id_ return iw_cm_accept(id_priv->cm_id.iw, &iw_param); } +static int cma_send_sidr_rep(struct rdma_id_private *id_priv, + enum ib_cm_sidr_status status, + const void *private_data, int private_data_len) +{ + struct ib_cm_sidr_rep_param rep; + + memset(&rep, 0, sizeof rep); + rep.status = status; + if (status == IB_SIDR_SUCCESS) { + rep.qp_num = id_priv->qp_num; + rep.qkey = RDMA_UD_QKEY; + } + rep.private_data = private_data; + rep.private_data_len = private_data_len; + + return ib_send_cm_sidr_rep(id_priv->cm_id.ib, &rep); +} + int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -2049,7 +2228,11 @@ int rdma_accept(struct rdma_cm_id *id, s switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - if (conn_param) + if (id->ps == RDMA_PS_UDP) + ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS, + conn_param->private_data, + conn_param->private_data_len); + else if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = cma_rep_recv(id_priv); @@ -2106,9 +2289,13 @@ int rdma_reject(struct rdma_cm_id *id, c switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - ret = ib_send_cm_rej(id_priv->cm_id.ib, - IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, - private_data, private_data_len); + if (id->ps == RDMA_PS_UDP) + ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, + private_data, private_data_len); + else + ret = ib_send_cm_rej(id_priv->cm_id.ib, + IB_CM_REJ_CONSUMER_DEFINED, NULL, + 0, private_data, private_data_len); break; case RDMA_TRANSPORT_IWARP: ret = iw_cm_reject(id_priv->cm_id.iw, @@ -2280,6 +2467,7 @@ static void cma_cleanup(void) destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); idr_destroy(&tcp_ps); + idr_destroy(&udp_ps); } module_init(cma_init); diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 
dbc7c56..595f1a7 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -90,11 +90,20 @@ struct rdma_conn_param { u32 qp_num; }; +struct rdma_ud_param { + const void *private_data; + u8 private_data_len; + struct ib_ah_attr ah_attr; + u32 qp_num; + u32 qkey; +}; + struct rdma_cm_event { enum rdma_cm_event_type event; int status; union { struct rdma_conn_param conn; + struct rdma_ud_param ud; } param; }; @@ -220,9 +229,15 @@ int rdma_init_qp_attr(struct rdma_cm_id /** * rdma_connect - Initiate an active connection request. + * @id: Connection identifier to connect. + * @conn_param: Connection information used for connected QPs. * * Users must have resolved a route for the rdma_cm_id to connect with * by having called rdma_resolve_route before calling this routine. + * + * This call will either connect to a remote QP or obtain remote QP + * information for unconnected rdma_cm_id's. The actual operation is + * based on the rdma_cm_id's port space. */ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); diff --git a/include/rdma/rdma_cm_ib.h b/include/rdma/rdma_cm_ib.h index e8c3af1..9b176df 100644 --- a/include/rdma/rdma_cm_ib.h +++ b/include/rdma/rdma_cm_ib.h @@ -44,4 +44,7 @@ #include int rdma_set_ib_paths(struct rdma_cm_id *id, struct ib_sa_path_rec *path_rec, int num_paths); +/* Global qkey for UD QPs and multicast groups. */ +#define RDMA_UD_QKEY 0x01234567 + #endif /* RDMA_CM_IB_H */ From sean.hefty at intel.com Tue Oct 24 15:54:32 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:54:32 -0700 Subject: [openib-general] [PATCH 6/7 v2] for 2.6.20 rdma/cma: add multicast support through the rdma_cm In-Reply-To: <000501c6f7be$c8169d50$a6d4180a@amr.corp.intel.com> Message-ID: <000601c6f7bf$5ed9e3f0$a6d4180a@amr.corp.intel.com> Add multicast QP support to the rdma_cm. - Users identify multicast groups by using a multicast IP address. 
Normal IP address translation services are used to map the address to a local RDMA port. - IB multicast group parameters are based on the ipoib broadcast group. The MGID is derived using a method similar to ipoib, but with a different signature. - QPs are automatically attached and detached from groups. - A QP may join multiple groups. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9bfd427..b83ea5d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -114,6 +114,7 @@ struct rdma_id_private { struct list_head list; struct list_head listen_list; struct cma_device *cma_dev; + struct list_head mc_list; enum cma_state state; spinlock_t lock; @@ -136,6 +137,18 @@ struct rdma_id_private { u8 srq; }; +struct cma_multicast { + struct rdma_id_private *id_priv; + union { + struct ib_sa_multicast *ib; + } multicast; + struct list_head list; + void *context; + struct sockaddr addr; + u8 pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; +}; + struct cma_work { struct work_struct work; struct rdma_id_private *id; @@ -323,6 +336,7 @@ struct rdma_cm_id *rdma_create_id(rdma_c init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); INIT_LIST_HEAD(&id_priv->listen_list); + INIT_LIST_HEAD(&id_priv->mc_list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); return &id_priv->id; @@ -696,6 +710,19 @@ static void cma_release_port(struct rdma mutex_unlock(&lock); } +static void cma_leave_mc_groups(struct rdma_id_private *id_priv) +{ + struct cma_multicast *mc; + + while (!list_empty(&id_priv->mc_list)) { + mc = container_of(id_priv->mc_list.next, + struct cma_multicast, list); + list_del(&mc->list); + ib_sa_free_multicast(mc->multicast.ib); + kfree(mc); + } +} + void rdma_destroy_id(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; @@ -720,6 +747,7 @@ void rdma_destroy_id(struct rdma_cm_id * default: break; } + cma_leave_mc_groups(id_priv); 
mutex_lock(&lock); cma_detach_from_dev(id_priv); } @@ -2340,6 +2368,159 @@ out: } EXPORT_SYMBOL(rdma_disconnect); +static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast) +{ + struct rdma_id_private *id_priv; + struct cma_multicast *mc = multicast->context; + struct rdma_cm_event event; + int ret; + + id_priv = mc->id_priv; + atomic_inc(&id_priv->dev_remove); + if (!cma_comp(id_priv, CMA_ADDR_BOUND) && + !cma_comp(id_priv, CMA_ADDR_RESOLVED)) + goto out; + + if (!status && id_priv->id.qp) + status = ib_attach_mcast(id_priv->id.qp, &multicast->rec.mgid, + multicast->rec.mlid); + + memset(&event, 0, sizeof event); + event.status = status; + event.param.ud.private_data = mc->context; + if (!status) { + event.event = RDMA_CM_EVENT_MULTICAST_JOIN; + ib_init_ah_from_mcmember(id_priv->id.device, + id_priv->id.port_num, &multicast->rec, + &event.param.ud.ah_attr); + event.param.ud.qp_num = 0xFFFFFF; + event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey); + } else + event.event = RDMA_CM_EVENT_MULTICAST_ERROR; + + ret = id_priv->id.event_handler(&id_priv->id, &event); + if (ret) { + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return 0; + } +out: + cma_release_remove(id_priv); + return 0; +} + +static int cma_join_ib_multicast(struct rdma_id_private *id_priv, + struct cma_multicast *mc) +{ + struct ib_sa_mcmember_rec rec; + unsigned char mc_map[MAX_ADDR_LEN]; + struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; + struct sockaddr_in *sin = (struct sockaddr_in *) &mc->addr; + ib_sa_comp_mask comp_mask; + int ret; + + ib_addr_get_mgid(dev_addr, &rec.mgid); + ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, + &rec.mgid, &rec); + if (ret) + return ret; + + ip_ib_mc_map(sin->sin_addr.s_addr, mc_map); + mc_map[7] = 0x01; /* Use RDMA CM signature */ + mc_map[8] = ib_addr_get_pkey(dev_addr) >> 8; + mc_map[9] = (unsigned char) ib_addr_get_pkey(dev_addr); + + rec.mgid = 
*(union ib_gid *) (mc_map + 4); + ib_addr_get_sgid(dev_addr, &rec.port_gid); + rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); + rec.join_state = 1; + rec.qkey = sin->sin_addr.s_addr; + + comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE | + IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_SL | + IB_SA_MCMEMBER_REC_FLOW_LABEL | + IB_SA_MCMEMBER_REC_TRAFFIC_CLASS; + + mc->multicast.ib = ib_sa_join_multicast(&sa_client, id_priv->id.device, + id_priv->id.port_num, &rec, + comp_mask, GFP_KERNEL, + cma_ib_mc_handler, mc); + if (IS_ERR(mc->multicast.ib)) + return PTR_ERR(mc->multicast.ib); + + return 0; +} + +int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, + void *context) +{ + struct rdma_id_private *id_priv; + struct cma_multicast *mc; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp(id_priv, CMA_ADDR_BOUND) && + !cma_comp(id_priv, CMA_ADDR_RESOLVED)) + return -EINVAL; + + mc = kmalloc(sizeof *mc, GFP_KERNEL); + if (!mc) + return -ENOMEM; + + memcpy(&mc->addr, addr, ip_addr_size(addr)); + mc->context = context; + mc->id_priv = id_priv; + + spin_lock(&id_priv->lock); + list_add(&mc->list, &id_priv->mc_list); + spin_unlock(&id_priv->lock); + + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: + ret = cma_join_ib_multicast(id_priv, mc); + break; + default: + ret = -ENOSYS; + break; + } + + if (ret) { + spin_lock_irq(&id_priv->lock); + list_del(&mc->list); + spin_unlock_irq(&id_priv->lock); + kfree(mc); + } + return ret; +} +EXPORT_SYMBOL(rdma_join_multicast); + +void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) +{ + struct rdma_id_private *id_priv; + struct cma_multicast *mc; + + id_priv = container_of(id, struct rdma_id_private, id); + spin_lock_irq(&id_priv->lock); + list_for_each_entry(mc, &id_priv->mc_list, list) { + if (!memcmp(&mc->addr, addr, ip_addr_size(addr))) { + 
list_del(&mc->list); + spin_unlock_irq(&id_priv->lock); + + if (id->qp) + ib_detach_mcast(id->qp, + &mc->multicast.ib->rec.mgid, + mc->multicast.ib->rec.mlid); + ib_sa_free_multicast(mc->multicast.ib); + kfree(mc); + return; + } + } + spin_unlock_irq(&id_priv->lock); +} +EXPORT_SYMBOL(rdma_leave_multicast); + static void cma_add_one(struct ib_device *device) { struct cma_device *cma_dev; diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 81b6230..5bc318c 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -92,6 +92,12 @@ static inline void ib_addr_set_pkey(stru dev_addr->broadcast[9] = (unsigned char) pkey; } +static inline void ib_addr_get_mgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(gid, dev_addr->broadcast + 4, sizeof *gid); +} + static inline void ib_addr_get_sgid(struct rdma_dev_addr *dev_addr, union ib_gid *gid) { diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 595f1a7..9efbbdc 100644 --- a/include/rdma/rdma_cm.h +++ b/include/rdma/rdma_cm.h @@ -52,6 +52,8 @@ enum rdma_cm_event_type { RDMA_CM_EVENT_ESTABLISHED, RDMA_CM_EVENT_DISCONNECTED, RDMA_CM_EVENT_DEVICE_REMOVAL, + RDMA_CM_EVENT_MULTICAST_JOIN, + RDMA_CM_EVENT_MULTICAST_ERROR }; enum rdma_port_space { @@ -289,5 +291,21 @@ int rdma_reject(struct rdma_cm_id *id, c */ int rdma_disconnect(struct rdma_cm_id *id); -#endif /* RDMA_CM_H */ +/** + * rdma_join_multicast - Join the multicast group specified by the given + * address. + * @id: Communication identifier associated with the request. + * @addr: Multicast address identifying the group to join. + * @context: User-defined context associated with the join request, returned + * to the user through the private_data pointer in multicast events. + */ +int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, + void *context); +/** + * rdma_leave_multicast - Leave the multicast group specified by the given + * address. 
+ */ +void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr); + +#endif /* RDMA_CM_H */ From somenath at veritas.com Mon Oct 23 15:59:58 2006 From: somenath at veritas.com (somenath) Date: Mon, 23 Oct 2006 15:59:58 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453E9D11.3060304@3leafnetworks.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9D11.3060304@3leafnetworks.com> Message-ID: <453D496E.10805@veritas.com> Venkatesh Babu wrote: > 1. Yes, I can rearm the alternate path by sending LAP and APR messages. Does the qpair go to the rearm state just by sending LAP and APR messages? I mean, you don't have to change the QP state to REARM explicitly? > > 2. I was sending some network traffic (netperf) while doing these > failovers. > So I assume it is SDP's APM feature that gets tested? Is that true? thanks, som. > VBabu > > somenath wrote: > >> hi Venkatesh: >> >> Two questions: >> >> 1. does re-enabling Migration (as defined in vol1 of ib spec in >> 17.2.8.1.4) work for you? >> (I mean after the 1st path failure, you do lap/apr packet transfer) >> >> 2. What applications are you testing with? >> >> thanks, som. >> >> Venkatesh Babu wrote: >> >>> >>> I have added a couple of patches to the OFED stack as described in >>> bug#160, bug#172, and bug#159 and with this successfully tested the >>> APM functionality, except one issue. >>> >>> Configuration: >>> 2 Nodes >>> CPU: AMD Opteron(tm) Processor 252 Dual processor >>> CA type: MT25208 >>> Firmware version: 5.1.4 >>> OS: CentOS release 4.2 >>> IB: OFED 1.0 >>> >>> 2 Flextronics 24 port switches >>> >>> Node1 Port1 connected to Switch1 >>> Node1 Port2 connected to Switch2 >>> Node2 Port1 connected to Switch1 >>> Node2 Port2 connected to Switch2 >>> >>> Node1 : Active side of the RC QP >>> Node2 : Passive side of the RC QP >>> >>> Test1: >>> Failover simulation on Node1 >>> 1. Simulate the port1 failure, RC QP migrates the path to port2 >>> 2.
Simulate the port1 UP to rearm the alternate path from port1 >>> 3. Simulate the port2 failure, RC QP migrates the path to port1 >>> 4. Simulate the port2 UP to rearm the alternate path from port2 >>> >>> Test2: >>> Real failover by manually pulling the cable >>> 1. Simulate the failover/failback by pulling cable of Node1 port1 >>> 2. Simulate the failover/failback by pulling cable of Node1 port2 >>> 3. Simulate the failover/failback by pulling cable of Node2 port1 >>> 4. Simulate the failover/failback by pulling cable of Node2 port2 >>> >>> >>> ISSUE: >>> If I pull both the cables then there are no paths to the >>> destination, so RC QP connection is supposed to tear down. But it >>> is not working. >>> >>> 1. Create an RC QP and load both primary and alternate path >>> (I was setting rnr_retry_count = 6, retry_count = 6, >>> packet_life_time field of struct ib_sa_path_rec to 15 and also tried >>> with 12) >>> 2. Send some traffic over RC QP >>> 3. Disconnect the cable belonging to the primary path >>> 4. It smoothly fails over to alternate path and it becomes primary >>> path. >>> >>> No effect on the traffic on that RC QP >>> 5. Remove the second cable belonging to the new primary path. >>> 6. Obviously traffic stops since there are no paths to the >>> destination. But for the outstanding WRs in the RC QP I don't get >>> any callback from the verbs layer describing whether it succeeded or >>> failed due to some error like IB_WC_RETRY_EXC_ERR. >>> When I query the RC QP properties it still shows that it is in >>> IB_QPS_RTS state. >>> >>> >>> Without APM functionality it behaves correctly - >>> 1. Create an RC QP and load only primary path >>> (I was setting rnr_retry_count = 6, retry_count = 6, >>> packet_life_time field of struct ib_sa_path_rec to 15 and also tried >>> with 12) >>> 2. Send some traffic over RC QP >>> 3. Disconnect the cable belonging to the primary path >>> 4. Obviously traffic stops since there are no paths to the >>> destination.
For the outstanding WRs in the RC QP I do get a >>> callback from the verbs layer reporting that the first WR failed >>> with error IB_WC_RETRY_EXC_ERR, and for all other WRs I get >>> IB_WC_WR_FLUSH_ERR. >>> I will close this RC QP. >>> >>> VBabu >>> >>> Date: Mon, 16 Oct 2006 14:03:50 -0700 >>> From: "Sean Hefty" >>> Subject: Re: [openib-general] APM support in openib stack >>> To: somenath at veritas.com >>> Cc: openib-general at openib.org >>> Message-ID: <4533F3B6.1030509 at ichips.intel.com> >>> Content-Type: text/plain; charset=iso-8859-1; format=flowed >>> >>> somenath wrote: >>> >>>>>>> Doesn't ib_cm_init_qp_attr() set this for you? >>>>>> >>>>>> >>>>>> >>>>> >>>>> No, it doesn't. It returns >>>>> attr_mask= 0x12d181 >>>>> port=0x0 alt_port=0x0 >>>> >>>> >>>> >>>> >>>> >>> >>> Okay - there was a fix to the cm.c file (svn rev 8267) that added >>> setting the alternate port number when initializing the QP >>> attributes. Apparently that fix did not make it into the release >>> that you're using. >>> >>> - Sean >>> >>> >>> >>> >>> >> From sean.hefty at intel.com Tue Oct 24 15:58:12 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 15:58:12 -0700 Subject: [openib-general] [PATCH 7/7 v2] for 2.6.20 rdma/ucma: add userspace support for the rdma_cm In-Reply-To: <000601c6f7bf$5ed9e3f0$a6d4180a@amr.corp.intel.com> Message-ID: <000701c6f7bf$e1fc9980$a6d4180a@amr.corp.intel.com> Export the rdma_cm capabilities to userspace.
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 8873b63..189e5d4 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -1,9 +1,11 @@ infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o +user_access-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_ucm.o obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \ ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o -obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o +obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ + $(user_access-y) ib_core-y := packer.o ud_header.o verbs.o sysfs.o \ device.o fmr_pool.o cache.o @@ -18,6 +20,8 @@ iw_cm-y := iwcm.o rdma_cm-y := cma.o +rdma_ucm-y := ucma.o + ib_addr-y := addr.o ib_umad-y := user_mad.o diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c new file mode 100644 index 0000000..4dae930 --- /dev/null +++ b/drivers/infiniband/core/ucma.c @@ -0,0 +1,1067 @@ +/* + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +MODULE_AUTHOR("Sean Hefty"); +MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access"); +MODULE_LICENSE("Dual BSD/GPL"); + +enum { + UCMA_MAX_BACKLOG = 128 +}; + +struct ucma_file { + struct mutex mut; + struct file *filp; + struct list_head ctx_list; + struct list_head event_list; + wait_queue_head_t poll_wait; +}; + +struct ucma_context { + int id; + struct completion comp; + atomic_t ref; + int events_reported; + int backlog; + + struct ucma_file *file; + struct rdma_cm_id *cm_id; + __u64 uid; + + struct list_head list; + struct list_head mc_list; +}; + +struct ucma_multicast { + struct ucma_context *ctx; + int id; + int events_reported; + + __u64 uid; + struct list_head list; + struct sockaddr addr; + u8 pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; +}; + +struct ucma_event { + struct ucma_context *ctx; + struct ucma_multicast *mc; + struct list_head list; + struct rdma_cm_id *cm_id; + struct rdma_ucm_event_resp resp; +}; + +static DEFINE_MUTEX(mut); +static DEFINE_IDR(ctx_idr); +static DEFINE_IDR(multicast_idr); + +static inline struct ucma_context* _ucma_find_context(int id, + struct ucma_file *file) +{ + struct ucma_context *ctx; + + ctx = idr_find(&ctx_idr, id); + if (!ctx) + ctx = ERR_PTR(-ENOENT); + else if (ctx->file != file) + ctx = ERR_PTR(-EINVAL); + return ctx; +} + +static struct ucma_context* ucma_get_ctx(struct ucma_file *file, 
int id) +{ + struct ucma_context *ctx; + + mutex_lock(&mut); + ctx = _ucma_find_context(id, file); + if (!IS_ERR(ctx)) + atomic_inc(&ctx->ref); + mutex_unlock(&mut); + return ctx; +} + +static void ucma_put_ctx(struct ucma_context *ctx) +{ + if (atomic_dec_and_test(&ctx->ref)) + complete(&ctx->comp); +} + +static struct ucma_context* ucma_alloc_ctx(struct ucma_file *file) +{ + struct ucma_context *ctx; + int ret; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return NULL; + + atomic_set(&ctx->ref, 1); + init_completion(&ctx->comp); + INIT_LIST_HEAD(&ctx->mc_list); + ctx->file = file; + + do { + ret = idr_pre_get(&ctx_idr, GFP_KERNEL); + if (!ret) + goto error; + + mutex_lock(&mut); + ret = idr_get_new(&ctx_idr, ctx, &ctx->id); + mutex_unlock(&mut); + } while (ret == -EAGAIN); + + if (ret) + goto error; + + list_add_tail(&ctx->list, &file->ctx_list); + return ctx; + +error: + kfree(ctx); + return NULL; +} + +static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx) +{ + struct ucma_multicast *mc; + int ret; + + mc = kzalloc(sizeof(*mc), GFP_KERNEL); + if (!mc) + return NULL; + + do { + ret = idr_pre_get(&multicast_idr, GFP_KERNEL); + if (!ret) + goto error; + + mutex_lock(&mut); + ret = idr_get_new(&multicast_idr, mc, &mc->id); + mutex_unlock(&mut); + } while (ret == -EAGAIN); + + if (ret) + goto error; + + mc->ctx = ctx; + list_add_tail(&mc->list, &ctx->mc_list); + return mc; + +error: + kfree(mc); + return NULL; +} + +static void ucma_copy_conn_event(struct rdma_ucm_conn_param *dst, + struct rdma_conn_param *src) +{ + if (src->private_data_len) + memcpy(dst->private_data, src->private_data, + src->private_data_len); + dst->private_data_len = src->private_data_len; + dst->responder_resources =src->responder_resources; + dst->initiator_depth = src->initiator_depth; + dst->flow_control = src->flow_control; + dst->retry_count = src->retry_count; + dst->rnr_retry_count = src->rnr_retry_count; + dst->srq = src->srq; + dst->qp_num = 
src->qp_num; +} + +static void ucma_copy_ud_event(struct rdma_ucm_ud_param *dst, + struct rdma_ud_param *src) +{ + if (src->private_data_len) + memcpy(dst->private_data, src->private_data, + src->private_data_len); + dst->private_data_len = src->private_data_len; + ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr); + dst->qp_num = src->qp_num; + dst->qkey = src->qkey; +} + +static void ucma_set_event_context(struct ucma_context *ctx, + struct rdma_cm_event *event, + struct ucma_event *uevent) +{ + uevent->ctx = ctx; + switch (event->event) { + case RDMA_CM_EVENT_MULTICAST_JOIN: + case RDMA_CM_EVENT_MULTICAST_ERROR: + uevent->mc = (struct ucma_multicast *) + event->param.ud.private_data; + uevent->resp.uid = uevent->mc->uid; + uevent->resp.id = uevent->mc->id; + break; + default: + uevent->resp.uid = ctx->uid; + uevent->resp.id = ctx->id; + break; + } +} + +static int ucma_event_handler(struct rdma_cm_id *cm_id, + struct rdma_cm_event *event) +{ + struct ucma_event *uevent; + struct ucma_context *ctx = cm_id->context; + int ret = 0; + + uevent = kzalloc(sizeof(*uevent), GFP_KERNEL); + if (!uevent) + return event->event == RDMA_CM_EVENT_CONNECT_REQUEST; + + uevent->cm_id = cm_id; + ucma_set_event_context(ctx, event, uevent); + uevent->resp.event = event->event; + uevent->resp.status = event->status; + if (cm_id->ps == RDMA_PS_UDP) + ucma_copy_ud_event(&uevent->resp.param.ud, &event->param.ud); + else + ucma_copy_conn_event(&uevent->resp.param.conn, + &event->param.conn); + + mutex_lock(&ctx->file->mut); + if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) { + if (!ctx->backlog) { + ret = -EDQUOT; + goto out; + } + ctx->backlog--; + } + list_add_tail(&uevent->list, &ctx->file->event_list); + wake_up_interruptible(&ctx->file->poll_wait); +out: + mutex_unlock(&ctx->file->mut); + return ret; +} + +static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct ucma_context *ctx; + struct rdma_ucm_get_event cmd; + 
struct ucma_event *uevent; + int ret = 0; + DEFINE_WAIT(wait); + + if (out_len < sizeof uevent->resp) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->mut); + while (list_empty(&file->event_list)) { + if (file->filp->f_flags & O_NONBLOCK) { + ret = -EAGAIN; + break; + } + + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + + prepare_to_wait(&file->poll_wait, &wait, TASK_INTERRUPTIBLE); + mutex_unlock(&file->mut); + schedule(); + mutex_lock(&file->mut); + finish_wait(&file->poll_wait, &wait); + } + + if (ret) + goto done; + + uevent = list_entry(file->event_list.next, struct ucma_event, list); + + if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) { + ctx = ucma_alloc_ctx(file); + if (!ctx) { + ret = -ENOMEM; + goto done; + } + uevent->ctx->backlog++; + ctx->cm_id = uevent->cm_id; + ctx->cm_id->context = ctx; + uevent->resp.id = ctx->id; + } + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &uevent->resp, sizeof uevent->resp)) { + ret = -EFAULT; + goto done; + } + + list_del(&uevent->list); + uevent->ctx->events_reported++; + if (uevent->mc) + uevent->mc->events_reported++; + kfree(uevent); +done: + mutex_unlock(&file->mut); + return ret; +} + +static ssize_t ucma_create_id(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_create_id cmd; + struct rdma_ucm_create_id_resp resp; + struct ucma_context *ctx; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&file->mut); + ctx = ucma_alloc_ctx(file); + mutex_unlock(&file->mut); + if (!ctx) + return -ENOMEM; + + ctx->uid = cmd.uid; + ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps); + if (IS_ERR(ctx->cm_id)) { + ret = PTR_ERR(ctx->cm_id); + goto err1; + } + + resp.id = ctx->id; + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) { + ret 
= -EFAULT; + goto err2; + } + return 0; + +err2: + rdma_destroy_id(ctx->cm_id); +err1: + mutex_lock(&mut); + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&mut); + kfree(ctx); + return ret; +} + +static void ucma_cleanup_multicast(struct ucma_context *ctx) +{ + struct ucma_multicast *mc, *tmp; + + mutex_lock(&mut); + list_for_each_entry_safe(mc, tmp, &ctx->mc_list, list) { + list_del(&mc->list); + idr_remove(&multicast_idr, mc->id); + kfree(mc); + } + mutex_unlock(&mut); +} + +static void ucma_cleanup_events(struct ucma_context *ctx) +{ + struct ucma_event *uevent, *tmp; + + list_for_each_entry_safe(uevent, tmp, &ctx->file->event_list, list) { + if (uevent->ctx != ctx) + continue; + + list_del(&uevent->list); + + /* clear incoming connections. */ + if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) + rdma_destroy_id(uevent->cm_id); + + kfree(uevent); + } +} + +static void ucma_cleanup_mc_events(struct ucma_multicast *mc) +{ + struct ucma_event *uevent, *tmp; + + list_for_each_entry_safe(uevent, tmp, &mc->ctx->file->event_list, list) { + if (uevent->mc != mc) + continue; + + list_del(&uevent->list); + kfree(uevent); + } +} + +static int ucma_free_ctx(struct ucma_context *ctx) +{ + int events_reported; + + /* No new events will be generated after destroying the id. */ + rdma_destroy_id(ctx->cm_id); + + ucma_cleanup_multicast(ctx); + + /* Cleanup events not yet reported to the user. 
*/ + mutex_lock(&ctx->file->mut); + ucma_cleanup_events(ctx); + list_del(&ctx->list); + mutex_unlock(&ctx->file->mut); + + events_reported = ctx->events_reported; + kfree(ctx); + return events_reported; +} + +static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_destroy_id cmd; + struct rdma_ucm_destroy_id_resp resp; + struct ucma_context *ctx; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&mut); + ctx = _ucma_find_context(cmd.id, file); + if (!IS_ERR(ctx)) + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&mut); + + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ucma_put_ctx(ctx); + wait_for_completion(&ctx->comp); + resp.events_reported = ucma_free_ctx(ctx); + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + + return ret; +} + +static ssize_t ucma_bind_addr(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_bind_addr cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_resolve_addr(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_resolve_addr cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr, + (struct sockaddr *) &cmd.dst_addr, + cmd.timeout_ms); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_resolve_route(struct ucma_file *file, + const char __user 
*inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_resolve_route cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms); + ucma_put_ctx(ctx); + return ret; +} + +static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp, + struct rdma_route *route) +{ + struct rdma_dev_addr *dev_addr; + + resp->num_paths = route->num_paths; + switch (route->num_paths) { + case 0: + dev_addr = &route->addr.dev_addr; + ib_addr_get_dgid(dev_addr, + (union ib_gid *) &resp->ib_route[0].dgid); + ib_addr_get_sgid(dev_addr, + (union ib_gid *) &resp->ib_route[0].sgid); + resp->ib_route[0].pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); + break; + case 2: + ib_copy_path_rec_to_user(&resp->ib_route[1], + &route->path_rec[1]); + /* fall through */ + case 1: + ib_copy_path_rec_to_user(&resp->ib_route[0], + &route->path_rec[0]); + break; + default: + break; + } +} + +static ssize_t ucma_query_route(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_query_route cmd; + struct rdma_ucm_query_route_resp resp; + struct ucma_context *ctx; + struct sockaddr *addr; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + memset(&resp, 0, sizeof resp); + addr = &ctx->cm_id->route.addr.src_addr; + memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ? + sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + addr = &ctx->cm_id->route.addr.dst_addr; + memcpy(&resp.dst_addr, addr, addr->sa_family == AF_INET ? 
+ sizeof(struct sockaddr_in) : + sizeof(struct sockaddr_in6)); + if (!ctx->cm_id->device) + goto out; + + resp.node_guid = ctx->cm_id->device->node_guid; + resp.port_num = ctx->cm_id->port_num; + switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) { + case RDMA_TRANSPORT_IB: + ucma_copy_ib_route(&resp, &ctx->cm_id->route); + break; + default: + break; + } + +out: + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + + ucma_put_ctx(ctx); + return ret; +} + +static void ucma_copy_conn_param(struct rdma_conn_param *dst, + struct rdma_ucm_conn_param *src) +{ + dst->private_data = src->private_data; + dst->private_data_len = src->private_data_len; + dst->responder_resources =src->responder_resources; + dst->initiator_depth = src->initiator_depth; + dst->flow_control = src->flow_control; + dst->retry_count = src->retry_count; + dst->rnr_retry_count = src->rnr_retry_count; + dst->srq = src->srq; + dst->qp_num = src->qp_num; +} + +static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_connect cmd; + struct rdma_conn_param conn_param; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + if (!cmd.conn_param.valid) + return -EINVAL; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ucma_copy_conn_param(&conn_param, &cmd.conn_param); + ret = rdma_connect(ctx->cm_id, &conn_param); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_listen cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ctx->backlog = cmd.backlog > 0 && cmd.backlog < UCMA_MAX_BACKLOG ? 
+ cmd.backlog : UCMA_MAX_BACKLOG; + ret = rdma_listen(ctx->cm_id, ctx->backlog); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_accept cmd; + struct rdma_conn_param conn_param; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + if (cmd.conn_param.valid) { + ctx->uid = cmd.uid; + ucma_copy_conn_param(&conn_param, &cmd.conn_param); + ret = rdma_accept(ctx->cm_id, &conn_param); + } else + ret = rdma_accept(ctx->cm_id, NULL); + + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_reject cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_disconnect cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_disconnect(ctx->cm_id); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_init_qp_attr(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_init_qp_attr cmd; + struct ib_uverbs_qp_attr resp; + struct ucma_context *ctx; + struct ib_qp_attr qp_attr; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if 
(IS_ERR(ctx)) + return PTR_ERR(ctx); + + resp.qp_attr_mask = 0; + memset(&qp_attr, 0, sizeof qp_attr); + qp_attr.qp_state = cmd.qp_state; + ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask); + if (ret) + goto out; + + ib_copy_qp_attr_to_user(&resp, &qp_attr); + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; + +out: + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_establish(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_establish cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + ret = rdma_establish(ctx->cm_id); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_join_multicast(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_join_mcast cmd; + struct rdma_ucm_create_id_resp resp; + struct ucma_context *ctx; + struct ucma_multicast *mc; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + mutex_lock(&file->mut); + mc = ucma_alloc_multicast(ctx); + if (IS_ERR(mc)) { + ret = PTR_ERR(mc); + goto err1; + } + + mc->uid = cmd.uid; + memcpy(&mc->addr, &cmd.addr, sizeof cmd.addr); + ret = rdma_join_multicast(ctx->cm_id, &mc->addr, mc); + if (ret) + goto err2; + + resp.id = mc->id; + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) { + ret = -EFAULT; + goto err3; + } + + mutex_unlock(&file->mut); + ucma_put_ctx(ctx); + return 0; + +err3: + rdma_leave_multicast(ctx->cm_id, &mc->addr); + ucma_cleanup_mc_events(mc); +err2: + mutex_lock(&mut); + idr_remove(&multicast_idr, mc->id); + mutex_unlock(&mut); + list_del(&mc->list); + kfree(mc); +err1: + 
mutex_unlock(&file->mut); + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_leave_multicast(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_destroy_id cmd; + struct rdma_ucm_destroy_id_resp resp; + struct ucma_multicast *mc; + int ret = 0; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + mutex_lock(&mut); + mc = idr_find(&multicast_idr, cmd.id); + if (!mc) + mc = ERR_PTR(-ENOENT); + else if (mc->ctx->file != file) + mc = ERR_PTR(-EINVAL); + else { + idr_remove(&multicast_idr, mc->id); + atomic_inc(&mc->ctx->ref); + } + mutex_unlock(&mut); + + if (IS_ERR(mc)) { + ret = PTR_ERR(mc); + goto out; + } + + rdma_leave_multicast(mc->ctx->cm_id, &mc->addr); + mutex_lock(&mc->ctx->file->mut); + ucma_cleanup_mc_events(mc); + list_del(&mc->list); + mutex_unlock(&mc->ctx->file->mut); + + ucma_put_ctx(mc->ctx); + kfree(mc); +out: + return ret; +} + +static ssize_t (*ucma_cmd_table[])(struct ucma_file *file, + const char __user *inbuf, + int in_len, int out_len) = { + [RDMA_USER_CM_CMD_CREATE_ID] = ucma_create_id, + [RDMA_USER_CM_CMD_DESTROY_ID] = ucma_destroy_id, + [RDMA_USER_CM_CMD_BIND_ADDR] = ucma_bind_addr, + [RDMA_USER_CM_CMD_RESOLVE_ADDR] = ucma_resolve_addr, + [RDMA_USER_CM_CMD_RESOLVE_ROUTE]= ucma_resolve_route, + [RDMA_USER_CM_CMD_QUERY_ROUTE] = ucma_query_route, + [RDMA_USER_CM_CMD_CONNECT] = ucma_connect, + [RDMA_USER_CM_CMD_LISTEN] = ucma_listen, + [RDMA_USER_CM_CMD_ACCEPT] = ucma_accept, + [RDMA_USER_CM_CMD_REJECT] = ucma_reject, + [RDMA_USER_CM_CMD_DISCONNECT] = ucma_disconnect, + [RDMA_USER_CM_CMD_INIT_QP_ATTR] = ucma_init_qp_attr, + [RDMA_USER_CM_CMD_GET_EVENT] = ucma_get_event, + [RDMA_USER_CM_CMD_GET_OPTION] = NULL, + [RDMA_USER_CM_CMD_SET_OPTION] = NULL, + [RDMA_USER_CM_CMD_ESTABLISH] = ucma_establish, + [RDMA_USER_CM_CMD_JOIN_MCAST] = ucma_join_multicast, + [RDMA_USER_CM_CMD_LEAVE_MCAST] = ucma_leave_multicast, +}; + +static 
ssize_t ucma_write(struct file *filp, const char __user *buf, + size_t len, loff_t *pos) +{ + struct ucma_file *file = filp->private_data; + struct rdma_ucm_cmd_hdr hdr; + ssize_t ret; + + if (len < sizeof(hdr)) + return -EINVAL; + + if (copy_from_user(&hdr, buf, sizeof(hdr))) + return -EFAULT; + + if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(ucma_cmd_table)) + return -EINVAL; + + if (hdr.in + sizeof(hdr) > len) + return -EINVAL; + + if (!ucma_cmd_table[hdr.cmd]) + return -ENOSYS; + + ret = ucma_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out); + if (!ret) + ret = len; + + return ret; +} + +static unsigned int ucma_poll(struct file *filp, struct poll_table_struct *wait) +{ + struct ucma_file *file = filp->private_data; + unsigned int mask = 0; + + poll_wait(filp, &file->poll_wait, wait); + + if (!list_empty(&file->event_list)) + mask = POLLIN | POLLRDNORM; + + return mask; +} + +static int ucma_open(struct inode *inode, struct file *filp) +{ + struct ucma_file *file; + + file = kmalloc(sizeof *file, GFP_KERNEL); + if (!file) + return -ENOMEM; + + INIT_LIST_HEAD(&file->event_list); + INIT_LIST_HEAD(&file->ctx_list); + init_waitqueue_head(&file->poll_wait); + mutex_init(&file->mut); + + filp->private_data = file; + file->filp = filp; + return 0; +} + +static int ucma_close(struct inode *inode, struct file *filp) +{ + struct ucma_file *file = filp->private_data; + struct ucma_context *ctx, *tmp; + + mutex_lock(&file->mut); + list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) { + mutex_unlock(&file->mut); + + mutex_lock(&mut); + idr_remove(&ctx_idr, ctx->id); + mutex_unlock(&mut); + + ucma_free_ctx(ctx); + mutex_lock(&file->mut); + } + mutex_unlock(&file->mut); + kfree(file); + return 0; +} + +static struct file_operations ucma_fops = { + .owner = THIS_MODULE, + .open = ucma_open, + .release = ucma_close, + .write = ucma_write, + .poll = ucma_poll, +}; + +static struct miscdevice ucma_misc = { + .minor = MISC_DYNAMIC_MINOR, + .name = "rdma_cm", + .fops = 
&ucma_fops, +}; + +static ssize_t show_abi_version(struct class_device *class_dev, char *buf) +{ + return sprintf(buf, "%d\n", RDMA_USER_CM_ABI_VERSION); +} +static CLASS_DEVICE_ATTR(abi_version, S_IRUGO, show_abi_version, NULL); + +static int __init ucma_init(void) +{ + int ret; + + ret = misc_register(&ucma_misc); + if (ret) + return ret; + + ret = class_device_create_file(ucma_misc.class, + &class_device_attr_abi_version); + if (ret) { + printk(KERN_ERR "rdma_ucm: couldn't create abi_version attr\n"); + goto err; + } + return 0; +err: + misc_deregister(&ucma_misc); + return ret; +} + +static void __exit ucma_cleanup(void) +{ + class_device_remove_file(ucma_misc.class, + &class_device_attr_abi_version); + misc_deregister(&ucma_misc); + idr_destroy(&ctx_idr); +} + +module_init(ucma_init); +module_exit(ucma_cleanup); diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c index ce46b13..5440da0 100644 --- a/drivers/infiniband/core/uverbs_marshall.c +++ b/drivers/infiniband/core/uverbs_marshall.c @@ -32,8 +32,8 @@ #include -static void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, - struct ib_ah_attr *src) +void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, + struct ib_ah_attr *src) { memcpy(dst->grh.dgid, src->grh.dgid.raw, sizeof src->grh.dgid); dst->grh.flow_label = src->grh.flow_label; @@ -47,6 +47,7 @@ static void ib_copy_ah_attr_to_user(stru dst->is_global = src->ah_flags & IB_AH_GRH ? 1 : 0; dst->port_num = src->port_num; } +EXPORT_SYMBOL(ib_copy_ah_attr_to_user); void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, struct ib_qp_attr *src) diff --git a/include/rdma/ib_marshall.h b/include/rdma/ib_marshall.h index 66bf4d7..db03720 100644 --- a/include/rdma/ib_marshall.h +++ b/include/rdma/ib_marshall.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. 
* * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -41,6 +41,9 @@ #include void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, struct ib_qp_attr *src); +void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, + struct ib_ah_attr *src); + void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst, struct ib_sa_path_rec *src); diff --git a/include/rdma/rdma_user_cm.h b/include/rdma/rdma_user_cm.h new file mode 100644 index 0000000..5e76fb2 --- /dev/null +++ b/include/rdma/rdma_user_cm.h @@ -0,0 +1,214 @@ +/* + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef RDMA_USER_CM_H +#define RDMA_USER_CM_H + +#include +#include +#include +#include + +#define RDMA_USER_CM_ABI_VERSION 3 + +#define RDMA_MAX_PRIVATE_DATA 256 + +enum { + RDMA_USER_CM_CMD_CREATE_ID, + RDMA_USER_CM_CMD_DESTROY_ID, + RDMA_USER_CM_CMD_BIND_ADDR, + RDMA_USER_CM_CMD_RESOLVE_ADDR, + RDMA_USER_CM_CMD_RESOLVE_ROUTE, + RDMA_USER_CM_CMD_QUERY_ROUTE, + RDMA_USER_CM_CMD_CONNECT, + RDMA_USER_CM_CMD_LISTEN, + RDMA_USER_CM_CMD_ACCEPT, + RDMA_USER_CM_CMD_REJECT, + RDMA_USER_CM_CMD_DISCONNECT, + RDMA_USER_CM_CMD_INIT_QP_ATTR, + RDMA_USER_CM_CMD_GET_EVENT, + RDMA_USER_CM_CMD_GET_OPTION, + RDMA_USER_CM_CMD_SET_OPTION, + RDMA_USER_CM_CMD_ESTABLISH, + RDMA_USER_CM_CMD_JOIN_MCAST, + RDMA_USER_CM_CMD_LEAVE_MCAST +}; + +/* + * command ABI structures. 
+ */ +struct rdma_ucm_cmd_hdr { + __u32 cmd; + __u16 in; + __u16 out; +}; + +struct rdma_ucm_create_id { + __u64 uid; + __u64 response; + __u16 ps; + __u8 reserved[6]; +}; + +struct rdma_ucm_create_id_resp { + __u32 id; +}; + +struct rdma_ucm_destroy_id { + __u64 response; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_destroy_id_resp { + __u32 events_reported; +}; + +struct rdma_ucm_bind_addr { + __u64 response; + struct sockaddr_in6 addr; + __u32 id; +}; + +struct rdma_ucm_resolve_addr { + struct sockaddr_in6 src_addr; + struct sockaddr_in6 dst_addr; + __u32 id; + __u32 timeout_ms; +}; + +struct rdma_ucm_resolve_route { + __u32 id; + __u32 timeout_ms; +}; + +struct rdma_ucm_query_route { + __u64 response; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_query_route_resp { + __u64 node_guid; + struct ib_user_path_rec ib_route[2]; + struct sockaddr_in6 src_addr; + struct sockaddr_in6 dst_addr; + __u32 num_paths; + __u8 port_num; + __u8 reserved[3]; +}; + +struct rdma_ucm_conn_param { + __u32 qp_num; + __u32 reserved; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + __u8 private_data_len; + __u8 srq; + __u8 responder_resources; + __u8 initiator_depth; + __u8 flow_control; + __u8 retry_count; + __u8 rnr_retry_count; + __u8 valid; +}; + +struct rdma_ucm_ud_param { + __u32 qp_num; + __u32 qkey; + struct ib_uverbs_ah_attr ah_attr; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + __u8 private_data_len; + __u8 reserved[7]; +}; + +struct rdma_ucm_connect { + struct rdma_ucm_conn_param conn_param; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_listen { + __u32 id; + __u32 backlog; +}; + +struct rdma_ucm_accept { + __u64 uid; + struct rdma_ucm_conn_param conn_param; + __u32 id; + __u32 reserved; +}; + +struct rdma_ucm_reject { + __u32 id; + __u8 private_data_len; + __u8 reserved[3]; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; +}; + +struct rdma_ucm_disconnect { + __u32 id; +}; + +struct rdma_ucm_init_qp_attr { + __u64 response; + __u32 id; + __u32 qp_state; +}; + 
+struct rdma_ucm_establish { + __u32 id; +}; + +struct rdma_ucm_join_mcast { + __u64 response; /* rdma_ucm_create_id_resp */ + __u64 uid; + struct sockaddr_in6 addr; + __u32 id; +}; + +struct rdma_ucm_get_event { + __u64 response; +}; + +struct rdma_ucm_event_resp { + __u64 uid; + __u32 id; + __u32 event; + __u32 status; + union { + struct rdma_ucm_conn_param conn; + struct rdma_ucm_ud_param ud; + } param; +}; + +#endif /* RDMA_USER_CM_H */ From jgunthorpe at obsidianresearch.com Tue Oct 24 15:59:35 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 24 Oct 2006 16:59:35 -0600 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024223631.GT25210@parisc-linux.org> References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> Message-ID: <20061024225935.GK4054@obsidianresearch.com> On Tue, Oct 24, 2006 at 04:36:32PM -0600, Matthew Wilcox wrote: > On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote: > > > I think the right way to fix this is to ensure mmio write ordering in > > > the pci_write_config_*() implementations. Like this. > > > > I'm happy to fix this in the PCI core and not force drivers to worry > > about this. > > > > John, can you confirm that this patch fixes the issue for you? > Hang on. I wasn't thinking clearly. mmiowb() only ensures the write > has got as far as the shub. There's no way to fix this in the pci core What about shifting the requirement down to the platform? Ie on ia64 it would seem that inb/outb already solve this problem via mf.a. All platforms that support inb/outb correctly must have a synchronizing primitive for outb.. > This is only really a problem for setup (when we program the BARs), so > it seems silly to enforce an ordering at any other time. Reluctantly, I > must disagree with Jeff -- drivers need to fix this. I'm not sure that can work either. 
The PCI-X spec is very clear, you must wait for a non-posted completion if you care about order. Doing a config read in the driver as a surrogate flush is not good enough in the general case. Like you say, a pci bridge is free to reorder all in flight non-posted operations. Jason From mshefty at ichips.intel.com Tue Oct 24 16:04:39 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 16:04:39 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453D4552.4040304@veritas.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> Message-ID: <453E9C07.2070608@ichips.intel.com> > 1. does re-enabling Migration (as defined in vol1 of ib spec in > 17.2.8.1.4) work for you? > (I mean after the 1st path failure, you do lap/apr packet transfer) I believe that there's other issues that need to be fixed for this to fully work. The ib_cm uses the primary_path specified during connection establishment to send CM MADs related to that connection. If the primary_path fails and stays unavailable, future MADs (e.g. LAP) sent by the CM would also fail. The CM records the alternate path, but currently doesn't do anything with it. It needs to know when to switch to the alternate path for its MADs. - Sean From mst at mellanox.co.il Tue Oct 24 16:09:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Oct 2006 01:09:08 +0200 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024223631.GT25210@parisc-linux.org> References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> Message-ID: <20061024230908.GB13022@mellanox.co.il> Quoting r. Matthew Wilcox : > Subject: Re: Ordering between PCI config space writes and MMIO reads? 
> > On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote: > > > I think the right way to fix this is to ensure mmio write ordering in > > > the pci_write_config_*() implementations. Like this. > > > > I'm happy to fix this in the PCI core and not force drivers to worry > > about this. > > > > John, can you confirm that this patch fixes the issue for you? > > Hang on. I wasn't thinking clearly. mmiowb() only ensures the write > has got as far as the shub. There's no way to fix this in the pci core > -- any PCI-PCI bridge can reorder the two. > > This is only really a problem for setup (when we program the BARs), so > it seems silly to enforce an ordering at any other time. Reluctantly, I > must disagree with Jeff -- drivers need to fix this. This can be true for any bridge. Most arches, however, simply block until config write completes - this is why driver doesn't issue any MMIO writes - and this is what we are looking for here - a way to block the CPU until split completion for config write arrives. By the way, e.g. the PCI Express spec says: "Read Requests and I/O or Configuration Write Requests are permitted to be blocked by or to pass other Read Requests and I/O or Configuration Write Requests." so it is not clear that doing a config read will always flush all config writes as you want. -- MST From shubbell at dbresearch.net Tue Oct 24 16:01:28 2006 From: shubbell at dbresearch.net (Sean Hubbell) Date: Tue, 24 Oct 2006 18:01:28 -0500 Subject: [openib-general] IPoIB Question In-Reply-To: References: Message-ID: <453E9B48.3040508@dbresearch.net> Is this with a combination of TCP and UDP or just TCP? Sean Scott Weitzenkamp (sweitzen) wrote: > We see 3.6 Gb/sec with IPoIB using RHEL4U4 2.6.9-42 x86_64 kernel on > Dell PE1950 Woodcrest systems. > > In my testing, faster hardware is more important than newer kernels, but > I don't try newer kernels much. 
> > From steiner at sgi.com Tue Oct 24 16:27:55 2006 From: steiner at sgi.com (Jack Steiner) Date: Tue, 24 Oct 2006 18:27:55 -0500 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024223631.GT25210@parisc-linux.org> References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> Message-ID: <20061024232755.GA26521@sgi.com> On Tue, Oct 24, 2006 at 04:36:32PM -0600, Matthew Wilcox wrote: > On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote: > > > I think the right way to fix this is to ensure mmio write ordering in > > > the pci_write_config_*() implementations. Like this. > > > > I'm happy to fix this in the PCI core and not force drivers to worry > > about this. > > > > John, can you confirm that this patch fixes the issue for you? > > Hang on. I wasn't thinking clearly. mmiowb() only ensures the write > has got as far as the shub. I think mmiowb() should work on SN hardware. mmiowb() delays until shub reports that all previously issued PIO writes have completed. The processor "mf.a" guarantees "platform acceptance" which on SN means that shub has accepted the write - not that it has actually completed (or even been forwarded anywhere by shub). That makes "mf.a" more-or-less useless on SN. However, shub has an additional MMR register (PIO_WRITE_COUNT) that counts actual outstanding PIOs. mmiowb() delays until that count goes to zero. I'll check if there is any additional reordering that can occur AFTER the PIO_WRITE_COUNT goes to zero. If so, it would be at bus level - not in shub or routers. > There's no way to fix this in the pci core > -- any PCI-PCI bridge can reorder the two. > > This is only really a problem for setup (when we program the BARs), so > it seems silly to enforce an ordering at any other time. Reluctantly, I > must disagree with Jeff -- drivers need to fix this.
-- jack From somenath at veritas.com Mon Oct 23 17:04:05 2006 From: somenath at veritas.com (somenath) Date: Mon, 23 Oct 2006 17:04:05 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453E9C07.2070608@ichips.intel.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9C07.2070608@ichips.intel.com> Message-ID: <453D5875.4050901@veritas.com> Sean Hefty wrote: >> 1. does re-enabling Migration (as defined in vol1 of ib spec in >> 17.2.8.1.4) work for you? >> (I mean after the 1st path failure, you do lap/apr packet transfer) > > > I believe that there's other issues that need to be fixed for this to > fully work. The ib_cm uses the primary_path specified during > connection establishment to send CM MADs related to that connection. > If the primary_path fails and stays unavailable, future MADs (e.g. > LAP) sent by the CM would also fail. The CM records the alternate > path, but currently doesn't do anything with it. It needs to know > when to switch to the alternate path for its MADs. > > - Sean any idea when this stuff will get done? another question: If one brings back the old path (after the first failure) and uses the old path record to do lap/apr then reenabling migration using LAP/APR should work, right? next question is: which component is supposed to change QP state to REARM? spec just says: "based on a command from a management entity, the QP state is set to REARM". does the current CM code do it automatically? In my case, it's not happening... thanks, som.
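For reference, the path migration states that the questions above revolve around can be modeled in a few lines of standalone C. This is only a toy sketch under stated assumptions: the names apm_state, apm_hw_event, apm_sw_request and apm_hw_notify are made up for illustration and are not part of ib_cm or the verbs API, and the transition rules follow the usual reading of the APM section of the IB spec, in which software may request MIGRATED -> REARM and ARMED -> MIGRATED, while REARM -> ARMED and the automatic ARMED -> MIGRATED failover happen in the channel adapter.

```c
#include <stdbool.h>

/* Toy model of the APM path migration states (hypothetical names). */
enum apm_state { APM_MIGRATED, APM_REARM, APM_ARMED };

/* Events that, in this model, only the channel adapter can raise. */
enum apm_hw_event { APM_HW_PEER_REARMED, APM_HW_PATH_FAILED };

/*
 * Software request, i.e. the analogue of a modify-QP call that carries a
 * path migration state. Returns true if the request is accepted.
 * Assumption: only MIGRATED -> REARM and ARMED -> MIGRATED are allowed.
 */
static bool apm_sw_request(enum apm_state *cur, enum apm_state want)
{
    if (*cur == APM_MIGRATED && want == APM_REARM) {
        *cur = APM_REARM;
        return true;
    }
    if (*cur == APM_ARMED && want == APM_MIGRATED) {
        *cur = APM_MIGRATED;    /* forced failover */
        return true;
    }
    return false;               /* e.g. software cannot set ARMED itself */
}

/* Hardware-driven transitions; events that do not apply are ignored. */
static void apm_hw_notify(enum apm_state *cur, enum apm_hw_event ev)
{
    if (ev == APM_HW_PEER_REARMED && *cur == APM_REARM)
        *cur = APM_ARMED;       /* both ends have loaded the alternate path */
    else if (ev == APM_HW_PATH_FAILED && *cur == APM_ARMED)
        *cur = APM_MIGRATED;    /* automatic failover */
}
```

In this model a QP that has failed over is back in MIGRATED, so re-enabling migration means loading a new alternate path (the LAP/APR exchange) and then issuing another software request for REARM; nothing in the model moves the state to REARM on its own.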
From mshefty at ichips.intel.com Tue Oct 24 17:10:51 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 17:10:51 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453D5875.4050901@veritas.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9C07.2070608@ichips.intel.com> <453D5875.4050901@veritas.com> Message-ID: <453EAB8B.402@ichips.intel.com> somenath wrote: > any idea when this stuff will get done? As a rough estimate, Q1, though I will try to get to it after adding support for SA events, which likely would be near late November, early December. The framework in the ib_cm is there; I just need a way to signal that failover has occurred. > another question: If one brings back the old path (after the first > failure) and uses the old path record > to do lap/apr then reenabling migration using LAP/APR should work, right? Yes. > next question is: which component is supposed to change QP state to > REARM? spec just says: > "based on a command from a management entity, the QP state is set to > REARM". The ib_cm does not perform any QP state transitions. That is left up to the user, since the CM is unaware of other actions related to the QP that the user may be performing. - Sean From somenath at veritas.com Mon Oct 23 17:30:01 2006 From: somenath at veritas.com (somenath) Date: Mon, 23 Oct 2006 17:30:01 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453EAB8B.402@ichips.intel.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9C07.2070608@ichips.intel.com> <453D5875.4050901@veritas.com> <453EAB8B.402@ichips.intel.com> Message-ID: <453D5E89.5040801@veritas.com> Sean Hefty wrote: > somenath wrote: > >> any idea when this stuff will get done? > > > As a rough estimate, Q1, though I will try to get to it after adding > support for SA events, which likely would be near late November, early > December.
The framework in the ib_cm is there; I just need a way to > signal that failover has occurred. > >> another question: If one brings back the old path (after the first >> failure) and uses the old path record >> to do lap/apr then reenabling migration using LAP/APR should work, >> right? > > > Yes. > >> next question is: which component is supposed to change QP state to >> REARM? spec just says: >> "based on a command from a management entity, the QP state is set to >> REARM". > > > The ib_cm does not perform any QP state transitions. That is left up > to the user, since the CM is unaware of other actions related to the > QP that the user may be performing. but this function cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv, struct ib_qp_attr *qp_attr, int *qp_attr_mask) does perform the state transition using if (cm_id_priv->alt_av.ah_attr.dlid) { *qp_attr_mask |= IB_QP_PATH_MIG_STATE; qp_attr->path_mig_state = IB_MIG_REARM; } which works for me the first time I load the path (using send REQ) and change state to RTS. so, I am a bit confused here. do you mean that CM will do the state transition the first time and not the next time? thanks, som. > > - Sean From venkatesh.babu at 3leafnetworks.com Tue Oct 24 18:36:40 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Tue, 24 Oct 2006 18:36:40 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453D496E.10805@veritas.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9D11.3060304@3leafnetworks.com> <453D496E.10805@veritas.com> Message-ID: <453EBFA8.1000709@3leafnetworks.com> 1. No, my application has to make the state transition from RTS to RTS. In case of failover, set path_mig_state to IB_MIG_MIGRATED. In case of rearming, call the function ib_cm_init_rearm_attr(). This function is defined in bug#172. 2. No, I have my own ULP module which sits alongside the SDP, SRP, IPoIB modules. VBabu somenath wrote: > Venkatesh Babu wrote: > >> 1.
Yes, I can rearm the alternate path by sending LAP and APR messages. > > > > does the qpair go to rearm state just by sending LAP and APR messages? > I mean, you don't have to change the QP state to REARM explicitly? > >> >> 2. I was sending some network traffic (netperf) while doing these >> failovers. >> > so, I assume it's SDP's APM feature that gets tested? is that true? > > thanks, som. > From somenath at veritas.com Mon Oct 23 18:37:42 2006 From: somenath at veritas.com (somenath) Date: Mon, 23 Oct 2006 18:37:42 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453EBFA8.1000709@3leafnetworks.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9D11.3060304@3leafnetworks.com> <453D496E.10805@veritas.com> <453EBFA8.1000709@3leafnetworks.com> Message-ID: <453D6E66.4050902@veritas.com> I don't see that API (ib_cm_init_rearm_attr()) in the build I am using. so I use a function like this: int modifyqp_rearm(con_t *con) { struct ib_qp_attr qp_attr; int qp_attr_mask = 0; int ib_stat = 0; memset(&qp_attr, 0, sizeof(struct ib_qp_attr)); qp_attr.qp_state = IB_QPS_RTS; ib_stat = ib_cm_init_qp_attr(con->cm_id, &qp_attr, &qp_attr_mask); if (ib_stat) { return ib_stat; } qp_attr.path_mig_state = IB_MIG_REARM; qp_attr_mask |= IB_QP_PATH_MIG_STATE; ib_stat = ib_modify_qp(con->qp, &qp_attr, qp_attr_mask); if (ib_stat) { return ib_stat; } return 0; } Active side sends the LAP. Both sides get called with the APR event. I call this function in the APR handler and it returns error -22 (-EINVAL) from the ib_modify_qp() call. see anything wrong? do I have to set anything else? or is it broken in the build I am using? thanks, som. Venkatesh Babu wrote: > 1. No, my application has to make the state transition from RTS to RTS. > In case of failover, set path_mig_state to IB_MIG_MIGRATED. > > In case of rearming, call the function ib_cm_init_rearm_attr(). This > function is defined in bug#172. > > 2.
No, I have my own ULP module which sits alongside the SDP, SRP, > IPoIB modules. > > VBabu > > somenath wrote: > >> Venkatesh Babu wrote: >> >>> 1. Yes, I can rearm the alternate path by sending LAP and APR messages. >> >> >> >> >> does the qpair go to rearm state just by sending LAP and APR messages? >> I mean, you don't have to change the QP state to REARM explicitly? >> >>> >>> 2. I was sending some network traffic (netperf) while doing these >>> failovers. >>> >> so, I assume it's SDP's APM feature that gets tested? is that true? >> >> thanks, som. >> From venkatesh.babu at 3leafnetworks.com Tue Oct 24 18:57:02 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Tue, 24 Oct 2006 18:57:02 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453EAB8B.402@ichips.intel.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9C07.2070608@ichips.intel.com> <453D5875.4050901@veritas.com> <453EAB8B.402@ichips.intel.com> Message-ID: <453EC46E.30803@3leafnetworks.com> Sean Hefty wrote: > somenath wrote: > >> any idea when this stuff will get done? > > > As a rough estimate, Q1, though I will try to get to it after adding > support for SA events, which likely would be near late November, early > December. The framework in the ib_cm is there; I just need a way to > signal that failover has occurred. I have proposed patches to the ib_cm and ib_sa modules in bug#172 and bug#159 to get this APM functionality working. The following describes how these interfaces can be used to resolve the APM issues. On the node where the RC QP connection is initiated (Active side), the IB_EVENT_PORT_ERR event is generated when a port failure occurs, and in this event handler the RC QP's path_mig_state can be changed to IB_MIG_MIGRATED to cause the failover. The problem is on the remote node, where the RC QP listener accepted this connection (Passive side).
There is no way for this node to know that a port failure has occurred on the Active side, so it needs an interface that delivers this notification. The ib_sa_serv_notice_hdlr() interface described in the patch attached to bug#159 can be used to register for remote port events. It has to be called separately for the PORT_ERR and PORT_ACTIVE events. When the handler for the remote PORT_ERR event runs, the RC QP's path_mig_state can be changed to IB_MIG_MIGRATED to cause the failover. For rearming, the IB_EVENT_PORT_ACTIVE event handler can be used on the Active side to reload the alternate path by sending the LAP message. When the LAP is received on the Passive side, or when the APR is received on the Active side, the alternate path can be reloaded with the ib_cm_init_rearm_attr() interface described in bug#172. When a port is rearmed on the Passive side, the PORT_ACTIVE callback registered through the same ib_sa_serv_notice_hdlr() interface can be used to send the LAP/APR messages, and the same ib_cm_init_rearm_attr() can be used to rearm the alternate path. VBabu From venkatesh.babu at 3leafnetworks.com Tue Oct 24 18:59:41 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Tue, 24 Oct 2006 18:59:41 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453D6E66.4050902@veritas.com> References: <45393194.40707@3leafnetworks.com> <453D4552.4040304@veritas.com> <453E9D11.3060304@3leafnetworks.com> <453D496E.10805@veritas.com> <453EBFA8.1000709@3leafnetworks.com> <453D6E66.4050902@veritas.com> Message-ID: <453EC50D.8000803@3leafnetworks.com> I have implemented this interface and am proposing it in bug#172. You have to apply the patch manually to get it working. VBabu somenath wrote: > I don't see that API( ib_cm_init_rearm_attr() the build I am using. 
> > so I I use a function like this: > > modifyqp_rearm(con_t *con) { > struct ib_qp_attr qp_attr; > int qp_attr_mask =0; > int ib_stat =0; > > memset(&qp_attr, 0, sizeof(struct ib_qp_attr)); > > qp_attr.qp_state = IB_QPS_RTS; > ib_stat = ib_cm_init_qp_attr( > con ->cm_id, > &qp_attr, > &qp_attr_mask); > if (ib_stat) { > return ib_stat; > } > qp_attr.path_mig_state = IB_MIG_REARM; > qp_attr_mask |= IB_QP_PATH_MIG_STATE; > ib_stat = ib_modify_qp(con->qp, &qp_attr, qp_attr_mask); > if (ib_stat) { > return ib_stat; > } > return 0; > } > > Active side sends the LAP. Both side is called with APR event. > I call this function at the APR handler and it returns me the error -22 > when tried to ib_modify_qp(). > > see anything wrong? do I have to set anything else? > or is it broken in the build I am using? > > thanks, som. > > > Venkatesh Babu wrote: > >> 1. No my application has to make the state trasition from RTS to RTS. >> In case of failover set path_mig_state to IB_MIG_MIGRATED. >> >> In case of rearming call the function ib_cm_init_rearm_attr(). This >> function is defined in the bug#172. >> >> 2. No, I have my own ULP module which sits alongside to the SDP, SRP, >> IPoIB modules. >> >> VBabu >> >> somenath wrote: >> >>> Venkatesh Babu wrote: >>> >>>> 1. Yes, I can rearm the alternate path by sending LAP and APR >>>> messages. >>> >>> >>> >>> >>> >>> does the qpair go to rearm state just by sending LAP and APR messages? >>> I mean, you don't have to change the QP state to REARM explicitely? >>> >>>> >>>> 2. I was sending some network traffic (netperf) while doing these >>>> failovers. >>>> >>> so, I assume its SDP's APM feature gets tested? is that true? >>> >>> thanks, som. 
>>> > From sean.hefty at intel.com Tue Oct 24 21:14:36 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 21:14:36 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453EC50D.8000803@3leafnetworks.com> Message-ID: <000001c6f7ec$15567d10$17fd070a@amr.corp.intel.com> > I have implemented this interface and proposing this in bug#172. You >have to patch it manually to get it working. Can you post the patch to the list? If the functionality is needed, I'd like to queue it for 2.6.20. Although, I'd prefer if the existing ib_cm_init_qp_attr() routine were used with state handling instead of adding a new API routine. And thinking about it more, we might be able to use that call as the indication to the ib_cm to failover to the alternate path. - Sean From sean.hefty at intel.com Tue Oct 24 21:20:09 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 21:20:09 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453D5E89.5040801@veritas.com> Message-ID: <000101c6f7ec$dbc8f2c0$17fd070a@amr.corp.intel.com> >but this function > >cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv, > struct ib_qp_attr *qp_attr, > int *qp_attr_mask) > >does perform the state transition using >if (cm_id_priv->alt_av.ah_attr.dlid) { > *qp_attr_mask |= IB_QP_PATH_MIG_STATE; > qp_attr->path_mig_state = IB_MIG_REARM; > } > >which works for me the first time I load path (using send REQ) and >change state to RTS. This routine only sets the qp attributes needed for the transition. The user must still call ib_modify_qp(). This was the compromise for not having the cm modify the qp state; it tried to make it easy on the user. 
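The split described here, where a CM helper only fills in the attribute structure and mask and the caller then applies them, can be sketched in plain C. This is a toy userspace model, not the kernel API: every `toy_` name below is hypothetical, and only the rule quoted above (set the path-migration state to REARM when an alternate path is loaded) is taken from the thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the attribute mask bits (not kernel values). */
enum {
    TOY_QP_STATE          = 1 << 0,
    TOY_QP_PATH_MIG_STATE = 1 << 1
};

enum toy_mig_state { TOY_MIG_MIGRATED, TOY_MIG_REARM, TOY_MIG_ARMED };
enum toy_qp_state  { TOY_QPS_RTR, TOY_QPS_RTS };

struct toy_qp_attr {
    enum toy_qp_state  qp_state;
    enum toy_mig_state path_mig_state;
};

struct toy_qp {
    enum toy_qp_state  qp_state;
    enum toy_mig_state path_mig_state;
};

/* Step 1: the CM-like helper fills in attributes and mask only.
 * It never touches the QP itself. */
static void toy_cm_init_rts_attr(int have_alt_path,
                                 struct toy_qp_attr *attr, int *mask)
{
    memset(attr, 0, sizeof(*attr));
    attr->qp_state = TOY_QPS_RTS;
    *mask = TOY_QP_STATE;
    if (have_alt_path) {
        attr->path_mig_state = TOY_MIG_REARM;
        *mask |= TOY_QP_PATH_MIG_STATE;
    }
}

/* Step 2: the caller applies only the fields selected by the mask. */
static int toy_modify_qp(struct toy_qp *qp,
                         const struct toy_qp_attr *attr, int mask)
{
    if (mask & TOY_QP_STATE)
        qp->qp_state = attr->qp_state;
    if (mask & TOY_QP_PATH_MIG_STATE)
        qp->path_mig_state = attr->path_mig_state;
    return 0;
}

/* Returns the QP's migration state after step 1 and, optionally, step 2. */
static enum toy_mig_state toy_rearm(int have_alt_path, int call_modify)
{
    struct toy_qp qp = { TOY_QPS_RTR, TOY_MIG_MIGRATED };
    struct toy_qp_attr attr;
    int mask;

    toy_cm_init_rts_attr(have_alt_path, &attr, &mask);
    if (call_modify)
        toy_modify_qp(&qp, &attr, mask);
    return qp.path_mig_state;
}
```

In this model, `toy_rearm(1, 0)` leaves the QP unchanged, which mirrors the point that filling in the attributes is not a state transition by itself.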
- Sean From sean.hefty at intel.com Tue Oct 24 21:27:17 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 24 Oct 2006 21:27:17 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453EC46E.30803@3leafnetworks.com> Message-ID: <000201c6f7ed$db0c15f0$17fd070a@amr.corp.intel.com> >The problem is for the the remote node where RC QP listen was accepted >this connection (Passive side). There is no way for this node to know >that port failure has occurred on the Active side. So it requires some >interfaces to get this notification. So ib_sa_serv_notice_hdlr() >interface as described in the patch attached to bug#159 can be used to >register for the remote port events. This interface has to be called >separately for for PORT_ERR and PORT_ACTIVE events. When the handler for >remote PORT_ERR occurs, then RC QP's path_mig_state can be changed to >IB_MIG_MIGRATED to cause the failover. Hal pointed me to your patches for this, since I'm working on adding InformInfo / Notice support. I believe that a good portion of the code there is usable. What I didn't see in the patch was reference counting to handle multiple users registering for the same event, but I'm planning on leveraging the multicast handling code for that. - Sean From grundler at parisc-linux.org Tue Oct 24 23:30:22 2006 From: grundler at parisc-linux.org (Grant Grundler) Date: Wed, 25 Oct 2006 00:30:22 -0600 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: Message-ID: <20061025063022.GC12319@colo.lackof.org> On Tue, Oct 24, 2006 at 12:13:19PM -0700, Roland Dreier wrote: > John Partridge found an interesting bug involving mthca (Mellanox > InfiniBand HCA driver) on IA64/Altix systems. 
Basically, during > initialization, mthca does: > > - do some config writes, including enabling BARs > - then start a firmware command > - read an MMIO register from a BAR (to check if FW is busy) > > However, John found that the Altix PCI-X bridge was allowing the MMIO > read to start before the config write was done (which is allowed by > the PCI spec). Can someone provide a quote of the PCI Local bus spec that allows this? (Or at least a reference to a spec version and section number) > The PCI trace looked like: > > 23454: Config Write REG = 01 TYPE = 1 BE = 0000 Req = (0,0,0) Tag = 1 Bus = 1 Device = 0 Function = 0 WAIT = 2 > 23462: Memory Rd DW A = 00280698 BE = 0000 Req = (0,0,0) Tag = 0 WAIT = 2 > 23470: Split compl. Lower A = 00 Req = (0,0,0) Tag = 0 Comp = (0,2,0) WAIT = 1 (Error completion) > 23476: Split compl. Lower A = 00 Req = (0,0,0) Tag = 1 Comp = (0,2,0) WAIT = 1 (Normal completion of WRITE) > > and that "Error completion" leads to a crash. > > John proposed the following patch to fix this, which looks good to > me. However, I have a couple of questions about this situation: > > 1) Is this something that should be fixed in the driver? The PCI > spec allows MMIO cycles to start before an earlier config cycle > completed, but do we want to expose this fact to drivers? I would prefer we did not. > Would > it be better for ia64 to use some sort of barrier to make sure > pci_write_config_xxx() is strongly ordered with MMIO? That would be my preference. > 2) Is this issue lurking in other drivers? > > Thanks, > Roland > > commit 424b50b6360b325ce642ece687756a600c25d28a > Author: John Partridge > Date: Tue Oct 24 11:54:16 2006 -0700 > > IB/mthca: Make sure all PCI config writes reach device before doing MMIO > > During initialization, mthca writes some PCI config space registers > and then does an MMIO read from one of the BARs it just enabled. 
This > MMIO read sometimes failed and caused a crash on SGI Altix machines, > because the PCI-X host bridge (legitimately, according to the PCI > spec) allowed the MMIO read to start before the config write completed. Because of this past discussion with jesse barnes, I'm leary of any kind of writes traveling through SN2 fabric. The issue is described pretty well here: http://www.usenetlinux.com/archive/topic.php/t-49141.html I don't know that this is the same (or similar) problem. > To fix this, add a config read after all config writes to make sure > they are all done before starting the MMIO read. > > Signed-off-by: John Partridge > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c > index 91934f2..578dc7c 100644 > --- a/drivers/infiniband/hw/mthca/mthca_reset.c > +++ b/drivers/infiniband/hw/mthca/mthca_reset.c > @@ -281,6 +281,20 @@ good: > goto out; > } > > + /* > + * Perform a "flush" of the PCI config writes here by reading > + * the PCI_COMMAND register. This is needed to make sure that > + * we don't try to touch other PCI BARs before the config > + * writes are done -- otherwise an MMIO cycle could start > + * before the config writes are done and reach the HCA before > + * the BAR is actually enabled. > + */ If this code is accepted, the comment should provide a specific reference (PCI Version + section) to the PCI spec that allows the out-of-order. I agree with jgarzik that the drivers already expect config cycles to be ordered with respect to MMIO cycles. I'm looking at arch/ia64/pci/pci.c. Wouldn't it be reasonable to include memory barriers around calls to SAL config space access functions? 
thanks, grant From bugzilla-daemon at openib.org Wed Oct 25 00:30:29 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 25 Oct 2006 00:30:29 -0700 (PDT) Subject: [openib-general] [Bug 286] "ifconfig ib# down" hangs telnet connection-- NETDEV WATCHDOG: ib0: transmit timed out Message-ID: <20061025073029.AC2002283D8@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=286 ------- Comment #1 from tziporet at mellanox.co.il 2006-10-25 00:30 ------- Can you check it with OFED 1.1? Also please upgrade FW version to 4.7.600 since this is the version we qualified OFED 1.1 with Thanks, Tziporet ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Wed Oct 25 03:50:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Oct 2006 12:50:58 +0200 Subject: [openib-general] new server up and running In-Reply-To: <20061023232250.GA3118@cuprite.pathscale.com> References: <20061023232250.GA3118@cuprite.pathscale.com> Message-ID: <20061025105058.GA11682@mellanox.co.il> OK, I tested this a bit. The machine and the inet connection seem very fast, thanks for that! An issue I've run into: mst at hosting:~$ git clone -n 'git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git' fatal: read error (Connection reset by peer) fetch-pack from 'git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git' failed. mst at hosting:~$ git clone -n 'git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git' remote: Generating pack... remote: Done counting 357146 objects. remote: Deltifying 357146 objects. remote: 100% (357146/357146) done remote: Total 357146, written 357146 (delta 282678), reused 357057 (delta 282610) This seems to happen each second clone I do. What's up? Now, what's the URL to access git from outside? I do: git clone git://69.55.231.195/~mst/linux-2.6 and this fails. 
I think it is best to run with --user-path=scm so that I can put git trees under an scm sub-directory and have that part exported. -- MST From jackm at dev.mellanox.co.il Wed Oct 25 03:54:24 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 25 Oct 2006 12:54:24 +0200 Subject: [openib-general] [PATCH 2/2] libibverbs-1.0: return sq_draining value in query_qp response Message-ID: <200610251254.24827.jackm@dev.mellanox.co.il> Return the sq_draining value back to user space for query_qp instead of the en_sqd_async_notify value. The latter is valid only for modify_qp. For query_qp (according to the IB Spec V1.2), the draining status should be returned. Signed-off-by: Jack Morgenstein --- Roland, Without this patch, sq_draining management can be done only via the draining event. Furthermore, the current situation does not comply with the IB Spec. Finally, the ABI structure change is only a name change - no need to increment the ABI version. Also, please enter this change for libibverbs 1.1 as well.
Index: 1.1/src/userspace/libibverbs-1.0/src/cmd.c =================================================================== --- 1.1.orig/src/userspace/libibverbs-1.0/src/cmd.c 2006-08-06 10:27:11.000000000 +0300 +++ 1.1/src/userspace/libibverbs-1.0/src/cmd.c 2006-10-25 12:20:00.385513000 +0200 @@ -633,7 +633,7 @@ int ibv_cmd_query_qp(struct ibv_qp *qp, attr->cur_qp_state = resp.cur_qp_state; attr->path_mtu = resp.path_mtu; attr->path_mig_state = resp.path_mig_state; - attr->en_sqd_async_notify = resp.en_sqd_async_notify; + attr->sq_draining = resp.sq_draining; attr->max_rd_atomic = resp.max_rd_atomic; attr->max_dest_rd_atomic = resp.max_dest_rd_atomic; attr->min_rnr_timer = resp.min_rnr_timer; Index: 1.1/src/userspace/libibverbs-1.0/include/infiniband/kern-abi.h =================================================================== --- 1.1.orig/src/userspace/libibverbs-1.0/include/infiniband/kern-abi.h 2006-08-06 10:27:09.000000000 +0300 +++ 1.1/src/userspace/libibverbs-1.0/include/infiniband/kern-abi.h 2006-10-25 12:06:17.282533000 +0200 @@ -506,7 +506,7 @@ struct ibv_query_qp_resp { __u8 cur_qp_state; __u8 path_mtu; __u8 path_mig_state; - __u8 en_sqd_async_notify; + __u8 sq_draining; __u8 max_rd_atomic; __u8 max_dest_rd_atomic; __u8 min_rnr_timer; From jackm at dev.mellanox.co.il Wed Oct 25 03:54:20 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 25 Oct 2006 12:54:20 +0200 Subject: [openib-general] [PATCH 1/2] ib/core/uverbs: return sq_draining value in query_qp response Message-ID: <200610251254.20996.jackm@dev.mellanox.co.il> Return the sq_draining value back to user space for query_qp instead of the en_sqd_async notify value. This last is valid only for modify_qp. For query_qp, the draining status should returned. Signed-off-by: Jack Morgenstein --- Roland, Without this patch, can't manage sq draining via polling. Furthermore, without the patch, userspace query_qp does not comply with the IB Spec. 
Finally, the ABI structure change is only a name change - no need to increment the ABI version. Can this be queued for 2.6.19? Index: ofed_1_1/drivers/infiniband/core/uverbs_cmd.c =================================================================== --- ofed_1_1.orig/drivers/infiniband/core/uverbs_cmd.c 2006-08-03 14:30:21.000000000 +0300 +++ ofed_1_1/drivers/infiniband/core/uverbs_cmd.c 2006-10-25 12:16:02.818144000 +0200 @@ -1216,7 +1216,7 @@ ssize_t ib_uverbs_query_qp(struct ib_uve resp.qp_access_flags = attr->qp_access_flags; resp.pkey_index = attr->pkey_index; resp.alt_pkey_index = attr->alt_pkey_index; - resp.en_sqd_async_notify = attr->en_sqd_async_notify; + resp.sq_draining = attr->sq_draining; resp.max_rd_atomic = attr->max_rd_atomic; resp.max_dest_rd_atomic = attr->max_dest_rd_atomic; resp.min_rnr_timer = attr->min_rnr_timer; Index: ofed_1_1/include/rdma/ib_user_verbs.h =================================================================== --- ofed_1_1.orig/include/rdma/ib_user_verbs.h 2006-08-03 14:31:48.000000000 +0300 +++ ofed_1_1/include/rdma/ib_user_verbs.h 2006-10-25 12:15:29.119097000 +0200 @@ -456,7 +456,7 @@ struct ib_uverbs_query_qp_resp { __u8 cur_qp_state; __u8 path_mtu; __u8 path_mig_state; - __u8 en_sqd_async_notify; + __u8 sq_draining; __u8 max_rd_atomic; __u8 max_dest_rd_atomic; __u8 min_rnr_timer; From mst at mellanox.co.il Wed Oct 25 05:01:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Oct 2006 14:01:30 +0200 Subject: [openib-general] Fwd: [ANNOUNCE] GIT 1.4.3.2 Message-ID: <20061025120130.GB11732@mellanox.co.il> FYI. Of interest to openib is the change fixing backwards-compatibility - this might help people running old git clients. -- MST -------------- next part -------------- An embedded message was scrubbed... From: "Junio C Hamano" Subject: [ANNOUNCE] GIT 1.4.3.2 Date: Tue, 24 Oct 2006 08:27:07 +0200 Size: 4455 URL: From mst at mellanox.co.il Wed Oct 25 05:06:37 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 25 Oct 2006 14:06:37 +0200 Subject: [openib-general] new server up and running In-Reply-To: <20061025105058.GA11682@mellanox.co.il> References: <20061023232250.GA3118@cuprite.pathscale.com> <20061025105058.GA11682@mellanox.co.il> Message-ID: <20061025120637.GC11732@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: new server up and running > > OK, I tested this a bit. > The machine and the inet connection seem very fast, thanks for that! > > An issue I've run into: > > mst at hosting:~$ git clone -n > 'git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git' > fatal: read error (Connection reset by peer) > fetch-pack from > 'git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git' failed. OK, this was kernel.org problem. > Now, what's the URL to access git from outside? > I do: > git clone git://69.55.231.195/~mst/linux-2.6 > and this fails. > > I think it is best to be running with --user-path=scm so that I can > put git trees under scm sub-directory and have that part exported. But this still seem to be an issue. -- MST From mst at mellanox.co.il Wed Oct 25 05:17:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Oct 2006 14:17:33 +0200 Subject: [openib-general] ofed git layout ideas In-Reply-To: References: <1161626885.10502.4.camel@skippy.ornl.gov> <20061023184918.GA19781@sashak.voltaire.com> <537C6C0940C6C143AA46A88946B854170386CA55@ORNLEXCHANGE.ornl.gov> Message-ID: <20061025121733.GD11732@mellanox.co.il> Roland (as probably our most experienced git user), all, would like to ask your ideas about ofed git layout. 
Basically, here's what ofed kernel code needs to include:
- upstream kernel (we really only used the infiniband/rdma subdirectories from there)
- ofed addons, including:
  - "backport patches" - patches to infiniband directory to make it work on older kernels
  - code for out of kernel modules (mostly experimental/debugging stuff)
  - "fixes" - experimental/debugging additions for upstream modules; we also used this for last-minute fixes after code freeze, to make it easier to roll changes back immediately in case of issues
  - build, uverbs etc scripts

So, what we had for ofed 1.1 was a common git tree that had the addons and additionally pulled from upstream from time to time. This means, however, that life was harder for people working just on backports or just scripts, as they always had to check out the full linux tree. So I wondered whether it would be cleaner to just have 2 git trees:
- one for ofed addons
- another one tracking upstream that addons are tested to work against

The advantage would be that it's easier to see what ofed adds on top of the linux kernel, and what kernel version ofed is based on. A disadvantage would be that we now have 2 git checksums as ofed kernel identifiers, and not 1 as previously. For users of a release not much would change, as we can still always tag both with the same tag. What do you think? -- MST From rdreier at cisco.com Wed Oct 25 07:04:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 07:04:56 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024225935.GK4054@obsidianresearch.com> (Jason Gunthorpe's message of "Tue, 24 Oct 2006 16:59:35 -0600") References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024225935.GK4054@obsidianresearch.com> Message-ID: > I'm not sure that can work either. 
The PCI-X spec is very clear, you > must wait for a non-posted completion if you care about order. Doing a > config read in the driver as a surrogate flush is not good enough in > the general case. Like you say, a pci bridge is free to reorder all > in flight non-posted operations. No, hang on. Nothing can reorder a dependent read to start after a write that it depends on, can it? So a config read of PCI_COMMAND can't start until the completion of a config write of the same register, right? - R. From rdreier at cisco.com Wed Oct 25 07:05:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 07:05:59 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061024232755.GA26521@sgi.com> (Jack Steiner's message of "Tue, 24 Oct 2006 18:27:55 -0500") References: <20061024192210.GE2043@havoc.gtf.org> <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024232755.GA26521@sgi.com> Message-ID: > I'll check if there is any additional reordering that can occur AFTER the > PIO_WRITE_COUNT goes to zero. If so, it would be at bus level - not in > shub or routers. Unfortunately, at least in theory, the reordering can occur. For example a bridge on some card plugged into an SN slot is allowed to reorder things too. - R. From rdreier at cisco.com Wed Oct 25 07:09:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 07:09:47 -0700 Subject: [openib-general] [PATCH 1/2] ib/core/uverbs: return sq_draining value in query_qp response In-Reply-To: <200610251254.20996.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 25 Oct 2006 12:54:20 +0200") References: <200610251254.20996.jackm@dev.mellanox.co.il> Message-ID: > Finally, the ABI structure change is only a name change - > no need to increment the ABI version. I don't believe this is true unfortunately. 
If you have a new libibverbs and an old kernel, then you will return the value of en_sqd_async_notify to the consumer but tell the consumer that the value represents sq_draining. Fortunately ib_uverbs_query_qp_resp has some reserved slots we can use for sq_draining. That means an old kernel will always return 0 for sq_draining, though I'm not sure we could do better... - R. From rdreier at cisco.com Wed Oct 25 07:11:06 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 07:11:06 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061025063022.GC12319@colo.lackof.org> (Grant Grundler's message of "Wed, 25 Oct 2006 00:30:22 -0600") References: <20061025063022.GC12319@colo.lackof.org> Message-ID: > I'm looking at arch/ia64/pci/pci.c. > Wouldn't it be reasonable to include memory barriers around calls > to SAL config space access functions? It's reasonable, but is there a memory barrier strong enough to guarantee that a config write has actually completed? - R. From sanjeewah at millenniumit.com Wed Oct 25 07:10:28 2006 From: sanjeewah at millenniumit.com (Sanjeewa Herath) Date: Wed, 25 Oct 2006 20:10:28 +0600 Subject: [openib-general] building OFED-1.1 References: Message-ID: <00d301c6f83f$52c1f9b0$140a19ac@SanjeewaHerath> I tried to build OFED-1.1 and got the following error when building ibutils. What is going wrong? If this is not the appropriate mailing list, please let me know where to report this. Thanks. 
Sanjeewa

updating cache /var/tmp/OFED/ibutils.cache
configure: creating ./config.status
config.status: creating ibutils.spec
config.status: creating Makefile
configure: configuring in ibis
configure: running /bin/sh './configure' --prefix=/usr/local/ofed '--build=i686-redhat-linux-gnu' '--host=i686-redhat-linux-gnu' '--target=i386-redhat-linux-gnu' '--program-prefix=' '--exec-prefix=/usr/local/ofed' '--bindir=/usr/local/ofed/bin' '--sbindir=/usr/local/ofed/sbin' '--sysconfdir=/etc' '--datadir=/usr/local/ofed/share' '--includedir=/usr/local/ofed/include' '--libdir=/usr/local/ofed/lib' '--libexecdir=/usr/local/ofed/libexec' '--localstatedir=/var' '--sharedstatedir=/usr/local/ofed/com' '--infodir=/usr/share/info' '--prefix=/usr/local/ofed' '--mandir=/usr/local/ofed/man' '--cache-file=/var/tmp/OFED/ibutils.cache' '--with-osm=/var/tmp/OFED/usr/local/ofed' 'build_alias=i686-redhat-linux-gnu' 'host_alias=i686-redhat-linux-gnu' 'target_alias=i386-redhat-linux-gnu' --cache-file=/var/tmp/OFED/ibutils.cache --srcdir=.
configure: loading cache /var/tmp/OFED/ibutils.cache
checking for a BSD-compatible install... (cached) /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... (cached) gawk
checking whether make sets $(MAKE)... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether make sets $(MAKE)... (cached) yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln -s works... yes
checking for swig... /usr/local/bin/swig
checking for SWIG version... SWIG Version 1.1 (Patch 5)
checking for i686-redhat-linux-gnu-g++... no
checking for i686-redhat-linux-gnu-c++... no
checking for i686-redhat-linux-gnu-gpp... no
checking for i686-redhat-linux-gnu-aCC... no
checking for i686-redhat-linux-gnu-CC... no
checking for i686-redhat-linux-gnu-cxx... no
checking for i686-redhat-linux-gnu-cc++... no
checking for i686-redhat-linux-gnu-cl... 
no
checking for i686-redhat-linux-gnu-FCC... no
checking for i686-redhat-linux-gnu-KCC... no
checking for i686-redhat-linux-gnu-RCC... no
checking for i686-redhat-linux-gnu-xlC_r... no
checking for i686-redhat-linux-gnu-xlC... no
checking for g++... g++
checking for C++ compiler default output file name... configure: error: C++ compiler cannot create executables
See `config.log' for more details.
configure: error: /bin/sh './configure' failed for ibis
error: Bad exit status from /var/tmp/rpm-tmp.67058 (%install)

RPM build errors:
    user vlad does not exist - using root
    group mtl does not exist - using root
    user vlad does not exist - using root
    group mtl does not exist - using root
    Bad exit status from /var/tmp/rpm-tmp.67058 (%install)
ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define 'configure_options --prefix=/usr/local/ofed --mandir=/usr/local/ofed/man --cache-file=/var/tmp/OFED/ibutils.cache --with-osm=/var/tmp/OFED/usr/local/ofed' --define '_prefix /usr/local/ofed' --define '_libdir /usr/local/ofed/lib' --define '_mandir %{_prefix}/man' --define 'build_root /var/tmp/OFED' /root/OFED-1.1/SRPMS/ibutils-1.0-0.src.rpm"

********************************************** The information contained in this email is confidential and is meant to be read only by the person to whom it is addressed. Please visit http://www.millenniumit.com/legal/email.htm to read the entire confidentiality clause. **********************************************
From rdreier at cisco.com Wed Oct 25 07:15:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 07:15:16 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? 
In-Reply-To: <20061024.154347.77057163.davem@davemloft.net> (David Miller's message of "Tue, 24 Oct 2006 15:43:47 -0700 (PDT)") References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> Message-ID: > One thing is that we definitely don't want to fix this by, > for example, reading back the PCI_COMMAND register or something > like that. That causes two problems: > > 1) Some PCI config writes shut the device down and make it > no respond to some kinds of PCI config transactions. > One example is putting the device into D3 or similar > power state, another is performing a device reset. Hmm... it seems there is no other guaranteed way to make sure a config write has really completed except doing a config read. And only the driver knows what the config access it's doing means. So the conclusion we seem to be forced into is that drivers need to include these reads in the cases where they are needed. - R. From matthew at wil.cx Wed Oct 25 07:18:59 2006 From: matthew at wil.cx (Matthew Wilcox) Date: Wed, 25 Oct 2006 08:18:59 -0600 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061025063022.GC12319@colo.lackof.org> References: <20061025063022.GC12319@colo.lackof.org> Message-ID: <20061025141859.GC5591@parisc-linux.org> On Wed, Oct 25, 2006 at 12:30:22AM -0600, Grant Grundler wrote: > Can someone provide a quote of the PCI Local bus spec that allows this? > (Or at least a reference to a spec version and section number) PCI-PCI bridges are allowed to do it. If you look in table E-1 of PCI 2.3, or table 8-3 of PCI-X 2.0, you'll see that a Posted Memory Write can pass a Delayed Write Request (or in PCI-X, a Memory Write can pass a Split Write Request). So mmiowb() will solve the problem for Altix, but leave everybody else vulnerable. 
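The reordering cited from the ordering tables can be pictured with a toy model: a bridge that holds posted memory writes and non-posted config writes in separate buffers and chooses to drain the posted buffer first. This is only a sketch of one ordering the rules permit, not a model of any real bridge, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_txn_kind { TOY_CONFIG_WRITE, TOY_MEM_WRITE };

struct toy_txn {
    enum toy_txn_kind kind;
    int issue_order;            /* order in which the CPU issued it */
};

#define TOY_BUF_SLOTS 4

/* Toy bridge with separate buffers for posted and non-posted requests. */
struct toy_bridge {
    struct toy_txn posted[TOY_BUF_SLOTS];
    size_t n_posted;
    struct toy_txn nonposted[TOY_BUF_SLOTS];
    size_t n_nonposted;
};

/* Queue a transaction in the buffer matching its type; -1 if full. */
static int toy_bridge_accept(struct toy_bridge *b, struct toy_txn t)
{
    if (t.kind == TOY_MEM_WRITE) {
        if (b->n_posted == TOY_BUF_SLOTS)
            return -1;
        b->posted[b->n_posted++] = t;
    } else {
        if (b->n_nonposted == TOY_BUF_SLOTS)
            return -1;
        b->nonposted[b->n_nonposted++] = t;
    }
    return 0;
}

/* Forward everything downstream, posted writes first. Draining in this
 * order is a choice made by this toy, picked to show the permitted case. */
static size_t toy_bridge_forward(struct toy_bridge *b, struct toy_txn *out)
{
    size_t n = 0, i;

    for (i = 0; i < b->n_posted; i++)
        out[n++] = b->posted[i];
    for (i = 0; i < b->n_nonposted; i++)
        out[n++] = b->nonposted[i];
    b->n_posted = 0;
    b->n_nonposted = 0;
    return n;
}

/* Issue a config write, then a memory write; report whether the memory
 * write reached the device side first (1) or not (0). */
static int toy_mem_write_passes_config_write(void)
{
    struct toy_bridge b;
    struct toy_txn out[2 * TOY_BUF_SLOTS];
    struct toy_txn cfg = { TOY_CONFIG_WRITE, 0 };
    struct toy_txn mem = { TOY_MEM_WRITE, 1 };
    size_t n;

    memset(&b, 0, sizeof(b));
    toy_bridge_accept(&b, cfg);
    toy_bridge_accept(&b, mem);
    n = toy_bridge_forward(&b, out);

    return n == 2 && out[0].kind == TOY_MEM_WRITE &&
           out[0].issue_order > out[1].issue_order;
}
```

The memory write was issued second but arrives first, which is the situation a driver hits if it touches a BAR right after enabling it through config space.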
I actually don't see a way of forcing the config write to complete before a memory write -- everything is allowed to pass a config write, even a config read. I initially thought "But only a crack monkey would implement a system where a config read could pass a config write", but the spec explains that: In most PCI-X implementations, Split Requests are managed in separate buffers from Split Completions, so Split Requests naturally pass Split Completions. However, no deadlocks occur if Split Completions block Split Requests. So all this code that checks to see if a write had an effect is unsafe. I'm a little perturbed by this. It means the only way to reliably distinguish between a write that hasn't taken effect yet and a bit (say, MWI) the device hasn't implemented is to do a memory access to the device. Which is hard when you're trying to program the BARs. I suppose this hasn't bitten us before in, what, 7 years of PCI-X, so it can't be *that* common a thing for bridges to do. And we would have noticed the BAR sizing code going wrong (as it does config write followed immediately by config read), so maybe implementations aren't as crackful as the PCI spec seems to permit them to be. I find it really hard to believe the PCI committee have done something this stupid. There must be another rule somewhere that I'm missing. From changquing.tang at hp.com Wed Oct 25 07:59:23 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Wed, 25 Oct 2006 09:59:23 -0500 Subject: [openib-general] APM support in openib stack In-Reply-To: <453EAB8B.402@ichips.intel.com> Message-ID: How do you know that the old path is back ? Do you have a notification handler called, or you need to query periodically ? Thanks. --CQ > another question: If one brings back the old path (after the first > failure) and use the old path record > to do lap/apr then reenabling migration using LAP/APR should work, right? Yes. > next question is: which component is suppossed to change QP state to > REARM? 
spec just says: > "based on a command from a management entity, the QP state is set to > REARM" . The ib_cm does not perform any QP state transitions. That is left up to the user, since the CM is unaware of other actions related to the QP that the user may be performing. - Sean _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at dev.mellanox.co.il Wed Oct 25 08:30:36 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 25 Oct 2006 17:30:36 +0200 Subject: [openib-general] creating releases for the libraries you own Message-ID: <453F831C.4010204@dev.mellanox.co.il> Hi Hal & Sean, I want to suggest that you will create releases to the libraries you own (like Roland maintains for libibverbs). This will help us in OFED integration since we will be able to start the release from a known stable version, instead of taking code with unknown stability from svn. Thanks, Tziporet From mshefty at ichips.intel.com Wed Oct 25 09:38:01 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Oct 2006 09:38:01 -0700 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <453F831C.4010204@dev.mellanox.co.il> References: <453F831C.4010204@dev.mellanox.co.il> Message-ID: <453F92E9.8050908@ichips.intel.com> Tziporet Koren wrote: > I want to suggest that you will create releases to the libraries you own > (like Roland maintains for libibverbs). > This will help us in OFED integration since we will be able to start the > release from a known stable version, instead of taking code with unknown > stability from svn. I'm aware of this for libibcm and librdmacm. I will start a release of librdmacm once we have upstream support for a userspace rdma_cm. 
I've delayed releasing libibcm until there's better userspace support for SA queries, but I could be convinced otherwise. - Sean From mshefty at ichips.intel.com Wed Oct 25 09:41:19 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Oct 2006 09:41:19 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: References: Message-ID: <453F93AF.1020706@ichips.intel.com> > How do you know that the old path is back ? Do you have a notification > handler called, or you need to query periodically ? The cm doesn't know that it's back. It records the original path used when sending the REQ and uses that same path for all future messages. - Sean From jackm at dev.mellanox.co.il Wed Oct 25 09:49:24 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 25 Oct 2006 18:49:24 +0200 Subject: [openib-general] [PATCH 1/2] ib/core/uverbs: return sq_draining value in query_qp response In-Reply-To: References: <200610251254.20996.jackm@dev.mellanox.co.il> Message-ID: <200610251849.25368.jackm@dev.mellanox.co.il> On Wednesday 25 October 2006 16:09, Roland Dreier wrote: > > Finally, the ABI structure change is only a name change - > > no need to increment the ABI version. > > I don't believe this is true unfortunately. If you have a new > libibverbs and an old kernel, then you will return the value of > en_sqd_async_notify to the consumer but tell the consumer that the > value represents sq_draining. > Why is this any worse than having a situation where a kernel module has a bug in which it returns an incorrect value in an ABI field in which no name change is involved? If we fix such a bug, do we need to increment the ABI as well? ( I do agree, though, that the sq_draining case pushes the above argument to the limit. I was aware of your observation above. I'm just uncomfortable with needing to increment the ABI version for all bug fixes, and I treat the above - including the name change - as a bug fix). 
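The compatibility concern quoted above can be sketched in C. This is only a minimal illustration under stated assumptions: the two structs are hypothetical, trimmed stand-ins (not the real ib_uverbs_query_qp_resp layout), and old_kernel_fill() and new_library_read_sq_draining() are made-up helper names. The sketch shows that renaming a field does not change which byte crosses the kernel/user boundary, so a new library paired with an old kernel would hand back the old field's value under the new name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily trimmed response layouts. These are NOT the
 * real ib_uverbs_query_qp_resp definition; only the renamed byte
 * matters for the illustration. */
struct resp_old_kernel {
	uint8_t port_num;
	uint8_t en_sqd_async_notify;	/* meaning used by an old kernel */
	uint8_t reserved[6];
};

struct resp_new_library {
	uint8_t port_num;
	uint8_t sq_draining;		/* meaning assumed by a new library */
	uint8_t reserved[6];
};

/* The rename keeps the binary layout identical, which is why the
 * mismatch cannot be detected from the structure alone. */
_Static_assert(sizeof(struct resp_old_kernel) ==
	       sizeof(struct resp_new_library),
	       "rename must not change the layout");

/* Old kernel side (hypothetical helper): fill the response buffer
 * using the old meaning of the byte. */
static void old_kernel_fill(void *buf, uint8_t notify_enabled)
{
	struct resp_old_kernel r;

	assert(buf != NULL);
	memset(&r, 0, sizeof r);
	r.port_num = 1;
	r.en_sqd_async_notify = notify_enabled;
	memcpy(buf, &r, sizeof r);
}

/* New library side (hypothetical helper): read the very same byte,
 * but report it to the consumer as sq_draining. */
static uint8_t new_library_read_sq_draining(const void *buf)
{
	struct resp_new_library r;

	assert(buf != NULL);
	memcpy(&r, buf, sizeof r);
	return r.sq_draining;
}
```

Calling old_kernel_fill() on a buffer of sizeof(struct resp_old_kernel) bytes and then new_library_read_sq_draining() on the same buffer returns whatever was stored as en_sqd_async_notify, which is the situation described in the quoted reply.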
- Jack From changquing.tang at hp.com Wed Oct 25 10:01:36 2006 From: changquing.tang at hp.com (Tang, Changqing) Date: Wed, 25 Oct 2006 12:01:36 -0500 Subject: [openib-general] APM support in openib stack In-Reply-To: <453F93AF.1020706@ichips.intel.com> Message-ID: Thanks, In situation that path 1 is primary, path 2 is alternate, when path 1 failed, traffic migrates to path 2. Now we can think that path 2 is primary, but there is no alternate path. Is there a way (not limited to cm) to know that path 1 is back again and reload it as the new alternate path ? If path 1 is down, we can not set it as alternate path, right ? If we can not bring a path back on fly, the usage of APM is limited. --CQ -----Original Message----- From: Sean Hefty [mailto:mshefty at ichips.intel.com] Sent: Wednesday, October 25, 2006 11:41 AM To: Tang, Changqing Cc: somenath at veritas.com; Venkatesh Babu; openib-general at openib.org Subject: Re: [openib-general] APM support in openib stack > How do you know that the old path is back ? Do you have a notification > handler called, or you need to query periodically ? The cm doesn't know that it's back. It records the original path used when sending the REQ and uses that same path for all future messages. - Sean From halr at voltaire.com Wed Oct 25 10:08:18 2006 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 25 Oct 2006 19:08:18 +0200 Subject: [openib-general] creating releases for the libraries you own References: <453F831C.4010204@dev.mellanox.co.il> Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F501894376@taurus.voltaire.com> Yes, I plan on doing this moving forward. -- Hal ________________________________ From: Tziporet Koren [mailto:tziporet at dev.mellanox.co.il] Sent: Wed 10/25/2006 11:30 AM To: Hal Rosenstock; Sean Hefty Cc: OPENIB Subject: creating releases for the libraries you own Hi Hal & Sean, I want to suggest that you will create releases to the libraries you own (like Roland maintains for libibverbs). 
This will help us in OFED integration since we will be able to start the release from a known stable version, instead of taking code with unknown stability from svn. Thanks, Tziporet From mshefty at ichips.intel.com Wed Oct 25 10:08:34 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Oct 2006 10:08:34 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: References: Message-ID: <453F9A12.4020500@ichips.intel.com> Tang, Changqing wrote: > Is there a way (not limited to cm) to know that path 1 is back again and > reload it as the new alternate path ? If path 1 is down, we can not set > it as alternate path, right ? > > If we can not bring a path back on fly, the usage of APM is limited. Yes - the usage is currently limited, but that is being worked on. Support is needed in both the ib_cm and ib_sa. The ib_sa needs SA informinfo/notice support, so users can receive fabric event notifications. And the ib_cm needs to know when to switch to the alternate path. Venkatesh has a couple of patches that assist with both of these. - Sean From jgunthorpe at obsidianresearch.com Wed Oct 25 10:15:54 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 25 Oct 2006 11:15:54 -0600 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <20061025141859.GC5591@parisc-linux.org> References: <20061025063022.GC12319@colo.lackof.org> <20061025141859.GC5591@parisc-linux.org> Message-ID: <20061025171554.GM4054@obsidianresearch.com> On Wed, Oct 25, 2006 at 08:18:59AM -0600, Matthew Wilcox wrote: > PCI-PCI bridges are allowed to do it. If you look in table E-1 of PCI > 2.3, or table 8-3 of PCI-X 2.0, you'll see that a Posted Memory Write > can pass a Delayed Write Request (or in PCI-X, a Memory Write can pass a > Split Write Request). Careful here.. Only MMIO writes are of the posted variety. 
Non-posted transactions (config write, IO write, all reads) are not ever allowed to pass a posted write, but they can be re-ordered. Table 8-3 shows that Split Write vs Split Read are all Y/N, meaning they could be re-ordered with respect to each other. The problem with mthca is exactly this: although the operations were issued in order by the bridges, the end device (mthca) is free to complete them in any order, and it chose to complete the MMIO read before the config write. > In most PCI-X implementations, Split Requests are managed in separate > buffers from Split Completions, so Split Requests naturally pass Split > Completions. However, no deadlocks occur if Split Completions block > Split Requests. Again, this is only for posted writes. All these rules are designed to prevent the bus from deadlocking due to buffer starvation under certain situations. [Basically split completions and posted writes are given a separate queue that can advance if the request queue is stalled. Otherwise you can deadlock a bridge] > So all this code that checks to see if a write had an effect is unsafe. > I'm a little perturbed by this. It means the only way to reliably MMIO-based code that does this is correct and reliable.. PIO code that does this is only safe if the platform is waiting for the PIO write completion before starting the PIO read. PCI-X (pg 80) says this about non-posted transaction ordering requirements: As in conventional PCI, if a requester requires one non-posted transaction to complete before another, it must not initiate the second transaction until the first one completes. IMHO, all sane hardware implementations of config ops and PIO should block the host bridge until the completion is generated by the end device.. I'm sure that most x86 platforms do this. 
(For instance I've observed this kind of behavior with a Hyper Transport probe on Opterons) This is more than just worrying about ordering, it is about how to engage a platform-specific way to know that the completion has been generated. The person who suggested polling the PIO_OUTSTANDING register on SN2 seems to have the right idea (if that counts all pending non-posted operations, not just PIO ones) :| The risk of re-ordering is probably not so much in the bridges since that would be a fairly strange thing to do - but it is very likely in end-devices. This is especially true if the accesses are to different internal resources! Jason From mst at mellanox.co.il Wed Oct 25 10:59:35 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Oct 2006 19:59:35 +0200 Subject: [openib-general] Fwd: Re: [git failure] failure pulling latest Linus tree Message-ID: <20061025175935.GA13747@mellanox.co.il> CB. Looks like we better update git to 1.4.3.2 on the new server. ----- Forwarded message from Junio C Hamano ----- Subject: Re: [git failure] failure pulling latest Linus tree Date: Wed, 25 Oct 2006 19:35:03 +0200 From: "Junio C Hamano" "H. Peter Anvin" writes: > For some reason which we haven't been able to track down yet, the > recent load imposed by FC6 caused zeus1's load to skyrocket, but not > zeus2's... it's largely a mystery. Would kernel.org prefer RPM cut on a FC6 box now? > HOWEVER, git 1.4.3 seems to have been bad chicken. When we ran it we > got a neverending stream of segfaults in the logs. If that is git-daemon dying when talking to older clients, that has been fixed in 1.4.3.2 (its virtual hosting support had an off-by-one wrong check to tell older clients from newer ones). Sorry about that -- we heard about the incompatibility this Monday. 
- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ----- End forwarded message ----- -- MST From rdreier at cisco.com Wed Oct 25 11:08:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 11:08:09 -0700 Subject: [openib-general] ofed git layout ideas In-Reply-To: <20061025121733.GD11732@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 25 Oct 2006 14:17:33 +0200") References: <1161626885.10502.4.camel@skippy.ornl.gov> <20061023184918.GA19781@sashak.voltaire.com> <537C6C0940C6C143AA46A88946B854170386CA55@ORNLEXCHANGE.ornl.gov> <20061025121733.GD11732@mellanox.co.il> Message-ID: > So, what we had for ofed 1.1, was a common git tree that had the addons and > additionally pulled from upstream from time to time. This means, however, that > life was harder for people working just on backports or just scripts as they always > had to check out the full linux tree. > > So I wondered whether it would be cleaner to just have 2 git trees > - one for ofed addons > - another one tracking upstream that addons are tested to work against > > The advantage would be that it's easier to see what ofed adds on top of linux > kernel, and what kernel version ofed is based on, a disadvantage would be that > we now have 2 git checksums as ofed kernel identifiers, and not 1 as previously. I don't really have a strong opinion. But I would just stick to 1 tree. It should be easy to see what ofed adds on, just by doing a diff between the upstream branch and the ofed branch. And I don't think that checking out a full linux tree is a big deal -- maybe it takes a few extra seconds but I don't see any real disadvantage. From mst at mellanox.co.il Wed Oct 25 11:22:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 25 Oct 2006 20:22:52 +0200 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? 
In-Reply-To: <20061025141859.GC5591@parisc-linux.org> References: <20061025063022.GC12319@colo.lackof.org> <20061025141859.GC5591@parisc-linux.org> Message-ID: <20061025182252.GA13967@mellanox.co.il> Quoting r. Matthew Wilcox : > Subject: Re: Ordering between PCI config space writes and MMIO reads? > > On Wed, Oct 25, 2006 at 12:30:22AM -0600, Grant Grundler wrote: > > Can someone provide a quote of the PCI Local bus spec that allows this? > > (Or at least a reference to a spec version and section number) > > PCI-PCI bridges are allowed to do it. If you look in table E-1 of PCI > 2.3, or table 8-3 of PCI-X 2.0, you'll see that a Posted Memory Write > can pass a Delayed Write Request (or in PCI-X, a Memory Write can pass a > Split Write Request). > > So mmiowb() will solve the problem for Altix, but leave everybody else > vulnerable. I actually don't see a way of forcing the config write to > complete before a memory write -- everything is allowed to pass a config > write, even a config read. I initially thought "But only a crack monkey > would implement a system where a config read could pass a config write", > but the spec explains that: > > In most PCI-X implementations, Split Requests are managed in separate > buffers from Split Completions, so Split Requests naturally pass Split > Completions. However, no deadlocks occur if Split Completions block > Split Requests. > > So all this code that checks to see if a write had an effect is unsafe. > I'm a little perturbed by this. It means the only way to reliably > distinguish between a write that hasn't taken effect yet and a bit (say, > MWI) the device hasn't implemented is to do a memory access to the > device. Which is hard when you're trying to program the BARs. > > I suppose this hasn't bitten us before in, what, 7 years of PCI-X, so > it can't be *that* common a thing for bridges to do. 
And we would have > noticed the BAR sizing code going wrong (as it does config write > followed immediately by config read), so maybe implementations aren't as > crackful as the PCI spec seems to permit them to be. > > I find it really hard to believe the PCI committee have done something > this stupid. There must be another rule somewhere that I'm missing. I think typically CPUs stall until a non-posted operation completes. And since config writes are non-posted, pci_config_write_... write .... does not *start* the write until config write has completed. So there's only a single outstanding config operation and that's why there's never any re-ordering, without any need for flushes. Your Altix system seems the weird one here in that the CPU actually treats config writes as posted and does not wait for their completion. I wonder whether you can do a bus lock or something and force waiting till the completion. This would be much cleaner than trying to fix all drivers. -- MST From rdreier at cisco.com Wed Oct 25 11:50:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 25 Oct 2006 11:50:45 -0700 Subject: [openib-general] [PATCH 1/2] ib/core/uverbs: return sq_draining value in query_qp response In-Reply-To: <200610251849.25368.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 25 Oct 2006 18:49:24 +0200") References: <200610251254.20996.jackm@dev.mellanox.co.il> <200610251849.25368.jackm@dev.mellanox.co.il> Message-ID: Jack> Why is this any worse than having a situation where a kernel Jack> module has a bug in which it returns an incorrect value in Jack> an ABI field in which no name change is involved? Jack> If we fix such a bug, do we need to increment the ABI as well? Obviously fixing the kernel so it doesn't return a wrong value is OK. And I don't see any way to fix old libibverbs, which is just returning garbage to the consumer in the sq_draining field. 
However, you snipped the second part of my reply: Roland> Fortunately ib_uverbs_query_qp_resp has some reserved Roland> slots we can use for sq_draining. However, this means Roland> that an old kernel will always return 0 for sq_draining. Roland> However I'm not sure if we could do better... So I don't think we need to bump the ABI. Let's just take one of the reserved fields that we're lucky to have in the query QP response, and put sq_draining in there. - R. From sashak at voltaire.com Wed Oct 25 12:41:52 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 25 Oct 2006 21:41:52 +0200 Subject: [openib-general] [openfabrics-ewg] new server up and running In-Reply-To: <20061025105058.GA11682@mellanox.co.il> References: <20061023232250.GA3118@cuprite.pathscale.com> <20061025105058.GA11682@mellanox.co.il> Message-ID: <20061025194152.GA12061@sashak.voltaire.com> On 12:50 Wed 25 Oct , Michael S. Tsirkin wrote: > > Now, what's the URL to access git from outside? > I do: > git clone git://69.55.231.195/~mst/linux-2.6 > and this fails. git-daemon runs with '--base-path=/pub/scm', so git clone git://69.55.231.195/linux-2.6.18 will clone /pub/scm/linux-2.6.18 tree located on this machine. > I think it is best to be running with --user-path=scm so that I can > put git trees under scm sub-directory and have that part exported. '--user-path=scm' in addition to '--base-path'? Looks good for me. Sasha From or.gerlitz at gmail.com Wed Oct 25 12:39:42 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 25 Oct 2006 21:39:42 +0200 Subject: [openib-general] [PATCH 0/7 v2] for 2.6.20 rdma/cma: add userspace support In-Reply-To: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> References: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> Message-ID: <15ddcffd0610251239r7bf4c034jd2b12f69ea0d43bc@mail.gmail.com> On 10/25/06, Sean Hefty wrote: > The following set of patches expand the rdma_cm support to include > UD and multicast, and expose the rdma_cm to userspace. 
I would like to > target the 2.6.20 kernel, but at least getting them into one or more > branches would be helpful for other developers to test against these > changes. Sean, Just making sure, to test user space multicast with these patches, i can apply the patches on top of 2.6.19.rcX kernel and use librdmacm (and the mckey app) from the svn, correct? Or. From mshefty at ichips.intel.com Wed Oct 25 13:37:28 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 25 Oct 2006 13:37:28 -0700 Subject: [openib-general] [PATCH 0/7 v2] for 2.6.20 rdma/cma: add userspace support In-Reply-To: <15ddcffd0610251239r7bf4c034jd2b12f69ea0d43bc@mail.gmail.com> References: <000001c6f7bb$5c89fd50$a6d4180a@amr.corp.intel.com> <15ddcffd0610251239r7bf4c034jd2b12f69ea0d43bc@mail.gmail.com> Message-ID: <453FCB08.3060304@ichips.intel.com> Or Gerlitz wrote: > Just making sure, to test user space multicast with these patches, i > can apply the patches on top of 2.6.19.rcX kernel and use librdmacm > (and the mckey app) from the svn, correct? Actually, you will need my out of tree version of librdmacm. Reporting the connection parameters as part of the event, plus a fix to the multicast handling bumped the ABI. I updated the librdmacm to work with the modified kernel, but because of the differences didn't try to have the librdmacm support previous ABI versions. I'll post a patch that updates the librdmacm. - Sean From sean.hefty at intel.com Wed Oct 25 13:49:59 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 25 Oct 2006 13:49:59 -0700 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <15ddcffd0610251239r7bf4c034jd2b12f69ea0d43bc@mail.gmail.com> Message-ID: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> Updates the librdmacm to work with ABI version 3, which is the proposed kernel changes for inclusion in 2.6.20. Test programs are also updated. 
Signed-off-by: Sean Hefty --- Index: include/rdma/rdma_cma_abi.h =================================================================== --- include/rdma/rdma_cma_abi.h (revision 9192) +++ include/rdma/rdma_cma_abi.h (working copy) @@ -33,14 +33,15 @@ #ifndef RDMA_CMA_ABI_H #define RDMA_CMA_ABI_H +#include #include /* * This file must be kept in sync with the kernel's version of rdma_user_cm.h */ -#define RDMA_USER_CM_MIN_ABI_VERSION 1 -#define RDMA_USER_CM_MAX_ABI_VERSION 2 +#define RDMA_USER_CM_MIN_ABI_VERSION 3 +#define RDMA_USER_CM_MAX_ABI_VERSION 3 #define RDMA_MAX_PRIVATE_DATA 256 @@ -60,7 +61,7 @@ enum { UCMA_CMD_GET_EVENT, UCMA_CMD_GET_OPTION, UCMA_CMD_SET_OPTION, - UCMA_CMD_GET_DST_ATTR, + UCMA_CMD_ESTABLISH, UCMA_CMD_JOIN_MCAST, UCMA_CMD_LEAVE_MCAST }; @@ -71,11 +72,6 @@ struct ucma_abi_cmd_hdr { __u16 out; }; -struct ucma_abi_create_id_v1 { - __u64 uid; - __u64 response; -}; - struct ucma_abi_create_id { __u64 uid; __u64 response; @@ -133,7 +129,7 @@ struct ucma_abi_query_route_resp { struct ucma_abi_conn_param { __u32 qp_num; - __u32 qp_type; + __u32 reserved; __u8 private_data[RDMA_MAX_PRIVATE_DATA]; __u8 private_data_len; __u8 srq; @@ -145,6 +141,15 @@ struct ucma_abi_conn_param { __u8 valid; }; +struct ucma_abi_ud_param { + __u32 qp_num; + __u32 qkey; + struct ibv_kern_ah_attr ah_attr; + __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + __u8 private_data_len; + __u8 reserved[7]; +}; + struct ucma_abi_connect { struct ucma_abi_conn_param conn_param; __u32 id; @@ -180,25 +185,13 @@ struct ucma_abi_init_qp_attr { __u32 qp_state; }; -struct ucma_abi_join_mcast { - __u32 id; - struct sockaddr_in6 addr; - __u64 uid; -}; - -struct ucma_abi_leave_mcast { +struct ucma_abi_establish { __u32 id; - struct sockaddr_in6 addr; -}; - -struct ucma_abi_dst_attr_resp { - __u32 remote_qpn; - __u32 remote_qkey; - struct ibv_kern_ah_attr ah_attr; }; -struct ucma_abi_get_dst_attr { - __u64 response; +struct ucma_abi_join_mcast { + __u64 response; /* ucma_abi_create_id_resp */ + 
__u64 uid; struct sockaddr_in6 addr; __u32 id; }; @@ -212,30 +205,10 @@ struct ucma_abi_event_resp { __u32 id; __u32 event; __u32 status; - __u8 private_data_len; - __u8 reserved[3]; - __u8 private_data[RDMA_MAX_PRIVATE_DATA]; -}; - -struct ucma_abi_get_option { - __u64 response; - __u64 optval; - __u32 id; - __u32 level; - __u32 optname; - __u32 optlen; -}; - -struct ucma_abi_get_option_resp { - __u32 optlen; -}; - -struct ucma_abi_set_option { - __u64 optval; - __u32 id; - __u32 level; - __u32 optname; - __u32 optlen; + union { + struct ucma_abi_conn_param conn; + struct ucma_abi_ud_param ud; + } param; }; #endif /* RDMA_CMA_ABI_H */ Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 9272) +++ include/rdma/rdma_cma.h (working copy) @@ -61,11 +61,11 @@ enum rdma_port_space { RDMA_PS_UDP = 0x0111, }; -/* Protocol levels for get/set options. */ -enum { - RDMA_PROTO_IP = 0, - RDMA_PROTO_IB = 1, -}; +/* + * Global qkey value for all UD QPs and multicast groups created via the + * RDMA CM. 
+ */ +#define RDMA_UD_QKEY 0x01234567 struct ib_addr { union ibv_gid sgid; @@ -74,8 +74,12 @@ struct ib_addr { }; struct rdma_addr { - struct sockaddr_in6 src_addr; - struct sockaddr_in6 dst_addr; + struct sockaddr src_addr; + uint8_t src_pad[sizeof(struct sockaddr_storage) - + sizeof(struct sockaddr)]; + struct sockaddr dst_addr; + uint8_t dst_pad[sizeof(struct sockaddr_storage) - + sizeof(struct sockaddr)]; union { struct ib_addr ibaddr; } addr; @@ -101,11 +105,25 @@ struct rdma_cm_id { uint8_t port_num; }; -struct rdma_multicast_data { - void *context; - struct sockaddr addr; - uint8_t pad[sizeof(struct sockaddr_in6) - - sizeof(struct sockaddr)]; +struct rdma_conn_param { + const void *private_data; + uint8_t private_data_len; + uint8_t responder_resources; + uint8_t initiator_depth; + uint8_t flow_control; + uint8_t retry_count; /* ignored when accepting */ + uint8_t rnr_retry_count; + /* Fields below ignored if a QP is created on the rdma_cm_id. */ + uint8_t srq; + uint32_t qp_num; +}; + +struct rdma_ud_param { + const void *private_data; + uint8_t private_data_len; + struct ibv_ah_attr ah_attr; + uint32_t qp_num; + uint32_t qkey; }; struct rdma_cm_event { @@ -113,8 +131,10 @@ struct rdma_cm_event { struct rdma_cm_id *listen_id; enum rdma_cm_event_type event; int status; - void *private_data; - uint8_t private_data_len; + union { + struct rdma_conn_param conn; + struct rdma_ud_param ud; + } param; }; /** @@ -206,20 +226,6 @@ int rdma_create_qp(struct rdma_cm_id *id */ void rdma_destroy_qp(struct rdma_cm_id *id); -struct rdma_conn_param { - const void *private_data; - uint8_t private_data_len; - uint8_t responder_resources; - uint8_t initiator_depth; - uint8_t flow_control; - uint8_t retry_count; /* ignored when accepting */ - uint8_t rnr_retry_count; - /* Fields below ignored if a QP is created on the rdma_cm_id. */ - uint8_t srq; - uint32_t qp_num; - enum ibv_qp_type qp_type; -}; - /** * rdma_connect - Initiate an active connection request. 
* @@ -251,6 +257,16 @@ int rdma_reject(struct rdma_cm_id *id, c uint8_t private_data_len); /** + * rdma_establish - Forces a connection state to established. + * @id: Connection identifier to transition to established. + * + * This routine should be invoked by users who receive messages on a + * QP before being notified that the connection has been established by the + * RDMA CM. + */ +int rdma_establish(struct rdma_cm_id *id); + +/** * rdma_disconnect - This function disconnects the associated QP and * transitions it into the error state. */ @@ -298,40 +314,17 @@ int rdma_get_cm_event(struct rdma_event_ */ int rdma_ack_cm_event(struct rdma_cm_event *event); -/** - * rdma_get_option - Retrieve options for an rdma_cm_id. - * @id: Communication identifier to retrieve option for. - * @level: Protocol level of the option to retrieve. - * @optname: Name of the option to retrieve. - * @optval: Buffer to receive the returned options. - * @optlen: On input, the size of the %optval buffer. On output, the - * size of the returned data. - */ -int rdma_get_option(struct rdma_cm_id *id, int level, int optname, - void *optval, size_t *optlen); - -/** - * rdma_set_option - Set options for an rdma_cm_id. - * @id: Communication identifier to set option for. - * @level: Protocol level of the option to set. - * @optname: Name of the option to set. - * @optval: Reference to the option data. - * @optlen: The size of the %optval buffer. - */ -int rdma_set_option(struct rdma_cm_id *id, int level, int optname, - void *optval, size_t optlen); - static inline uint16_t rdma_get_src_port(struct rdma_cm_id *id) { - return id->route.addr.src_addr.sin6_family == PF_INET6 ? - id->route.addr.src_addr.sin6_port : + return id->route.addr.src_addr.sa_family == PF_INET6 ? 
+ ((struct sockaddr_in6 *) &id->route.addr.src_addr)->sin6_port : ((struct sockaddr_in *) &id->route.addr.src_addr)->sin_port; } static inline uint16_t rdma_get_dst_port(struct rdma_cm_id *id) { - return id->route.addr.dst_addr.sin6_family == PF_INET6 ? - id->route.addr.dst_addr.sin6_port : + return id->route.addr.dst_addr.sa_family == PF_INET6 ? + ((struct sockaddr_in6 *) &id->route.addr.dst_addr)->sin6_port : ((struct sockaddr_in *) &id->route.addr.dst_addr)->sin_port; } Index: src/cma.c =================================================================== --- src/cma.c (revision 9696) +++ src/cma.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -54,7 +54,6 @@ #include #include #include -#include #define PFX "librdmacm: " @@ -116,6 +115,28 @@ struct cma_id_private { pthread_cond_t cond; pthread_mutex_t mut; uint32_t handle; + struct cma_multicast *mc_list; +}; + +struct cma_multicast { + struct cma_multicast *next; + struct cma_id_private *id_priv; + void *context; + int events_completed; + pthread_cond_t cond; + uint32_t handle; + union ibv_gid mgid; + uint16_t mlid; + struct sockaddr addr; + uint8_t pad[sizeof(struct sockaddr_in6) - + sizeof(struct sockaddr)]; +}; + +struct cma_event { + struct rdma_cm_event event; + uint8_t private_data[RDMA_MAX_PRIVATE_DATA]; + struct cma_id_private *id_priv; + struct cma_multicast *mc; }; static struct cma_device *cma_dev_array; @@ -335,41 +356,6 @@ err: ucma_free_id(id_priv); return NULL; } -static int ucma_create_id_v1(struct rdma_event_channel *channel, - struct rdma_cm_id **id, void *context, - enum rdma_port_space ps) -{ - struct ucma_abi_create_id_resp *resp; - struct ucma_abi_create_id_v1 *cmd; - struct cma_id_private *id_priv; - void *msg; - int ret, size; - - if (ps 
!= RDMA_PS_TCP) { - fprintf(stderr, "librdmacm: Kernel ABI does not support " - "requested port space.\n"); - return -EPROTONOSUPPORT; - } - - id_priv = ucma_alloc_id(channel, context, ps); - if (!id_priv) - return -ENOMEM; - - CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_CREATE_ID, size); - cmd->uid = (uintptr_t) id_priv; - - ret = write(channel->fd, msg, size); - if (ret != size) - goto err; - - id_priv->handle = resp->id; - *id = &id_priv->id; - return 0; - -err: ucma_free_id(id_priv); - return ret; -} - int rdma_create_id(struct rdma_event_channel *channel, struct rdma_cm_id **id, void *context, enum rdma_port_space ps) @@ -384,9 +370,6 @@ int rdma_create_id(struct rdma_event_cha if (ret) return ret; - if (abi_ver == 1) - return ucma_create_id_v1(channel, id, context, ps); - id_priv = ucma_alloc_id(channel, context, ps); if (!id_priv) return -ENOMEM; @@ -492,9 +475,9 @@ static int ucma_query_route(struct rdma_ sizeof id->route.addr.addr.ibaddr.dgid); id->route.addr.addr.ibaddr.pkey = resp->ib_route[0].pkey; memcpy(&id->route.addr.src_addr, &resp->src_addr, - sizeof id->route.addr.src_addr); + sizeof resp->src_addr); memcpy(&id->route.addr.dst_addr, &resp->dst_addr, - sizeof id->route.addr.dst_addr); + sizeof resp->dst_addr); if (!id_priv->cma_dev && resp->node_guid) { ret = ucma_get_device(id_priv, resp->node_guid); @@ -696,7 +679,7 @@ static int ucma_init_ib_qp(struct cma_id qp_attr.port_num = id_priv->id.port_num; qp_attr.qp_state = IBV_QPS_INIT; - qp_attr.qp_access_flags = IBV_ACCESS_LOCAL_WRITE; + qp_attr.qp_access_flags = 0; return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_ACCESS_FLAGS | IBV_QP_PKEY_INDEX | IBV_QP_PORT); } @@ -767,11 +750,9 @@ void rdma_destroy_qp(struct rdma_cm_id * static void ucma_copy_conn_param_to_kern(struct ucma_abi_conn_param *dst, struct rdma_conn_param *src, - uint32_t qp_num, - enum ibv_qp_type qp_type, uint8_t srq) + uint32_t qp_num, uint8_t srq) { dst->qp_num = qp_num; - dst->qp_type = qp_type; dst->srq = srq; 
dst->responder_resources = src->responder_resources; dst->initiator_depth = src->initiator_depth; @@ -799,12 +780,11 @@ int rdma_connect(struct rdma_cm_id *id, cmd->id = id_priv->handle; if (id->qp) ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, - id->qp->qp_num, id->qp->qp_type, + id->qp->qp_num, (id->qp->srq != NULL)); else ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, conn_param->qp_num, - conn_param->qp_type, conn_param->srq); ret = write(id->channel->fd, msg, size); @@ -852,12 +832,11 @@ int rdma_accept(struct rdma_cm_id *id, s cmd->uid = (uintptr_t) id_priv; if (id->qp) ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, - id->qp->qp_num, id->qp->qp_type, + id->qp->qp_num, (id->qp->srq != NULL)); else ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, conn_param->qp_num, - conn_param->qp_type, conn_param->srq); ret = write(id->channel->fd, msg, size); @@ -894,6 +873,24 @@ int rdma_reject(struct rdma_cm_id *id, c return 0; } +int rdma_establish(struct rdma_cm_id *id) +{ + struct ucma_abi_establish *cmd; + struct cma_id_private *id_priv; + void *msg; + int ret, size; + + CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_ESTABLISH, size); + + id_priv = container_of(id, struct cma_id_private, id); + cmd->id = id_priv->handle; + ret = write(id->channel->fd, msg, size); + if (ret != size) + return (ret > 0) ? 
-ENODATA : ret; + + return 0; +} + int rdma_disconnect(struct rdma_cm_id *id) { struct ucma_abi_disconnect *cmd; @@ -929,74 +926,102 @@ int rdma_join_multicast(struct rdma_cm_i void *context) { struct ucma_abi_join_mcast *cmd; + struct ucma_abi_create_id_resp *resp; struct cma_id_private *id_priv; + struct cma_multicast *mc, **pos; void *msg; int ret, size, addrlen; + id_priv = container_of(id, struct cma_id_private, id); addrlen = ucma_addrlen(addr); if (!addrlen) return -EINVAL; - CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_JOIN_MCAST, size); - id_priv = container_of(id, struct cma_id_private, id); + mc = malloc(sizeof *mc); + if (!mc) + return -ENOMEM; + + memset(mc, 0, sizeof *mc); + mc->context = context; + mc->id_priv = id_priv; + memcpy(&mc->addr, addr, addrlen); + if (pthread_cond_init(&mc->cond, NULL)) { + ret = -1; + goto err1; + } + + pthread_mutex_lock(&id_priv->mut); + mc->next = id_priv->mc_list; + id_priv->mc_list = mc; + pthread_mutex_unlock(&id_priv->mut); + + CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_JOIN_MCAST, size); cmd->id = id_priv->handle; memcpy(&cmd->addr, addr, addrlen); - cmd->uid = (uintptr_t) context; + cmd->uid = (uintptr_t) mc; ret = write(id->channel->fd, msg, size); - if (ret != size) - return (ret > 0) ? -ENODATA : ret; + if (ret != size) { + ret = (ret > 0) ? 
-ENODATA : ret; + goto err2; + } + mc->handle = resp->id; return 0; +err2: + pthread_mutex_lock(&id_priv->mut); + for (pos = &id_priv->mc_list; *pos != mc; pos = &(*pos)->next) + ; + *pos = mc->next; + pthread_mutex_unlock(&id_priv->mut); +err1: + free(mc); + return ret; } int rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) { - struct ucma_abi_leave_mcast *cmd; + struct ucma_abi_destroy_id *cmd; + struct ucma_abi_destroy_id_resp *resp; struct cma_id_private *id_priv; + struct cma_multicast *mc, **pos; void *msg; int ret, size, addrlen; - struct ibv_ah_attr ah_attr; - uint32_t qp_info; addrlen = ucma_addrlen(addr); if (!addrlen) return -EINVAL; - CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_LEAVE_MCAST, size); id_priv = container_of(id, struct cma_id_private, id); - cmd->id = id_priv->handle; - memcpy(&cmd->addr, addr, addrlen); + pthread_mutex_lock(&id_priv->mut); + for (pos = &id_priv->mc_list; *pos; pos = &(*pos)->next) + if (!memcmp(&(*pos)->addr, addr, addrlen)) + break; - if (id->qp) { - ret = rdma_get_dst_attr(id, addr, &ah_attr, &qp_info, &qp_info); - if (ret) - goto out; + mc = *pos; + if (*pos) + *pos = mc->next; + pthread_mutex_unlock(&id_priv->mut); + if (!mc) + return -EADDRNOTAVAIL; - ret = ibv_detach_mcast(id->qp, &ah_attr.grh.dgid, ah_attr.dlid); - if (ret) - goto out; - } + if (id->qp) + ibv_detach_mcast(id->qp, &mc->mgid, mc->mlid); + CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_LEAVE_MCAST, size); + cmd->id = mc->handle; + ret = write(id->channel->fd, msg, size); if (ret != size) ret = (ret > 0) ? 
-ENODATA : ret; -out: - return ret; -} -static void ucma_copy_event_from_kern(struct rdma_cm_event *dst, - struct ucma_abi_event_resp *src) -{ - dst->event = src->event; - dst->status = src->status; - dst->private_data_len = src->private_data_len; - if (src->private_data_len) { - dst->private_data = dst + 1; - memcpy(dst->private_data, src->private_data, - src->private_data_len); - } else - dst->private_data = NULL; + pthread_mutex_lock(&id_priv->mut); + while (mc->events_completed < resp->events_reported) + pthread_cond_wait(&mc->cond, &id_priv->mut); + pthread_mutex_unlock(&id_priv->mut); + + free(mc); + return ret; } static void ucma_complete_event(struct cma_id_private *id_priv) @@ -1007,38 +1032,49 @@ static void ucma_complete_event(struct c pthread_mutex_unlock(&id_priv->mut); } +static void ucma_complete_mc_event(struct cma_multicast *mc) +{ + pthread_mutex_lock(&mc->id_priv->mut); + mc->events_completed++; + pthread_cond_signal(&mc->cond); + mc->id_priv->events_completed++; + pthread_cond_signal(&mc->id_priv->cond); + pthread_mutex_unlock(&mc->id_priv->mut); +} + int rdma_ack_cm_event(struct rdma_cm_event *event) { - struct rdma_cm_id *id; + struct cma_event *evt; if (!event) return -EINVAL; - id = (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) ? 
- event->listen_id : event->id; + evt = container_of(event, struct cma_event, event); - ucma_complete_event(container_of(id, struct cma_id_private, id)); - free(event); + if (evt->mc) + ucma_complete_mc_event(evt->mc); + else + ucma_complete_event(evt->id_priv); + free(evt); return 0; } -static int ucma_process_conn_req(struct rdma_cm_event *event, +static int ucma_process_conn_req(struct cma_event *evt, uint32_t handle) { - struct cma_id_private *listen_id_priv, *id_priv; + struct cma_id_private *id_priv; int ret; - listen_id_priv = container_of(event->id, struct cma_id_private, id); - id_priv = ucma_alloc_id(event->id->channel, event->id->context, - event->id->ps); + id_priv = ucma_alloc_id(evt->id_priv->id.channel, + evt->id_priv->id.context, evt->id_priv->id.ps); if (!id_priv) { - ucma_destroy_kern_id(event->id->channel->fd, handle); + ucma_destroy_kern_id(evt->id_priv->id.channel->fd, handle); ret = -ENOMEM; goto err; } - event->listen_id = event->id; - event->id = &id_priv->id; + evt->event.listen_id = &evt->id_priv->id; + evt->event.id = &id_priv->id; id_priv->handle = handle; ret = ucma_query_route(&id_priv->id); @@ -1049,7 +1085,7 @@ static int ucma_process_conn_req(struct return 0; err: - ucma_complete_event(listen_id_priv); + ucma_complete_event(evt->id_priv); return ret; } @@ -1093,34 +1129,54 @@ static int ucma_process_establish(struct return ret; } -static void ucma_process_mcast(struct rdma_cm_id *id, struct rdma_cm_event *evt) +static int ucma_process_join(struct cma_event *evt) { - struct ucma_abi_join_mcast kmc_data; - struct rdma_multicast_data *mc_data; - struct ibv_ah_attr ah_attr; - uint32_t qp_info; - - kmc_data = *(struct ucma_abi_join_mcast *) evt->private_data; - - mc_data = evt->private_data; - mc_data->context = (void *) (uintptr_t) kmc_data.uid; - memcpy(&mc_data->addr, &kmc_data.addr, - ucma_addrlen((struct sockaddr *) &kmc_data.addr)); - - if (evt->status || !id->qp) - return; - - evt->status = rdma_get_dst_attr(id, &mc_data->addr, 
&ah_attr, - &qp_info, &qp_info); - if (evt->status) - goto err; + evt->mc->mgid = evt->event.param.ud.ah_attr.grh.dgid; + evt->mc->mlid = evt->event.param.ud.ah_attr.dlid; - evt->status = ibv_attach_mcast(id->qp, &ah_attr.grh.dgid, ah_attr.dlid); - if (evt->status) - goto err; - return; -err: - evt->event = RDMA_CM_EVENT_MULTICAST_ERROR; + if (evt->id_priv->id.qp) + return ibv_attach_mcast(evt->id_priv->id.qp, + &evt->mc->mgid, evt->mc->mlid); + else + return 0; +} + +static void ucma_copy_conn_event(struct cma_event *event, + struct ucma_abi_conn_param *src) +{ + struct rdma_conn_param *dst = &event->event.param.conn; + + dst->private_data_len = src->private_data_len; + if (src->private_data_len) { + dst->private_data = &event->private_data; + memcpy(&event->private_data, src->private_data, + src->private_data_len); + } + + dst->responder_resources = src->responder_resources; + dst->initiator_depth = src->initiator_depth; + dst->flow_control = src->flow_control; + dst->retry_count = src->retry_count; + dst->rnr_retry_count = src->rnr_retry_count; + dst->srq = src->srq; + dst->qp_num = src->qp_num; +} + +static void ucma_copy_ud_event(struct cma_event *event, + struct ucma_abi_ud_param *src) +{ + struct rdma_ud_param *dst = &event->event.param.ud; + + dst->private_data_len = src->private_data_len; + if (src->private_data_len) { + dst->private_data = &event->private_data; + memcpy(&event->private_data, src->private_data, + src->private_data_len); + } + + ibv_copy_ah_attr_from_kern(&dst->ah_attr, &src->ah_attr); + dst->qp_num = src->qp_num; + dst->qkey = src->qkey; } int rdma_get_cm_event(struct rdma_event_channel *channel, @@ -1128,8 +1184,7 @@ int rdma_get_cm_event(struct rdma_event_ { struct ucma_abi_event_resp *resp; struct ucma_abi_get_event *cmd; - struct cma_id_private *id_priv; - struct rdma_cm_event *evt; + struct cma_event *evt; void *msg; int ret, size; @@ -1140,155 +1195,119 @@ int rdma_get_cm_event(struct rdma_event_ if (!event) return -EINVAL; - evt = 
malloc(sizeof *evt + RDMA_MAX_PRIVATE_DATA); + evt = malloc(sizeof *evt); if (!evt) return -ENOMEM; retry: + memset(evt, 0, sizeof *evt); CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_GET_EVENT, size); ret = write(channel->fd, msg, size); if (ret != size) { free(evt); return (ret > 0) ? -ENODATA : ret; } - - id_priv = (void *) (uintptr_t) resp->uid; - evt->id = &id_priv->id; - ucma_copy_event_from_kern(evt, resp); - switch (evt->event) { + evt->event.event = resp->event; + switch (resp->event) { case RDMA_CM_EVENT_ADDR_RESOLVED: - evt->status = ucma_query_route(&id_priv->id); - if (evt->status) - evt->event = RDMA_CM_EVENT_ADDR_ERROR; + evt->id_priv = (void *) (uintptr_t) resp->uid; + evt->event.id = &evt->id_priv->id; + evt->event.status = ucma_query_route(&evt->id_priv->id); + if (evt->event.status) + evt->event.event = RDMA_CM_EVENT_ADDR_ERROR; break; case RDMA_CM_EVENT_ROUTE_RESOLVED: - evt->status = ucma_query_route(&id_priv->id); - if (evt->status) - evt->event = RDMA_CM_EVENT_ROUTE_ERROR; + evt->id_priv = (void *) (uintptr_t) resp->uid; + evt->event.id = &evt->id_priv->id; + evt->event.status = ucma_query_route(&evt->id_priv->id); + if (evt->event.status) + evt->event.event = RDMA_CM_EVENT_ROUTE_ERROR; break; case RDMA_CM_EVENT_CONNECT_REQUEST: + evt->id_priv = (void *) (uintptr_t) resp->uid; + if (evt->id_priv->id.ps == RDMA_PS_TCP) + ucma_copy_conn_event(evt, &resp->param.conn); + else + ucma_copy_ud_event(evt, &resp->param.ud); + ret = ucma_process_conn_req(evt, resp->id); if (ret) goto retry; break; case RDMA_CM_EVENT_CONNECT_RESPONSE: - evt->status = ucma_process_conn_resp(id_priv); - if (!evt->status) - evt->event = RDMA_CM_EVENT_ESTABLISHED; + evt->id_priv = (void *) (uintptr_t) resp->uid; + evt->event.id = &evt->id_priv->id; + ucma_copy_conn_event(evt, &resp->param.conn); + evt->event.status = ucma_process_conn_resp(evt->id_priv); + if (!evt->event.status) + evt->event.event = RDMA_CM_EVENT_ESTABLISHED; else { - evt->event = 
RDMA_CM_EVENT_CONNECT_ERROR; - id_priv->connect_error = 1; + evt->event.event = RDMA_CM_EVENT_CONNECT_ERROR; + evt->id_priv->connect_error = 1; } break; case RDMA_CM_EVENT_ESTABLISHED: - if (id_priv->id.ps == RDMA_PS_UDP) + evt->id_priv = (void *) (uintptr_t) resp->uid; + evt->event.id = &evt->id_priv->id; + if (evt->id_priv->id.ps == RDMA_PS_UDP) { + ucma_copy_ud_event(evt, &resp->param.ud); break; + } - evt->status = ucma_process_establish(&id_priv->id); - if (evt->status) { - evt->event = RDMA_CM_EVENT_CONNECT_ERROR; - id_priv->connect_error = 1; + ucma_copy_conn_event(evt, &resp->param.conn); + evt->event.status = ucma_process_establish(&evt->id_priv->id); + if (evt->event.status) { + evt->event.event = RDMA_CM_EVENT_CONNECT_ERROR; + evt->id_priv->connect_error = 1; } break; case RDMA_CM_EVENT_REJECTED: - if (id_priv->connect_error) { - ucma_complete_event(id_priv); + evt->id_priv = (void *) (uintptr_t) resp->uid; + if (evt->id_priv->connect_error) { + ucma_complete_event(evt->id_priv); goto retry; } - ucma_modify_qp_err(evt->id); + evt->event.id = &evt->id_priv->id; + ucma_copy_conn_event(evt, &resp->param.conn); + ucma_modify_qp_err(evt->event.id); break; case RDMA_CM_EVENT_DISCONNECTED: - if (id_priv->connect_error) { - ucma_complete_event(id_priv); + evt->id_priv = (void *) (uintptr_t) resp->uid; + if (evt->id_priv->connect_error) { + ucma_complete_event(evt->id_priv); goto retry; } + evt->event.id = &evt->id_priv->id; + ucma_copy_conn_event(evt, &resp->param.conn); break; case RDMA_CM_EVENT_MULTICAST_JOIN: + evt->mc = (void *) (uintptr_t) resp->uid; + evt->id_priv = evt->mc->id_priv; + evt->event.id = &evt->id_priv->id; + ucma_copy_ud_event(evt, &resp->param.ud); + evt->event.param.ud.private_data = evt->mc->context; + evt->event.status = ucma_process_join(evt); + if (evt->event.status) + evt->event.event = RDMA_CM_EVENT_MULTICAST_ERROR; + break; case RDMA_CM_EVENT_MULTICAST_ERROR: - ucma_process_mcast(&id_priv->id, evt); + evt->mc = (void *) (uintptr_t) 
resp->uid; + evt->id_priv = evt->mc->id_priv; + evt->event.id = &evt->id_priv->id; + evt->event.status = resp->status; + evt->event.param.ud.private_data = evt->mc->context; break; default: + evt->id_priv = (void *) (uintptr_t) resp->uid; + evt->event.id = &evt->id_priv->id; + if (evt->id_priv->id.ps == RDMA_PS_TCP) + ucma_copy_conn_event(evt, &resp->param.conn); + else + ucma_copy_ud_event(evt, &resp->param.ud); break; } - *event = evt; - return 0; -} - -int rdma_get_option(struct rdma_cm_id *id, int level, int optname, - void *optval, size_t *optlen) -{ - struct ucma_abi_get_option_resp *resp; - struct ucma_abi_get_option *cmd; - struct cma_id_private *id_priv; - void *msg; - int ret, size; - - CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_GET_OPTION, size); - id_priv = container_of(id, struct cma_id_private, id); - cmd->id = id_priv->handle; - cmd->optval = (uintptr_t) optval; - cmd->level = level; - cmd->optname = optname; - cmd->optlen = *optlen; - - ret = write(id->channel->fd, msg, size); - if (ret != size) - return (ret > 0) ? -ENODATA : ret; - - *optlen = resp->optlen; - return 0; -} - -int rdma_set_option(struct rdma_cm_id *id, int level, int optname, - void *optval, size_t optlen) -{ - struct ucma_abi_set_option *cmd; - struct cma_id_private *id_priv; - void *msg; - int ret, size; - - CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_SET_OPTION, size); - id_priv = container_of(id, struct cma_id_private, id); - cmd->id = id_priv->handle; - cmd->optval = (uintptr_t) optval; - cmd->level = level; - cmd->optname = optname; - cmd->optlen = optlen; - - ret = write(id->channel->fd, msg, size); - if (ret != size) - return (ret > 0) ? 
-ENODATA : ret; - - return 0; -} - -int rdma_get_dst_attr(struct rdma_cm_id *id, struct sockaddr *addr, - struct ibv_ah_attr *ah_attr, uint32_t *remote_qpn, - uint32_t *remote_qkey) -{ - struct ucma_abi_dst_attr_resp *resp; - struct ucma_abi_get_dst_attr *cmd; - struct cma_id_private *id_priv; - void *msg; - int ret, size, addrlen; - - addrlen = ucma_addrlen(addr); - if (!addrlen) - return -EINVAL; - - CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_GET_DST_ATTR, size); - id_priv = container_of(id, struct cma_id_private, id); - cmd->id = id_priv->handle; - memcpy(&cmd->addr, addr, addrlen); - - ret = write(id->channel->fd, msg, size); - if (ret != size) - return (ret > 0) ? -ENODATA : ret; - - ibv_copy_ah_attr_from_kern(ah_attr, &resp->ah_attr); - *remote_qpn = resp->remote_qpn; - *remote_qkey = resp->remote_qkey; + *event = &evt->event; return 0; } Index: Makefile.am =================================================================== --- Makefile.am (revision 9192) +++ Makefile.am (working copy) @@ -31,12 +31,10 @@ examples_mckey_LDADD = $(top_builddir)/s librdmacmincludedir = $(includedir)/rdma librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \ - include/rdma/rdma_cma.h \ - include/rdma/rdma_cma_ib.h + include/rdma/rdma_cma.h EXTRA_DIST = include/rdma/rdma_cma_abi.h \ include/rdma/rdma_cma.h \ - include/rdma/rdma_cma_ib.h \ src/librdmacm.map \ librdmacm.spec.in Index: examples/mckey.c =================================================================== --- examples/mckey.c (revision 9208) +++ examples/mckey.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -42,9 +42,9 @@ #include #include #include +#include #include -#include struct cmatest_node { int id; @@ -76,6 +76,8 @@ static int connections = 1; static int message_size = 100; static int message_count = 10; static int is_sender; +static char *dst_addr; +static char *src_addr; static int create_message(struct cmatest_node *node) { @@ -239,19 +241,12 @@ err: return ret; } -static int join_handler(struct cmatest_node *node) +static int join_handler(struct cmatest_node *node, + struct rdma_ud_param *param) { - struct ibv_ah_attr ah_attr; - int ret; - - ret = rdma_get_dst_attr(node->cma_id, test.dst_addr, &ah_attr, - &node->remote_qpn, &node->remote_qkey); - if (ret) { - printf("mckey: failure getting destination attributes\n"); - goto err; - } - - node->ah = ibv_create_ah(node->pd, &ah_attr); + node->remote_qpn = param->qp_num; + node->remote_qkey = param->qkey; + node->ah = ibv_create_ah(node->pd, &param->ah_attr); if (!node->ah) { printf("mckey: failure creating address handle\n"); goto err; @@ -262,7 +257,7 @@ static int join_handler(struct cmatest_n return 0; err: connect_error(); - return ret; + return -1; } static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) @@ -274,7 +269,7 @@ static int cma_handler(struct rdma_cm_id ret = addr_handler(cma_id->context); break; case RDMA_CM_EVENT_MULTICAST_JOIN: - ret = join_handler(cma_id->context); + ret = join_handler(cma_id->context, &event->param.ud); break; case RDMA_CM_EVENT_ADDR_ERROR: case RDMA_CM_EVENT_ROUTE_ERROR: @@ -411,18 +406,21 @@ out: return ret; } -static int run(char *dst, char *src) +static int run(void) { int i, ret; - printf("mckey: starting client\n"); - if (src) { - ret = get_addr(src, &test.src_in); + if (is_sender) + printf("mckey: starting client\n"); + else + printf("mckey: starting server\n"); + if (src_addr) { + ret = get_addr(src_addr, &test.src_in); if (ret) return ret; } - ret = get_addr(dst, &test.dst_in); + ret 
= get_addr(dst_addr, &test.dst_in); if (ret) return ret; @@ -431,7 +429,7 @@ static int run(char *dst, char *src) printf("mckey: joining\n"); for (i = 0; i < connections; i++) { ret = rdma_resolve_addr(test.nodes[i].cma_id, - src ? test.src_addr : NULL, + src_addr ? test.src_addr : NULL, test.dst_addr, 2000); if (ret) { printf("mckey: failure getting addr: %d\n", ret); @@ -472,14 +470,39 @@ out: int main(int argc, char **argv) { - int ret; + int op, ret; - if (argc < 3 || argc > 4) { - printf("usage: %s {s[end] | r[ecv]} mcast_addr [bind_addr]]\n", - argv[0]); - exit(1); + while ((op = getopt(argc, argv, "m:sb:c:C:S:")) != -1) { + switch (op) { + case 'm': + dst_addr = optarg; + break; + case 's': + is_sender = 1; + break; + case 'b': + src_addr = optarg; + break; + case 'c': + connections = atoi(optarg); + break; + case 'C': + message_count = atoi(optarg); + break; + case 'S': + message_size = atoi(optarg); + break; + default: + printf("usage: %s\n", argv[0]); + printf("\t-m multicast_address\n"); + printf("\t[-s(ender)]\n"); + printf("\t[-b bind_address]\n"); + printf("\t[-c connections]\n"); + printf("\t[-C message_count]\n"); + printf("\t[-S message_size]\n"); + exit(1); + } } - is_sender = (argv[1][0] == 's'); test.dst_addr = (struct sockaddr *) &test.dst_in; test.src_addr = (struct sockaddr *) &test.src_in; @@ -494,7 +517,7 @@ int main(int argc, char **argv) if (alloc_nodes()) exit(1); - ret = run(argv[2], (argc == 4) ? argv[3] : NULL); + ret = run(); printf("test complete\n"); destroy_nodes(); Index: examples/udaddy.c =================================================================== --- examples/udaddy.c (revision 9208) +++ examples/udaddy.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -41,15 +41,9 @@ #include #include #include +#include #include -#include - -/* - * To execute: - * Server: udaddy - * Client: udaddy [server_addr [src_addr]] - */ struct cmatest_node { int id; @@ -80,7 +74,8 @@ static struct cmatest test; static int connections = 1; static int message_size = 100; static int message_count = 10; -static int is_server; +static char *dst_addr; +static char *src_addr; static int create_message(struct cmatest_node *node) { @@ -246,7 +241,6 @@ static int route_handler(struct cmatest_ memset(&conn_param, 0, sizeof conn_param); conn_param.qp_num = node->cma_id->qp->qp_num; - conn_param.qp_type = node->cma_id->qp->qp_type; conn_param.retry_count = 5; ret = rdma_connect(node->cma_id, &conn_param); if (ret) { @@ -284,7 +278,6 @@ static int connect_handler(struct rdma_c memset(&conn_param, 0, sizeof conn_param); conn_param.qp_num = node->cma_id->qp->qp_num; - conn_param.qp_type = node->cma_id->qp->qp_type; ret = rdma_accept(node->cma_id, &conn_param); if (ret) { printf("udaddy: failure accepting: %d\n", ret); @@ -303,19 +296,12 @@ err1: return ret; } -static int resolved_handler(struct cmatest_node *node) +static int resolved_handler(struct cmatest_node *node, + struct rdma_cm_event *event) { - struct ibv_ah_attr ah_attr; - int ret; - - ret = rdma_get_dst_attr(node->cma_id, test.dst_addr, &ah_attr, - &node->remote_qpn, &node->remote_qkey); - if (ret) { - printf("udaddy: failure getting destination attributes\n"); - goto err; - } - - node->ah = ibv_create_ah(node->pd, &ah_attr); + node->remote_qpn = event->param.ud.qp_num; + node->remote_qkey = event->param.ud.qkey; + node->ah = ibv_create_ah(node->pd, &event->param.ud.ah_attr); if (!node->ah) { printf("udaddy: failure creating address handle\n"); goto err; @@ -326,7 +312,7 @@ static int resolved_handler(struct cmate return 0; err: connect_error(); - return ret; + return -1; } static int cma_handler(struct rdma_cm_id *cma_id, struct 
rdma_cm_event *event) @@ -344,7 +330,7 @@ static int cma_handler(struct rdma_cm_id ret = connect_handler(cma_id); break; case RDMA_CM_EVENT_ESTABLISHED: - ret = resolved_handler(cma_id->context); + ret = resolved_handler(cma_id->context, event); break; case RDMA_CM_EVENT_ADDR_ERROR: case RDMA_CM_EVENT_ROUTE_ERROR: @@ -404,7 +390,7 @@ static int alloc_nodes(void) for (i = 0; i < connections; i++) { test.nodes[i].id = i; - if (!is_server) { + if (dst_addr) { ret = rdma_create_id(test.channel, &test.nodes[i].cma_id, &test.nodes[i], RDMA_PS_UDP); @@ -475,6 +461,28 @@ static int connect_events(void) return ret; } +static int get_addr(char *dst, struct sockaddr_in *addr) +{ + struct addrinfo *res; + int ret; + + ret = getaddrinfo(dst, NULL, NULL, &res); + if (ret) { + printf("getaddrinfo failed - invalid hostname or IP address\n"); + return ret; + } + + if (res->ai_family != PF_INET) { + ret = -1; + goto out; + } + + *addr = *(struct sockaddr_in *) res->ai_addr; +out: + freeaddrinfo(res); + return ret; +} + static int run_server(void) { struct rdma_cm_id *listen_id; @@ -487,7 +495,13 @@ static int run_server(void) return ret; } - test.src_in.sin_family = PF_INET; + if (src_addr) { + ret = get_addr(src_addr, &test.src_in); + if (ret) + goto out; + } else + test.src_in.sin_family = PF_INET; + test.src_in.sin_port = 7174; ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { @@ -526,40 +540,18 @@ out: return ret; } -static int get_addr(char *dst, struct sockaddr_in *addr) -{ - struct addrinfo *res; - int ret; - - ret = getaddrinfo(dst, NULL, NULL, &res); - if (ret) { - printf("getaddrinfo failed - invalid hostname or IP address\n"); - return ret; - } - - if (res->ai_family != PF_INET) { - ret = -1; - goto out; - } - - *addr = *(struct sockaddr_in *) res->ai_addr; -out: - freeaddrinfo(res); - return ret; -} - -static int run_client(char *dst, char *src) +static int run_client(void) { int i, ret; printf("udaddy: starting client\n"); - if (src) { - ret = get_addr(src, 
&test.src_in); + if (src_addr) { + ret = get_addr(src_addr, &test.src_in); if (ret) return ret; } - ret = get_addr(dst, &test.dst_in); + ret = get_addr(dst_addr, &test.dst_in); if (ret) return ret; @@ -568,7 +560,7 @@ static int run_client(char *dst, char *s printf("udaddy: connecting\n"); for (i = 0; i < connections; i++) { ret = rdma_resolve_addr(test.nodes[i].cma_id, - src ? test.src_addr : NULL, + src_addr ? test.src_addr : NULL, test.dst_addr, 2000); if (ret) { printf("udaddy: failure getting addr: %d\n", ret); @@ -601,13 +593,35 @@ out: int main(int argc, char **argv) { - int ret; + int op, ret; - if (argc > 3) { - printf("usage: %s [server_addr [src_addr]]\n", argv[0]); - exit(1); + while ((op = getopt(argc, argv, "s:b:c:C:S:")) != -1) { + switch (op) { + case 's': + dst_addr = optarg; + break; + case 'b': + src_addr = optarg; + break; + case 'c': + connections = atoi(optarg); + break; + case 'C': + message_count = atoi(optarg); + break; + case 'S': + message_size = atoi(optarg); + break; + default: + printf("usage: %s\n", argv[0]); + printf("\t[-s server_address]\n"); + printf("\t[-b bind_address]\n"); + printf("\t[-c connections]\n"); + printf("\t[-C message_count]\n"); + printf("\t[-S message_size]\n"); + exit(1); + } } - is_server = (argc == 1); test.dst_addr = (struct sockaddr *) &test.dst_in; test.src_addr = (struct sockaddr *) &test.src_in; @@ -622,10 +636,10 @@ int main(int argc, char **argv) if (alloc_nodes()) exit(1); - if (is_server) - ret = run_server(); + if (dst_addr) + ret = run_client(); else - ret = run_client(argv[1], (argc == 3) ? argv[2] : NULL); + ret = run_server(); printf("test complete\n"); destroy_nodes(); Index: examples/cmatose.c =================================================================== --- examples/cmatose.c (revision 9192) +++ examples/cmatose.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. 
* * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -41,6 +41,7 @@ #include #include #include +#include #include @@ -52,12 +53,6 @@ static inline uint64_t cpu_to_be64(uint6 static inline uint32_t cpu_to_be32(uint32_t x) { return bswap_32(x); } #endif -/* - * To execute: - * Server: rdma_cmatose - * Client: rdma_cmatose - */ - struct cmatest_node { int id; struct rdma_cm_id *cma_id; @@ -85,7 +80,8 @@ static struct cmatest test; static int connections = 1; static int message_size = 100; static int message_count = 10; -static int is_server; +static char *dst_addr; +static char *src_addr; static int create_message(struct cmatest_node *node) { @@ -377,7 +373,7 @@ static int alloc_nodes(void) for (i = 0; i < connections; i++) { test.nodes[i].id = i; - if (!is_server) { + if (dst_addr) { ret = rdma_create_id(test.channel, &test.nodes[i].cma_id, &test.nodes[i], RDMA_PS_TCP); @@ -460,6 +456,28 @@ static int disconnect_events(void) return ret; } +static int get_addr(char *dst, struct sockaddr_in *addr) +{ + struct addrinfo *res; + int ret; + + ret = getaddrinfo(dst, NULL, NULL, &res); + if (ret) { + printf("getaddrinfo failed - invalid hostname or IP address\n"); + return ret; + } + + if (res->ai_family != PF_INET) { + ret = -1; + goto out; + } + + *addr = *(struct sockaddr_in *) res->ai_addr; +out: + freeaddrinfo(res); + return ret; +} + static int run_server(void) { struct rdma_cm_id *listen_id; @@ -472,12 +490,18 @@ static int run_server(void) return ret; } - test.src_in.sin_family = PF_INET; + if (src_addr) { + ret = get_addr(src_addr, &test.src_in); + if (ret) + goto out; + } else + test.src_in.sin_family = PF_INET; + test.src_in.sin_port = 7471; ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("cmatose: bind address failed: %d\n", ret); - return ret; + goto out; } ret = rdma_listen(listen_id, 0); @@ -528,40 +552,18 @@ out: return ret; } -static int get_addr(char 
*dst, struct sockaddr_in *addr) -{ - struct addrinfo *res; - int ret; - - ret = getaddrinfo(dst, NULL, NULL, &res); - if (ret) { - printf("getaddrinfo failed - invalid hostname or IP address\n"); - return ret; - } - - if (res->ai_family != PF_INET) { - ret = -1; - goto out; - } - - *addr = *(struct sockaddr_in *) res->ai_addr; -out: - freeaddrinfo(res); - return ret; -} - -static int run_client(char *dst, char *src) +static int run_client(void) { int i, ret, ret2; printf("cmatose: starting client\n"); - if (src) { - ret = get_addr(src, &test.src_in); + if (src_addr) { + ret = get_addr(src_addr, &test.src_in); if (ret) return ret; } - ret = get_addr(dst, &test.dst_in); + ret = get_addr(dst_addr, &test.dst_in); if (ret) return ret; @@ -570,7 +572,7 @@ static int run_client(char *dst, char *s printf("cmatose: connecting\n"); for (i = 0; i < connections; i++) { ret = rdma_resolve_addr(test.nodes[i].cma_id, - src ? test.src_addr : NULL, + src_addr ? test.src_addr : NULL, test.dst_addr, 2000); if (ret) { printf("cmatose: failure getting addr: %d\n", ret); @@ -597,7 +599,6 @@ static int run_client(char *dst, char *s } printf("data transfers complete\n"); - } ret = 0; @@ -611,13 +612,35 @@ out: int main(int argc, char **argv) { - int ret; + int op, ret; - if (argc > 3) { - printf("usage: %s [server_addr [src_addr]]\n", argv[0]); - exit(1); + while ((op = getopt(argc, argv, "s:b:c:C:S:")) != -1) { + switch (op) { + case 's': + dst_addr = optarg; + break; + case 'b': + src_addr = optarg; + break; + case 'c': + connections = atoi(optarg); + break; + case 'C': + message_count = atoi(optarg); + break; + case 'S': + message_size = atoi(optarg); + break; + default: + printf("usage: %s\n", argv[0]); + printf("\t[-s server_address]\n"); + printf("\t[-b bind_address]\n"); + printf("\t[-c connections]\n"); + printf("\t[-C message_count]\n"); + printf("\t[-S message_size]\n"); + exit(1); + } } - is_server = (argc == 1); test.dst_addr = (struct sockaddr *) &test.dst_in; test.src_addr = 
(struct sockaddr *) &test.src_in; @@ -633,10 +656,10 @@ int main(int argc, char **argv) if (alloc_nodes()) exit(1); - if (is_server) - ret = run_server(); + if (dst_addr) + ret = run_client(); else - ret = run_client(argv[1], (argc == 3) ? argv[2] : NULL); + ret = run_server(); printf("test complete\n"); destroy_nodes(); From ggrundstrom at NetEffect.com Wed Oct 25 17:06:45 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:06:45 -0500 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C88@venom2> The following set of patches contain the source code for the NetEffect NE010 adapter. Patches 1 through 9 contain the kernel driver source and patches 10 through 14 contain the userspace components. Due to license restrictions the NetEffect TCP/IP stack module cannot be released under the OFA license, but is available from NetEffect under a separate license agreement. NetEffect is in the process of developing a TCP stack that will meet the OFA license requirements. For information on licensing the NetEffect TCP stack module, contact openfabrics at neteffect.com. Signed-off-by: Glenn Grundstrom -------------- next part -------------- A non-text attachment was scrubbed... Name: patch1 Type: application/octet-stream Size: 1570 bytes Desc: patch1 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:13:06 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:13:06 -0500 Subject: [openib-general] [PATCH 2/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C8A@venom2> A non-text attachment was scrubbed... 
Name: patch2 Type: application/octet-stream Size: 17616 bytes Desc: patch2 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:16:39 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:16:39 -0500 Subject: [openib-general] [PATCH 3/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C8E@venom2> A non-text attachment was scrubbed... Name: patch3 Type: application/octet-stream Size: 40117 bytes Desc: patch3 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:18:45 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:18:45 -0500 Subject: [openib-general] [PATCH 4/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C91@venom2> A non-text attachment was scrubbed... Name: patch4 Type: application/octet-stream Size: 51440 bytes Desc: patch4 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:22:01 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:22:01 -0500 Subject: [openib-general] [PATCH 5/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C92@venom2> A non-text attachment was scrubbed... Name: patch5 Type: application/octet-stream Size: 57617 bytes Desc: patch5 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:25:36 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:25:36 -0500 Subject: [openib-general] [PATCH 6/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C95@venom2> A non-text attachment was scrubbed... Name: patch6 Type: application/octet-stream Size: 19729 bytes Desc: patch6 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:27:21 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:27:21 -0500 Subject: [openib-general] [PATCH 7/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C98@venom2> A non-text attachment was scrubbed... 
Name: patch7 Type: application/octet-stream Size: 16473 bytes Desc: patch7 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:28:26 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:28:26 -0500 Subject: [openib-general] [PATCH 8/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C9A@venom2> A non-text attachment was scrubbed... Name: patch8 Type: application/octet-stream Size: 92367 bytes Desc: patch8 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:29:39 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:29:39 -0500 Subject: [openib-general] [PATCH 9/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C9B@venom2> A non-text attachment was scrubbed... Name: patch9 Type: application/octet-stream Size: 4697 bytes Desc: patch9 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:30:53 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:30:53 -0500 Subject: [openib-general] [PATCH 10/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C9D@venom2> A non-text attachment was scrubbed... Name: patch10 Type: application/octet-stream Size: 270756 bytes Desc: patch10 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:31:54 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:31:54 -0500 Subject: [openib-general] [PATCH 11/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C9F@venom2> A non-text attachment was scrubbed... Name: patch11 Type: application/octet-stream Size: 2468 bytes Desc: patch11 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:33:02 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:33:02 -0500 Subject: [openib-general] [PATCH 12/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9CA0@venom2> A non-text attachment was scrubbed... 
Name: patch12 Type: application/octet-stream Size: 12410 bytes Desc: patch12 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:34:01 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:34:01 -0500 Subject: [openib-general] [PATCH 13/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9CA1@venom2> A non-text attachment was scrubbed... Name: patch13 Type: application/octet-stream Size: 6637 bytes Desc: patch13 URL: From ggrundstrom at NetEffect.com Wed Oct 25 17:34:58 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Wed, 25 Oct 2006 19:34:58 -0500 Subject: [openib-general] [PATCH 14/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9CA3@venom2> A non-text attachment was scrubbed... Name: patch14 Type: application/octet-stream Size: 25760 bytes Desc: patch14 URL: From venkatesh.babu at 3leafnetworks.com Wed Oct 25 18:25:15 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Wed, 25 Oct 2006 18:25:15 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <000001c6f7ec$15567d10$17fd070a@amr.corp.intel.com> References: <000001c6f7ec$15567d10$17fd070a@amr.corp.intel.com> Message-ID: <45400E7B.2020600@3leafnetworks.com> All the patches are attached to the bugzilla bug reports. Yes, it is possible to enhance ib_cm_init_qp_attr(), but it requires a new parameter *alternate_path, and changing it might break compatibility with the existing code. So I thought it better to add a new interface, so that it won't interfere with the existing code. Also, initializing QP attributes for rearming is slightly different from initializing QP attributes for bringing the QP up. Loading the alternate path does not necessarily indicate a failover. It only says that an alternate path is available. Failover actually happens only when any component of the primary path, such as the local port, the remote port, or a switch port, goes down.
We should get some events when this happens, so that they can trigger the failover. VBabu Sean Hefty wrote: >>I have implemented this interface and proposing this in bug#172. You >>have to patch it manually to get it working. >> >> > >Can you post the patch to the list? If the functionality is needed, I'd like to >queue it for 2.6.20. Although, I'd prefer if the existing ib_cm_init_qp_attr() >routine were used with state handling instead of adding a new API routine. > >And thinking about it more, we might be able to use that call as the indication >to the ib_cm to failover to the alternate path. > >- Sean > > From venkatesh.babu at 3leafnetworks.com Wed Oct 25 18:35:17 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Wed, 25 Oct 2006 18:35:17 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <000201c6f7ed$db0c15f0$17fd070a@amr.corp.intel.com> References: <000201c6f7ed$db0c15f0$17fd070a@amr.corp.intel.com> Message-ID: <454010D5.2050605@3leafnetworks.com> The patch for ib_sa_serv_notice_hdlr() handles multiple callers registering for the same event by maintaining a linked list of callback handlers. So I didn't think that a count of multiple users was necessary. The actual registration with the MAD layer happens only when the ib_sa module is initialized, and it is unregistered when the module is unloaded. In the sa_query.c file, the multicast handling code is very similar to ib_sa_serv_notice_hdlr(). I implemented this interface by just following the same model as the rest of the calls in the same file. But this interface is a bit different in other aspects, like registering for the event notification (InformInfo) and processing when the event actually happens (Notice). So we need more code to handle it correctly. VBabu Sean Hefty wrote: >>The problem is for the remote node where RC QP listen was accepted >>this connection (Passive side). There is no way for this node to know >>that port failure has occurred on the Active side.
So it requires some >>interfaces to get this notification. So ib_sa_serv_notice_hdlr() >>interface as described in the patch attached to bug#159 can be used to >>register for the remote port events. This interface has to be called >>separately for for PORT_ERR and PORT_ACTIVE events. When the handler for >>remote PORT_ERR occurs, then RC QP's path_mig_state can be changed to >>IB_MIG_MIGRATED to cause the failover. >> >> > >Hal pointed me to your patches for this, since I'm working on adding InformInfo >/ Notice support. I believe that a good portion of the code there is usable. >What I didn't see in the patch was reference counting to handle multiple users >registering for the same event, but I'm planning on leveraging the multicast >handling code for that. > >- Sean > > From johann.george at qlogic.com Wed Oct 25 18:27:38 2006 From: johann.george at qlogic.com (Johann George) Date: Wed, 25 Oct 2006 18:27:38 -0700 Subject: [openib-general] staging.openfabrics.org now functional Message-ID: <20061026012738.GB7407@cuprite.pathscale.com> You can now reference the new OpenFabrics server using staging.openfabrics.org Johann From mlleinin at hpcn.ca.sandia.gov Wed Oct 25 18:40:39 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Wed, 25 Oct 2006 18:40:39 -0700 Subject: [openib-general] New server svn up Message-ID: <1161826839.26066.99.camel@localhost> Subversion is up and running on the new server at https://69.55.231.195/svn. Those who have write access to the current svn should have write access on the new server. Let us know if any issues arise. 
Thanks, - Matt From sashak at voltaire.com Wed Oct 25 19:16:35 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 26 Oct 2006 04:16:35 +0200 Subject: [openib-general] New server svn up In-Reply-To: <1161826839.26066.99.camel@localhost> References: <1161826839.26066.99.camel@localhost> Message-ID: <20061026021635.GA14818@sashak.voltaire.com> Hi Matt, On 18:40 Wed 25 Oct , Matt Leininger wrote: > > Subversion is up and running on the new server at > https://69.55.231.195/svn. That is great. What will be the synchronization policy during the staging period (with the old, still current SVN repository)? Right now I can see that commits >= r9957 are not in the new repo yet. Thanks, Sasha From vishal at endace.com Wed Oct 25 19:16:50 2006 From: vishal at endace.com (vishal) Date: Thu, 26 Oct 2006 15:16:50 +1300 Subject: [openib-general] PC to PC data transfer using Infiniband Message-ID: <1161829010.5112.68.camel@julia.et.endace.com> I am interested in moving data from memory in one PC to memory in the other using Infiniband hardware. Any suggestions on what would be the best way to do it? Thanks! Vishal From greg.lindahl at qlogic.com Wed Oct 25 20:26:43 2006 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Wed, 25 Oct 2006 20:26:43 -0700 Subject: [openib-general] PC to PC data transfer using Infiniband In-Reply-To: <1161829010.5112.68.camel@julia.et.endace.com> References: <1161829010.5112.68.camel@julia.et.endace.com> Message-ID: <20061026032643.GA4246@greglaptop.skyriver> On Thu, Oct 26, 2006 at 03:16:50PM +1300, vishal wrote: > I am interested in moving data from memory in one PC to > memory in the other using Infiniband hardware. Any suggestions on what > would be the best way to do it ? There are 2 main approaches, messages and RDMA. The best one depends on the size of the data, how it's used, and the number of nodes that might be touching a given area of memory. -- greg From mst at mellanox.co.il Wed Oct 25 22:01:01 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 26 Oct 2006 07:01:01 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> References: <15ddcffd0610251239r7bf4c034jd2b12f69ea0d43bc@mail.gmail.com> <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> Message-ID: <20061026050101.GA28980@mellanox.co.il> Quoting r. Sean Hefty : > Subject: [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA > > Updates the librdmacm to work with ABI version 3, which is the proposed > kernel changes for inclusion in 2.6.20. > > Test programs are also updated. > > Signed-off-by: Sean Hefty > --- > Index: include/rdma/rdma_cma_abi.h > =================================================================== > --- include/rdma/rdma_cma_abi.h (revision 9192) > +++ include/rdma/rdma_cma_abi.h (working copy) > @@ -33,14 +33,15 @@ > #ifndef RDMA_CMA_ABI_H > #define RDMA_CMA_ABI_H > > +#include > #include > > /* > * This file must be kept in sync with the kernel's version of rdma_user_cm.h > */ > > -#define RDMA_USER_CM_MIN_ABI_VERSION 1 > -#define RDMA_USER_CM_MAX_ABI_VERSION 2 > +#define RDMA_USER_CM_MIN_ABI_VERSION 3 > +#define RDMA_USER_CM_MAX_ABI_VERSION 3 Might it be a good idea to keep min ABI at 1, so that the library works e.g. against OFED kernel components? -- MST From mst at mellanox.co.il Wed Oct 25 23:11:02 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 26 Oct 2006 08:11:02 +0200 Subject: [openib-general] [openfabrics-ewg] new server up and running In-Reply-To: <20061025194152.GA12061@sashak.voltaire.com> References: <20061023232250.GA3118@cuprite.pathscale.com> <20061025105058.GA11682@mellanox.co.il> <20061025194152.GA12061@sashak.voltaire.com> Message-ID: <20061026061102.GA30178@mellanox.co.il> Quoting r. 
Sasha Khapyorsky : > > I think it is best to be running with --user-path=scm so that I can > > put git trees under scm sub-directory and have that part exported. > > '--user-path=scm' in addition to '--base-path'? Looks good for me. Could this be done please? I even think we should replace --base-path with --user-path=scm. As it is, we have e.g. /pub/scm/management.git -> /home/sashak/management.git which is just confusing. -- MST From tziporet at dev.mellanox.co.il Thu Oct 26 00:06:16 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 26 Oct 2006 09:06:16 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <5CE025EE7D88BA4599A2C8FEFCF226F501894376@taurus.voltaire.com> References: <453F831C.4010204@dev.mellanox.co.il> <5CE025EE7D88BA4599A2C8FEFCF226F501894376@taurus.voltaire.com> Message-ID: <45405E68.4060303@dev.mellanox.co.il> Hal Rosenstock wrote: > Yes, I plan on doing this moving forward. > > -- Hal > > What is your estimated time frame? Tziporet From jackm at dev.mellanox.co.il Thu Oct 26 00:28:11 2006 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Thu, 26 Oct 2006 09:28:11 +0200 Subject: [openib-general] [PATCH 1/2] ib/core/uverbs: return sq_draining value in query_qp response In-Reply-To: References: <200610251254.20996.jackm@dev.mellanox.co.il> <200610251849.25368.jackm@dev.mellanox.co.il> Message-ID: <200610260928.11547.jackm@dev.mellanox.co.il> On Wednesday 25 October 2006 20:50, Roland Dreier wrote: > Let's just take one of the > reserved fields that we're lucky to have in the query QP response, and > put sq_draining in there. > We need to remove the field "en_sqd_async_notify" from the structure. It is an error! The async_notify flag is valid ONLY for modify_qp, during the RTS->SQD transition. If you wish to use a different field for the sq_draining flag, please rename the en_sqd_async_notify field to "reserved1". 
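To illustrate the layout argument in the paragraph above: a minimal sketch, assuming a response structure that already carries spare reserved bytes. The struct names and the field order are hypothetical and do not reproduce the real uverbs query_qp response; only sq_draining, en_sqd_async_notify and reserved1 are names taken from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: sq_draining is not reported at all. */
struct query_qp_resp_old {
    uint8_t  qp_state;
    uint8_t  en_sqd_async_notify; /* only meaningful for modify_qp */
    uint8_t  reserved[2];
    uint32_t max_send_wr;
};

/* Hypothetical "after" layout: same size and offsets, one reserved byte reused. */
struct query_qp_resp_new {
    uint8_t  qp_state;
    uint8_t  reserved1;           /* was en_sqd_async_notify */
    uint8_t  sq_draining;         /* taken from a previously reserved byte */
    uint8_t  reserved2;
    uint32_t max_send_wr;
};

/* Compile-time checks that existing binaries still see the same layout. */
_Static_assert(sizeof(struct query_qp_resp_old) ==
               sizeof(struct query_qp_resp_new), "structure size changed");
_Static_assert(offsetof(struct query_qp_resp_old, max_send_wr) ==
               offsetof(struct query_qp_resp_new, max_send_wr),
               "field offset moved");
```

Because only a reserved byte is given a meaning and the misleading field is merely renamed, neither the structure size nor the offset of any later field changes in this sketch.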
BTW, I checked the low-level drivers for what they return in the attr->en_sqd_async_notify field. a. mthca: leaves this field explicitly undefined, with the following comment in procedure mthca_query_qp() : /* qp_attr->en_sqd_async_notify is only applicable in modify qp */ b. ipath: sets this field explicitly to zero (ipath_query_qp() ) c. ehca: leaves this field undefined (ehca_query_qp() ) All 3 providers do explicitly set the value of attr->sq_draining in _query_qp(). Conclusions: 1. I agree that we should use a previously reserved field in the response structure for returning the value of sq_draining. 2. IMHO, we must remove the field en_sqd_async_notify. It is erroneous and no one uses it (in fact the values currently returned in that field are undefined for 2 of the 3 providers, and no one noticed or cared enough to ask). Leaving it in is just asking for trouble. - Jack From mshefty at ichips.intel.com Thu Oct 26 01:13:58 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Oct 2006 01:13:58 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <45400E7B.2020600@3leafnetworks.com> References: <000001c6f7ec$15567d10$17fd070a@amr.corp.intel.com> <45400E7B.2020600@3leafnetworks.com> Message-ID: <45406E46.9040200@ichips.intel.com> Venkatesh Babu wrote: > Loading the alternate path may not necessarily indicate the failover. It > is just saying that alternate path is available. Failover actually > happens only when primary path's any components like local port or > remote port or switch port goes down. We should get some events when > this happens to cause the failover. From the CM's perspective, I see that two (independent) things need to occur: 1. The CM itself must failover to the alternate path. 2. The CM must allow loading a new alternate path. The call ib_send_cm_lap() should be used to load a new alternate path, and would be independent of failover. 
And ib_cm_init_qp_attr() could be used to notify the CM that failover has occurred, based on the state of the connection. - Sean From mshefty at ichips.intel.com Thu Oct 26 01:22:20 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Oct 2006 01:22:20 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <454010D5.2050605@3leafnetworks.com> References: <000201c6f7ed$db0c15f0$17fd070a@amr.corp.intel.com> <454010D5.2050605@3leafnetworks.com> Message-ID: <4540703C.5010302@ichips.intel.com> Venkatesh Babu wrote: > The patch for ib_sa_serv_notice_hdlr() handles multiple callers > registering for the same event by maintaining a linked list of callback > handlers. So I didn't think that the multiple user count was necessary. > The actual registration with the MAD layer happens only when the ib_sa > module is initialized and unregistered when it is unloaded. Since MADs can be delivered / processed out of order, there's a race if one user unregisters for events at the same time another user tries to register. The second user can be left unregistered. Requests to the SA for the same event must be serialized to ensure that we are left in the proper registration state. The problem becomes more complex when a user cancels their request while it's still being processed. - Sean From mshefty at ichips.intel.com Thu Oct 26 01:35:34 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Oct 2006 01:35:34 -0700 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <20061026050101.GA28980@mellanox.co.il> References: <15ddcffd0610251239r7bf4c034jd2b12f69ea0d43bc@mail.gmail.com> <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <20061026050101.GA28980@mellanox.co.il> Message-ID: <45407356.9090409@ichips.intel.com> Michael S. Tsirkin wrote: > Might it be a good idea to keep min ABI at 1, so that the library works > e.g. against OFED kernel components? 
I looked at this, but the API to the library changed in such a way that it's difficult to provide support for older kernel ABI versions with the updated userspace APIs. This is in part because older kernel components did not provide connection parameters as part of their events. I'm definitely for trying to support older ABIs; I just don't see a good way to do so. - Sean From ogerlitz at voltaire.com Thu Oct 26 01:37:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 26 Oct 2006 10:37:26 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <453F92E9.8050908@ichips.intel.com> References: <453F831C.4010204@dev.mellanox.co.il> <453F92E9.8050908@ichips.intel.com> Message-ID: <454073C6.7090509@voltaire.com> Sean Hefty wrote: > Tziporet Koren wrote: >> I want to suggest that you will create releases to the libraries you own >> (like Roland maintains for libibverbs). >> This will help us in OFED integration since we will be able to start the >> release from a known stable version, instead of taking code with unknown >> stability from svn. > > I'm aware of this for libibcm and librdmacm. I will start a release of > librdmacm once we have upstream support for a userspace rdma_cm. I've delayed > releasing libibcm until there's better userspace support for SA queries, but I > could be convinced otherwise. Sean, As I understand it, there is practically no IB-native way (i.e., without out-of-band data such as the remote gid/lid etc.) to establish RC connections using libibcm. librdmacm, on the other hand, provides everything needed for apps using IP semantics. This raises the question: why keep on supporting libibcm? Or.
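A minimal sketch of the ABI range check discussed in the librdmacm thread above, assuming the library obtains a single integer ABI version from the kernel and compares it against RDMA_USER_CM_MIN_ABI_VERSION and RDMA_USER_CM_MAX_ABI_VERSION. The two macro names and the value 3 come from the quoted patch; the helper abi_version_supported() and the way the version is obtained are hypothetical and are not taken from the librdmacm source.

```c
#include <stdbool.h>

/* Values from the quoted librdmacm patch: only ABI version 3 is accepted. */
#define RDMA_USER_CM_MIN_ABI_VERSION 3
#define RDMA_USER_CM_MAX_ABI_VERSION 3

/*
 * Hypothetical helper: returns true when the ABI version reported by the
 * kernel falls inside the range the library was built to handle.
 */
static bool abi_version_supported(int kernel_abi)
{
    return kernel_abi >= RDMA_USER_CM_MIN_ABI_VERSION &&
           kernel_abi <= RDMA_USER_CM_MAX_ABI_VERSION;
}
```

With these values a kernel reporting version 1 or 2 is refused, which is the case Michael asked about. Lowering the minimum would only help if the library also kept code paths for the older event layout, which is the difficulty Sean describes.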
From mshefty at ichips.intel.com Thu Oct 26 01:51:55 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Oct 2006 01:51:55 -0700 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <454073C6.7090509@voltaire.com> References: <453F831C.4010204@dev.mellanox.co.il> <453F92E9.8050908@ichips.intel.com> <454073C6.7090509@voltaire.com> Message-ID: <4540772B.6000309@ichips.intel.com> Or Gerlitz wrote: > Per my understanding there is practically no IB native (ie without > out-of-band data such as remote gid/lid etc) way to establish RC > connections with using libibcm. On the other hand librdmacm does provide > anything needed for apps using IP semantics. This bring the question why > keep on supporting libibcm? Mainly because certain users (namely the national labs) want the lower level access. Personally, I favor the librdmacm for most users, but since it is an abstraction it does not currently provide the all of the features that a native IB interface provides, such as path failover or control over specific multicast settings. - Sean From halr at voltaire.com Thu Oct 26 02:37:02 2006 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 26 Oct 2006 11:37:02 +0200 Subject: [openib-general] creating releases for the libraries you own References: <453F831C.4010204@dev.mellanox.co.il> <5CE025EE7D88BA4599A2C8FEFCF226F501894376@taurus.voltaire.com> <45405E68.4060303@dev.mellanox.co.il> Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F50189437F@taurus.voltaire.com> Don't know yet. I need to go over things when I get back from staging. Is there a timeframe needed for an initial release (besides "yesterday") ? -- Hal ________________________________ From: Tziporet Koren [mailto:tziporet at dev.mellanox.co.il] Sent: Thu 10/26/2006 3:06 AM To: Hal Rosenstock Cc: Sean Hefty; OPENIB Subject: Re: creating releases for the libraries you own Hal Rosenstock wrote: > Yes, I plan on doing this moving forward. 
> > -- Hal > > What is your estimated time frame? Tziporet From tziporet at dev.mellanox.co.il Thu Oct 26 04:48:39 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 26 Oct 2006 13:48:39 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <5CE025EE7D88BA4599A2C8FEFCF226F50189437F@taurus.voltaire.com> References: <453F831C.4010204@dev.mellanox.co.il> <5CE025EE7D88BA4599A2C8FEFCF226F501894376@taurus.voltaire.com> <45405E68.4060303@dev.mellanox.co.il> <5CE025EE7D88BA4599A2C8FEFCF226F50189437F@taurus.voltaire.com> Message-ID: <4540A097.4090107@dev.mellanox.co.il> Hal Rosenstock wrote: > Don't know yet. I need to go over things when I get back from staging. Is there a timeframe needed for an initial release (besides "yesterday") ? > > -- Hal > > Well - I wish to have something before we start the 1.2 branch. 1.2 time frame is ~February but the final date will be closed on SC06. So something around end of this year should be good. Is this OK with you? Tziporet From halr at voltaire.com Thu Oct 26 04:56:55 2006 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 26 Oct 2006 13:56:55 +0200 Subject: [openib-general] creating releases for the libraries you own References: <453F831C.4010204@dev.mellanox.co.il> <5CE025EE7D88BA4599A2C8FEFCF226F501894376@taurus.voltaire.com> <45405E68.4060303@dev.mellanox.co.il> <5CE025EE7D88BA4599A2C8FEFCF226F50189437F@taurus.voltaire.com> <4540A097.4090107@dev.mellanox.co.il> Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F501894385@taurus.voltaire.com> That should be plenty of time. -- Hal ________________________________ From: Tziporet Koren [mailto:tziporet at dev.mellanox.co.il] Sent: Thu 10/26/2006 7:48 AM To: Hal Rosenstock Cc: Sean Hefty; OPENIB Subject: Re: creating releases for the libraries you own Hal Rosenstock wrote: > Don't know yet. I need to go over things when I get back from staging. Is there a timeframe needed for an initial release (besides "yesterday") ? 
> > -- Hal > > Well - I wish to have something before we start the 1.2 branch. 1.2 time frame is ~February but the final date will be closed on SC06. So something around end of this year should be good. Is this OK with you? Tziporet From eitan at mellanox.co.il Thu Oct 26 05:39:57 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 26 Oct 2006 14:39:57 +0200 Subject: [openib-general] [PATCH] opensm: autogen.sh: tools version verification fixes In-Reply-To: <20061023214011.GA6311@sashak.voltaire.com> References: <20061023214011.GA6311@sashak.voltaire.com> Message-ID: <4540AC9D.30008@mellanox.co.il> Seems right. Thanks. Sasha Khapyorsky wrote: > This fixes couple of things related to tools version verifications in > autogen.sh. Originally autogen.sh was claiming that automake-1.10 is > older that automake-1.6.3 and was failing with zero exit status, so: > > - regular expression fix - proper version string separation > - numeric camparison for extracted version elements > - non-zero exit status when old tools are detected > - slightly improved condition statements > > Signed-off-by: Sasha Khapyorsky > --- > osm/autogen.sh | 47 +++++++++++++++++++++-------------------------- > 1 files changed, 21 insertions(+), 26 deletions(-) > > diff --git a/osm/autogen.sh b/osm/autogen.sh > index 658d377..6570426 100755 > --- a/osm/autogen.sh > +++ b/osm/autogen.sh > @@ -1,4 +1,4 @@ > -#!/bin/bash > +#!/bin/bash > > # We change dir since the later utilities assume to work in the project dir > cd ${0%*/*} > @@ -7,49 +7,44 @@ # make sure autoconf is up-to-date > ac_ver=`autoconf --version | head -n 1 | awk '{print $NF}'` > ac_maj=`echo $ac_ver|sed 's/\..*//'` > ac_min=`echo $ac_ver|sed 's/.*\.//'` > -if [[ $ac_maj < 2 ]]; then > +if [[ $ac_maj -lt 2 ]]; then > echo Min autoconf version is 2.57 > - exit > -fi > -if [[ $ac_maj = 2 && $ac_min < 57 ]]; then > + exit 1 > +elif [[ $ac_maj -eq 2 && $ac_min -lt 57 ]]; then > echo Min autoconf version is 2.57 > - exit > + exit 1 > fi 
> > # make sure automake is up-to-date > am_ver=`automake --version | head -n 1 | awk '{print $NF}'` > am_maj=`echo $am_ver|sed 's/\..*//'` > -am_min=`echo $am_ver|sed 's/.*\.\([^\.]*\)\..*/\1/'` > -am_sub=`echo $am_ver|sed 's/.*\.//'` > -if [[ $am_maj < 1 ]]; then > +am_min=`echo $am_ver|sed 's/[^\.]*\.\([^\.]*\)\.*.*/\1/'` > +am_sub=`echo $am_ver|sed 's/[^\.]*\.[^\.]*\.*//'` > +if [[ $am_maj -lt 1 ]]; then > echo Min automake version is 1.6.3 > - exit > -fi > -if [[ $am_maj = 1 && $am_min < 6 ]]; then > + exit 1 > +elif [[ $am_maj -eq 1 && $am_min -lt 6 ]]; then > echo "automake version is too old:$am_maj.$am_min.$am_sub < required 1.6.3" > - exit > -fi > -if [[ $am_maj = 1 && $am_min = 6 && $am_sub < 3 ]]; then > + exit 1 > +elif [[ $am_maj -eq 1 && $am_min -eq 6 && $am_sub -lt 3 ]]; then > echo "automake version is too old:$am_maj.$am_min.$am_sub < required 1.6.3" > - exit > + exit 1 > fi > > # make sure libtool is up-to-date > lt_ver=`libtool --version | head -n 1 | awk '{print $4}'` > lt_maj=`echo $lt_ver|sed 's/\..*//'` > -lt_min=`echo $lt_ver|sed 's/.*\.\([^\.]*\)\..*/\1/'` > -lt_sub=`echo $lt_ver|sed 's/.*\.//'` > -if [[ $lt_maj < 1 ]]; then > +lt_min=`echo $lt_ver|sed 's/[^\.]*\.\([^\.]*\)\.*.*/\1/'` > +lt_sub=`echo $lt_ver|sed 's/[^\.]*\.[^\.]*\.*//'` > +if [[ $lt_maj -lt 1 ]]; then > echo Min libtool version is 1.4.2 > - exit > -fi > -if [[ $lt_maj = 1 && $lt_min < 4 ]]; then > + exit 1 > +elif [[ $lt_maj -eq 1 && $lt_min -lt 4 ]]; then > echo "automake version is too old:$lt_maj.$lt_min.$lt_sub < required 1.4.2" > - exit > -fi > -if [[ $lt_maj = 1 && $lt_min = 4 && $lt_sub < 2 ]]; then > + exit 1 > +elif [[ $lt_maj -eq 1 && $lt_min -eq 4 && $lt_sub -lt 2 ]]; then > echo "automake version is too old:$lt_maj.$lt_min.$lt_sub < required 1.4.2" > - exit > + exit 1 > fi > > # cleanup > From tom at opengridcomputing.com Thu Oct 26 07:19:35 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Thu, 26 Oct 2006 09:19:35 -0500 Subject: [openib-general] 
[PATCH 1/14] NetEffect iWarp RNIC driver In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9C88@venom2> Message-ID: Glenn: Thanks for posting the NES driver patch. This is great. When you post a patch there are a few conventions that people usually follow: - The patch contents should not be an attachment. This is because people like to reply to your patch with comments inline and including the patch as an attachment does not facilitate this. - BTW, You need to be careful with your mailer. Some mailers mess with the white space. - All patches should be plain text, not uuencoded, etc... - The patchset should include a [PATCH 0/x] mail message that describes what the patchset is for, "NES 10Gb RNIC Driver" or somesuch. - Each patch should contain a signature line: Signed-off-by: Glenn Grundstrom Take a look at some of the patches that have been submitted recently for examples. Also, here are some write ups with more extensive guidelines... http://linux.yyz.us/patch-format.html http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt Please repost per the above and we'll get started. Thanks again, Tom On 10/25/06 7:06 PM, "Glenn Grundstrom" wrote: > The following set of patches contain the source code for the NetEffect > NE010 adapter. Patches 1 through 9 contain the kernel driver source and > patches 10 through 14 contain the userspace components. Due to license > restrictions the NetEffect TCP/IP stack module cannot be released under > the OFA license, but is available from NetEffect under a separate > license agreement. NetEffect is in the process of developing a TCP > stack that will meet the OFA license requirements. For information on > licensing the NetEffect TCP stack module, contact > openfabrics at neteffect.com. 
> > Signed-off-by: Glenn Grundstrom > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Thu Oct 26 07:32:16 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 26 Oct 2006 09:32:16 -0500 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver In-Reply-To: References: Message-ID: <1161873136.4280.22.camel@stevo-desktop> Also please post the user stuff as a separate patch set. On Thu, 2006-10-26 at 09:19 -0500, Tom Tucker wrote: > Glenn: > > Thanks for posting the NES driver patch. This is great. > > When you post a patch there are a few conventions that people usually > follow: > > - The patch contents should not be an attachment. This is because people > like to reply to your patch with comments inline and including the patch as > an attachment does not facilitate this. > > - BTW, You need to be careful with your mailer. Some mailers mess with the > white space. > > - All patches should be plain text, not uuencoded, etc... > > - The patchset should include a [PATCH 0/x] mail message that describes what > the patchset is for, "NES 10Gb RNIC Driver" or somesuch. > > - Each patch should contain a signature line: > Signed-off-by: Glenn Grundstrom > > Take a look at some of the patches that have been submitted recently for > examples. > > Also, here are some write ups with more extensive guidelines... > > http://linux.yyz.us/patch-format.html > http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt > > Please repost per the above and we'll get started. > > Thanks again, > Tom > > > > On 10/25/06 7:06 PM, "Glenn Grundstrom" wrote: > > > The following set of patches contain the source code for the NetEffect > > NE010 adapter. 
Patches 1 through 9 contain the kernel driver source and > > patches 10 through 14 contain the userspace components. Due to license > > restrictions the NetEffect TCP/IP stack module cannot be released under > > the OFA license, but is available from NetEffect under a separate > > license agreement. NetEffect is in the process of developing a TCP > > stack that will meet the OFA license requirements. For information on > > licensing the NetEffect TCP stack module, contact > > openfabrics at neteffect.com. > > > > Signed-off-by: Glenn Grundstrom > > From ogerlitz at voltaire.com Thu Oct 26 07:45:34 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 26 Oct 2006 16:45:34 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> Message-ID: <4540CA0E.9020807@voltaire.com> Sean Hefty wrote: > Updates the librdmacm to work with ABI version 3, which is the proposed > kernel changes for inclusion in 2.6.20. > > Test programs are also updated. > OK, Sean, I have one system up and running, with a kernel based on Roland's git plus patches 1-7 and user space based on the svn with the librdmacm patch. I will clone this config on Sunday so that I can actually run mckey and see it working. Thanks a lot for putting everything together...
Anyway, while working on that I noticed two issues which need to be
addressed:

1) librdmacm does not build against libibverbs-1.0 (see below), so I am
using libibverbs (i.e. the not-yet-released libibverbs 1.1)

2) the cma rdma multicast does not let a consumer join as send-only

Or.

Path: .
URL: https://openib.org/svn/gen2/trunk/src/userspace/librdmacm
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 9963
Node Kind: directory
Schedule: normal
Last Changed Author: swise
Last Changed Rev: 9898
Last Changed Date: 2006-10-19 15:08:19 +0200 (Thu, 19 Oct 2006)

make all-am
make[1]: Entering directory `/home/ogerlitz/openib/infiniband-user-9963/librdmacm'
if /bin/sh ./libtool --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -I/home/ogerlitz/ib/include -MT cma.lo -MD -MP -MF ".deps/cma.Tpo" -c -o cma.lo `test -f 'src/cma.c' || echo './'`src/cma.c; \
then mv -f ".deps/cma.Tpo" ".deps/cma.Plo"; else rm -f ".deps/cma.Tpo"; exit 1; fi
 gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -g -Wall -D_GNU_SOURCE -I/home/ogerlitz/ib/include -MT cma.lo -MD -MP -MF .deps/cma.Tpo -c src/cma.c -fPIC -DPIC -o .libs/cma.o
src/cma.c: In function `rdma_disconnect':
src/cma.c:901: error: structure has no member named `transport_type'
src/cma.c:902: error: `IBV_TRANSPORT_IB' undeclared (first use in this function)
src/cma.c:902: error: (Each undeclared identifier is reported only once
src/cma.c:902: error: for each function it appears in.)
src/cma.c:905: error: `IBV_TRANSPORT_IWARP' undeclared (first use in this function)
src/cma.c: In function `ucma_copy_ud_event':
src/cma.c:1177: warning: implicit declaration of function `ibv_copy_ah_attr_from_kern'
make[1]: *** [cma.lo] Error 1
make[1]: Leaving directory `/home/ogerlitz/openib/infiniband-user-9963/librdmacm'
make: *** [all] Error 2

From swise at opengridcomputing.com Thu Oct 26 09:06:34 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 26 Oct 2006 11:06:34 -0500
Subject: [openib-general] [PATCH 3/7 v2] for 2.6.20 rdma/cma: report connection data with event
In-Reply-To: <000301c6f7bd$95d3e790$a6d4180a@amr.corp.intel.com>
References: <000301c6f7bd$95d3e790$a6d4180a@amr.corp.intel.com>
Message-ID: <1161878794.4280.27.camel@stevo-desktop>

Sean,

This patch fails to apply to the for-2.6.20 branch from Roland's tree.

Should it?

On Tue, 2006-10-24 at 15:41 -0700, Sean Hefty wrote:
> When establishing a connection, users of the rdma_cm provide connection
> parameters during calls to rdma_connect() and rdma_accept(). These
> parameters are not given to the remote side during connection establishment.
> The result is that the remote side does not know parameters such as
> initiator_depth and responder_resources until after a connection is
> established, and then only by querying the QP attributes. This makes it
> difficult to optimize resources before connecting or reject a connection
> if it cannot provide the required resources.
> > Signed-off-by: Sean Hefty > --- From mshefty at ichips.intel.com Thu Oct 26 09:13:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Oct 2006 09:13:15 -0700 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <4540CA0E.9020807@voltaire.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> Message-ID: <4540DE9B.7070900@ichips.intel.com> > 1) librdmacm does not get built against libibverbs-1.0 (see below) so i > am using libibverbs (ie the non released yet libibverbs1.1) I need to think about what we can do here. The librdmacm uses functionality not found in libibverbs-1.0. > 2) the cma rdma multicast does not let a consumer to join as send-only This would require some sort of change to the API and ABI, so if this is needed, I'd like to incorporate this now. (Adding it could be done by specifying join parameters.) Do we need/want this level of control in the librdmacm, or should users go to a direct IB interface for this? - Sean From mshefty at ichips.intel.com Thu Oct 26 09:26:37 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 26 Oct 2006 09:26:37 -0700 Subject: [openib-general] [PATCH 3/7 v2] for 2.6.20 rdma/cma: report connection data with event In-Reply-To: <1161878794.4280.27.camel@stevo-desktop> References: <000301c6f7bd$95d3e790$a6d4180a@amr.corp.intel.com> <1161878794.4280.27.camel@stevo-desktop> Message-ID: <4540E1BD.5080400@ichips.intel.com> Steve Wise wrote: > This patch fails to apply to the for-2.6.20 branch from roland's tree. > > Should it? Well, there wasn't a for-2.6.20 branch when I was making these... I'm using stg to manage these patches, but I'm in a branch based on for-2.6.19. I will need to move my copy of Roland's git tree forward and update this patch. 
- Sean From swise at opengridcomputing.com Thu Oct 26 09:30:31 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 26 Oct 2006 11:30:31 -0500 Subject: [openib-general] [PATCH 3/7 v2] for 2.6.20 rdma/cma: report connection data with event In-Reply-To: <4540E1BD.5080400@ichips.intel.com> References: <000301c6f7bd$95d3e790$a6d4180a@amr.corp.intel.com> <1161878794.4280.27.camel@stevo-desktop> <4540E1BD.5080400@ichips.intel.com> Message-ID: <1161880231.4280.31.camel@stevo-desktop> No problem. I applied them to for-2.6.20 and merged (dunno if its correct). But I'm building now. I wasn't sure which branch to use. Maybe in future you could indicate which branch the patch set is based on in the initial comments? Stevo. On Thu, 2006-10-26 at 09:26 -0700, Sean Hefty wrote: > Steve Wise wrote: > > This patch fails to apply to the for-2.6.20 branch from roland's tree. > > > > Should it? > > Well, there wasn't a for-2.6.20 branch when I was making these... I'm using stg > to manage these patches, but I'm in a branch based on for-2.6.19. I will need > to move my copy of Roland's git tree forward and update this patch. > > - Sean From mlleinin at hpcn.ca.sandia.gov Thu Oct 26 10:52:08 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Thu, 26 Oct 2006 10:52:08 -0700 Subject: [openib-general] New server svn up In-Reply-To: <20061026021635.GA14818@sashak.voltaire.com> References: <1161826839.26066.99.camel@localhost> <20061026021635.GA14818@sashak.voltaire.com> Message-ID: <1161885128.31851.16.camel@localhost> On Thu, 2006-10-26 at 04:16 +0200, Sasha Khapyorsky wrote: > Hi Matt, > > On 18:40 Wed 25 Oct , Matt Leininger wrote: > > > > Subversion is up and running on the new server at > > https://69.55.231.195/svn. > > That is great. > > What will be syncronization policy in staging period (with old, yet > current SVN repository)? Now I can see that commits >= r9957 are not > in new repo yet. > That's up to the developers. 
I suggest folks try out the new server and move over to using git/svn on it as soon as possible. We can figure out how to clean up or remove the svn user space tree during the summit as SC06. - Matt From rdreier at cisco.com Thu Oct 26 12:34:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 12:34:45 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <45400E7B.2020600@3leafnetworks.com> (Venkatesh Babu's message of "Wed, 25 Oct 2006 18:25:15 -0700") References: <000001c6f7ec$15567d10$17fd070a@amr.corp.intel.com> <45400E7B.2020600@3leafnetworks.com> Message-ID: > All the patches are attached to the bugzilla bug reports. bugzilla really isn't the best way to discuss patches. In the future, I would suggest sending the patches to openib-general so that everyone can see them and discuss them. - R. From rdreier at cisco.com Thu Oct 26 12:34:06 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 12:34:06 -0700 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver In-Reply-To: <1161873136.4280.22.camel@stevo-desktop> (Steve Wise's message of "Thu, 26 Oct 2006 09:32:16 -0500") References: <1161873136.4280.22.camel@stevo-desktop> Message-ID: And (while we're all complaining) please use different, descriptive titles for each patch in the set, and cc netdev at vger.kernel.org with the patches. Also, I have a question about the original message: > Due to license restrictions the NetEffect TCP/IP stack module cannot > be released under the OFA license, but is available from NetEffect > under a separate license agreement. What is the TCP/IP stack module?? What's wrong with the TCP stack we already have in the kernel? I think I can say pretty definitely that a second TCP stack has no hope of ever being merged. - R. 
From rdreier at cisco.com Thu Oct 26 13:51:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 13:51:34 -0700 Subject: [openib-general] [openfabrics-ewg] New server svn up In-Reply-To: <1161885128.31851.16.camel@localhost> (Matt Leininger's message of "Thu, 26 Oct 2006 10:52:08 -0700") References: <1161826839.26066.99.camel@localhost> <20061026021635.GA14818@sashak.voltaire.com> <1161885128.31851.16.camel@localhost> Message-ID: > That's up to the developers. I suggest folks try out the new server > and move over to using git/svn on it as soon as possible. We can > figure out how to clean up or remove the svn user space tree during the > summit as SC06. How does one use git on the new server? - R. From ggrundstrom at NetEffect.com Thu Oct 26 14:08:27 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 16:08:27 -0500 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E3C@venom2> One question regarding the userspace diff paths. I've scanned other email submissions and seen several different paths used that were accepted. All of the following have previously been used. osm/opensm/osm_pkey_mgr.c last_stable/src/userspace/libmthca/src/verbs.c src/device.c Seems like I should use new/src/userspace/libnes/* to follow the Mellanox style. Is that correct? For the kernel code I will use new/drivers/infiniband/hw/nes/*. Thanks, Glenn. -----Original Message----- From: Tom Tucker [mailto:tom at opengridcomputing.com] Sent: Thursday, October 26, 2006 9:20 AM To: Glenn Grundstrom; openib-general at openib.org Subject: Re: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver Glenn: Thanks for posting the NES driver patch. This is great. When you post a patch there are a few conventions that people usually follow: - The patch contents should not be an attachment. 
This is because people like to reply to your patch with comments inline and including the patch as an attachment does not facilitate this. - BTW, You need to be careful with your mailer. Some mailers mess with the white space. - All patches should be plain text, not uuencoded, etc... - The patchset should include a [PATCH 0/x] mail message that describes what the patchset is for, "NES 10Gb RNIC Driver" or somesuch. - Each patch should contain a signature line: Signed-off-by: Glenn Grundstrom Take a look at some of the patches that have been submitted recently for examples. Also, here are some write ups with more extensive guidelines... http://linux.yyz.us/patch-format.html http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt Please repost per the above and we'll get started. Thanks again, Tom On 10/25/06 7:06 PM, "Glenn Grundstrom" wrote: > The following set of patches contain the source code for the NetEffect > NE010 adapter. Patches 1 through 9 contain the kernel driver source > and patches 10 through 14 contain the userspace components. Due to > license restrictions the NetEffect TCP/IP stack module cannot be > released under the OFA license, but is available from NetEffect under > a separate license agreement. NetEffect is in the process of > developing a TCP stack that will meet the OFA license requirements. > For information on licensing the NetEffect TCP stack module, contact > openfabrics at neteffect.com. 
> > Signed-off-by: Glenn Grundstrom > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Thu Oct 26 14:16:55 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 26 Oct 2006 23:16:55 +0200 Subject: [openib-general] [openfabrics-ewg] New server svn up In-Reply-To: References: <1161826839.26066.99.camel@localhost> <20061026021635.GA14818@sashak.voltaire.com> <1161885128.31851.16.camel@localhost> Message-ID: <20061026211655.GH11425@sashak.voltaire.com> On 13:51 Thu 26 Oct , Roland Dreier wrote: > > That's up to the developers. I suggest folks try out the new server > > and move over to using git/svn on it as soon as possible. We can > > figure out how to clean up or remove the svn user space tree during the > > summit as SC06. > > How does one use git on the new server? To put your tree there you need user account. Then to make 'tree' publically available you can place it under ~rdreier/scm/ and this will be pullable as git://staging.openfabrics.org/~rdreier/tree , or under /pub/scm/ , then this will be available as git://staging.openfabrics.org/tree . Sasha From swise at opengridcomputing.com Thu Oct 26 14:11:34 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 26 Oct 2006 16:11:34 -0500 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E3C@venom2> References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E3C@venom2> Message-ID: <1161897094.4280.42.camel@stevo-desktop> > Seems like I should use new/src/userspace/libnes/* to follow the > Mellanox style. Is that correct? > For the kernel code I will use new/drivers/infiniband/hw/nes/*. > That looks good to me. Steve. 
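
The old/ and new/ path layout discussed above can be sketched as follows. This is only a minimal illustration, assuming GNU diff and GNU patch are installed; the file nes_example.c and the scratch directory are hypothetical and are not part of the NetEffect patch set. It shows why a diff taken between two sibling trees applies with -p1 from the top of the target tree.

```shell
#!/bin/sh
# Sketch only: "old"/"new" follow the layout discussed in this thread;
# nes_example.c is a made-up file used purely for illustration.
set -e
work=$(mktemp -d)
cd "$work"

# A pristine tree ("old") and a modified copy ("new") side by side.
mkdir -p old/drivers/infiniband/hw/nes new/drivers/infiniband/hw/nes
echo 'int nes_example;' > new/drivers/infiniband/hw/nes/nes_example.c

# -r recurse, -u unified format, -N treat absent files as empty,
# -p show the enclosing C function in hunk headers.
# diff exits with status 1 when the trees differ, which is expected here.
diff -ruNp old new > nes.patch || [ $? -eq 1 ]

# Apply from the top of a fresh tree: -p1 strips the leading
# "old/" or "new/" component from every path in the patch.
mkdir target
cd target
patch -p1 --dry-run < ../nes.patch
patch -p1 < ../nes.patch
ls drivers/infiniband/hw/nes/nes_example.c
```

Running the dry run first is a cheap way to confirm that the paths resolve before touching the tree.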
From worleys at gmail.com Thu Oct 26 14:20:16 2006
From: worleys at gmail.com (Chris Worley)
Date: Thu, 26 Oct 2006 15:20:16 -0600
Subject: [openib-general] /etc/rocks-release screws up OFED 1.1.1 build process
Message-ID: 

To fix it, make sure the test for rocks-release comes last in
build_env.sh (or delete it altogether):

# Set Distribuition dependency environment
dist_rpm=""
if [ -f /etc/SuSE-release ]; then
    dist_rpm=$($RPM -qf /etc/SuSE-release)
    DISTRIBUTION="SuSE"
elif [ -f /etc/fedora-release ]; then
    dist_rpm=$($RPM -qf /etc/fedora-release)
    DISTRIBUTION="fedora"
elif [ -f /etc/redhat-release ]; then
    dist_rpm=$($RPM -qf /etc/redhat-release)
    DISTRIBUTION="redhat"
elif [ -f /etc/rocks-release ]; then
    dist_rpm=$($RPM -qf /etc/rocks-release)
    DISTRIBUTION="Rocks"
else
    dist_rpm="Unknown"
    DISTRIBUTION=$(ls /etc/*-release | head -n 1 | xargs -iXXX basename XXX -release 2> $NULL)
    [ -z "${DISTRIBUTION}" ] && DISTRIBUTION="Unknown"
fi

The problem is that rpm -qf returns an error:

[root at c OFED-1.1.1]# rpm -qf /etc/redhat-release
redhat-release-4AS-4.1
[root at c OFED-1.1.1]# rpm -qf /etc/rocks-release
file /etc/rocks-release is not owned by any package

Which, during the build, generates the error:

ERROR: Failed executing "/bin/mv -f
    /var/tmp/OFEDRPM/RPMS/x86_64/dapl-1.2.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/dapl-devel-1.2.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/ipoibtools-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/kernel-ib-1.1-2.6.9_34.ELsmp.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/kernel-ib-devel-1.1-2.6.9_34.ELsmp.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibcm-0.9.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibcm-devel-0.9.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibcommon-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibcommon-devel-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibmad-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibmad-devel-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibumad-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibumad-devel-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibverbs-1.0.4-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibverbs-devel-1.0.4-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libibverbs-utils-1.0.4-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libipathverbs-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libipathverbs-devel-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libmthca-1.0.3-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libmthca-devel-1.0.3-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libopensm-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libopensm-devel-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libosmcomp-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libosmcomp-devel-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libosmvendor-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libosmvendor-devel-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/librdmacm-0.9.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/librdmacm-devel-0.9.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/librdmacm-utils-0.9.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/libsdp-1.1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/mstflint-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/openib-diags-1.1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/opensm-2.0.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/perftest-1.0-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/srptools-0.0.4-0.x86_64.rpm
    /var/tmp/OFEDRPM/RPMS/x86_64/tvflash-0.9.0-0.x86_64.rpm
    /export/tools/OFED-1.1.1/RPMS/file /etc/rocks-release is not owned by any package"

From ggrundstrom at NetEffect.com Thu Oct 26 14:36:41 2006
From: ggrundstrom at NetEffect.com (Glenn Grundstrom)
Date: Thu, 26 Oct 2006 16:36:41 -0500
Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E55@venom2>

Roland,

The TCP/IP stack module is an additional piece of software that we
obtained under a BSD license. Since the OFA license is dual, we cannot
submit it.
John Hagerman reported this to the OFA board in September and it was agreed upon then. There is nothing wrong with the current Linux TCP stack and, in fact, we are working towards the goal of using as much functionality from it as we reasonably can. The bottom line is that we understand there is little hope of merging a second stack and are working that issue. Thanks, Glenn. -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Thursday, October 26, 2006 2:34 PM To: Steve Wise Cc: Tom Tucker; Glenn Grundstrom; openib-general at openib.org Subject: Re: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver And (while we're all complaining) please use different, descriptive titles for each patch in the set, and cc netdev at vger.kernel.org with the patches. Also, I have a question about the original message: > Due to license restrictions the NetEffect TCP/IP stack module cannot > be released under the OFA license, but is available from NetEffect > under a separate license agreement. What is the TCP/IP stack module?? What's wrong with the TCP stack we already have in the kernel? I think I can say pretty definitely that a second TCP stack has no hope of ever being merged. - R. From rdreier at cisco.com Thu Oct 26 14:43:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 14:43:07 -0700 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E3C@venom2> (Glenn Grundstrom's message of "Thu, 26 Oct 2006 16:08:27 -0500") References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E3C@venom2> Message-ID: > One question regarding the userspace diff paths. I've scanned other > email submissions and seen several different paths used that were > accepted. All of the following have previously been used. It doesn't matter too much to me. As long as the patches apply (which is not always a given) then I can usually fix up the pathnames. 
For the kernel the standard rule ('should apply with -p1') is more important because that lets me use git tools in a very convenient way. - R. From venkatesh.babu at 3leafnetworks.com Thu Oct 26 15:10:07 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Thu, 26 Oct 2006 15:10:07 -0700 Subject: [openib-general] APM support in openib stack Message-ID: <4541323F.6020204@3leafnetworks.com> Yes, this is possible too. The alternate path specified in ib_send_cm_lap() can be used to set cm_id_priv->alt_av. Then in ib_cm_init_qp_attr() cm_id_priv->alt_av can be used to initialize the fields for making path_mig_state transitions to IB_MIG_REARM or IB_MIG_MIGRATED. VBabu Sean Hefty wrote: > Venkatesh Babu wrote: > >> Loading the alternate path may not necessarily indicate the failover. >> It is just saying that alternate path is available. Failover actually >> happens only when primary path's any components like local port or >> remote port or switch port goes down. We should get some events when >> this happens to cause the failover. > > > From the CM's perspective, I see that two (independent) things need to > occur: > > 1. The CM itself must failover to the alternate path. > 2. The CM must allow loading a new alternate path. > > The call ib_send_cm_lap() should be used to load a new alternate path, > and would be independent of failover. And ib_cm_init_qp_attr() could > be used to notify the CM that failover has occurred, based on the > state of the connection. > > > - Sean From venkatesh.babu at 3leafnetworks.com Thu Oct 26 15:11:05 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Thu, 26 Oct 2006 15:11:05 -0700 Subject: [openib-general] APM support in openib stack Message-ID: <45413279.2040400@3leafnetworks.com> I see your point. With my ULP module this was not the case, since this was the only module which was registering during the module init and unregistering during module exit. 
From the generic OFED stack's perspective, you are right. This condition needs to be addressed. This possibility exists in other interfaces of sa_query.c also. VBabu Sean Hefty wrote: > Venkatesh Babu wrote: > >> The patch for ib_sa_serv_notice_hdlr() handles multiple callers >> registering for the same event by maintaining a linked list of >> callback handlers. So I didn't think that the multiple user count was >> necessary. The actual registration with the MAD layer happens only >> when the ib_sa module is initialized and unregistered when it is >> unloaded. > > > Since MADs can be delivered / processed out of order, there's a race > if one user unregisters for events at the same time another user tries > to register. The second user can be left unregistered. Requests to > the SA for the same event must be serialized to ensure that we are > left in the proper registration state. > > The problem becomes more complex when a user cancels their request > while it's still being processed. > > - Sean From rdreier at cisco.com Thu Oct 26 14:45:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 14:45:57 -0700 Subject: [openib-general] [PATCH 1/14] NetEffect iWarp RNIC driver In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E55@venom2> (Glenn Grundstrom's message of "Thu, 26 Oct 2006 16:36:41 -0500") References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9E55@venom2> Message-ID: > The TCP/IP stack module is an additional piece of software that we > obtained under a BSD license. Since the OFA license is dual, we cannot > submit it. John Hagerman reported this to the OFA board in September > and it was agreed upon then. There is nothing wrong with the current > Linux TCP stack and, in fact, we are working towards the goal of using > as much functionality from it as we reasonably can. The bottom line is > that we understand there is little hope of merging a second stack and > are working that issue. 
This would depend on the exact BSD license you have, but in general it is possible to rerelease BSD code under the GPL as well (since the BSD license itself gives you that right). And yes, this makes the OFA dual-license policy pretty silly. I'm still not clear why you're messing around with TCP stacks at all. Is this something your iWARP driver needs? Is the code you just submitted functional without the stack? (There's not much hope of merging a driver that requires some additional extra TCP stack blob to do anything) - R. From venkatesh.babu at 3leafnetworks.com Thu Oct 26 15:12:56 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Thu, 26 Oct 2006 15:12:56 -0700 Subject: [openib-general] APM support in openib stack Message-ID: <454132E8.3080008@3leafnetworks.com> Any comments on the issue described in the following email ? It doesn't look like a firmware problem. I had got the APM working on the same Mellanox HCA cards with IBGD 1.8.2 stack. With OFED 1.0 stack I am getting the following problem. I guess it is some problem in initializing the timers to the firmware. VBabu Venkatesh Babu wrote: > I have added couple of patches to the OFED stack as described in > bug#160, bug#172, and bug#159 and with this successfully tested the > APM functionality, except one issue. > > ISSUE: > If I pull the both the cables then there are no paths to the > destination, so RC QP connection is supposed to tear down. But it is > not working. > > 1. Create a RC QP and load both primary and alternate path > (I was setting rnr_retry_count = 6, retry_count = 6, > packet_life_time field of struct ib_sa_path_rec to 15 and also tried > with 12) > 2. Send some traffic over RC QP > 3. Disconnect the cable belonging to the primary path > 4. It smoothly fails over to alternate path and it becomes primary path. > > No affect to the traffic on that RC QP > 5. Remove the second cable belonging to the new primary path. > 6. 
Obviously traffic stops since there are no paths to the > destination. But for the outstanding WRs in the RC QP I don't get any > callback from the verbs layer describing whether it succeeded or > failed due to some error like IB_WC_RETRY_EXC_ERR. > When I query the RC QP properties it still shows that it is in > IB_QPS_RTS state. > > > Without APM functionality it behaves correctly - > 1. Create a RC QP and load only primary path > (I was setting rnr_retry_count = 6, retry_count = 6, > packet_life_time field of struct ib_sa_path_rec to 15 and also tried > with 12) > 2. Send some traffic over RC QP > 3. Disconnect the cable belonging to the primary path > 4. Obviously traffic stops since there are no paths to the > destination. For the outstanding WRs in the RC QP I do get a callback > from the verbs layer describing the first WR that it failed due to > error IB_WC_RETRY_EXC_ERR and for all other WRs I get IB_WC_WR_FLUSH_ERR. > I will close this RC QP. From swise at opengridcomputing.com Thu Oct 26 15:20:18 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 26 Oct 2006 17:20:18 -0500 Subject: [openib-general] problem with 2.6.19? Message-ID: <1161901218.4280.55.camel@stevo-desktop> Hey Roland, I'm testing the Ammasso rnic on linus's latest (2.6.19-rc3+) and I'm having problems with dma_map_single(). The systems are Intel Dempsey processors (x86_64). The adapter seems to be dma'ing into the wrong memory. The patch below backs the usage of dma_map_single() back to using __pa() for converting kernel virtual addresses (from kmalloc) into bus addresses, and things work ok. So I'm wondering if the Ammasso driver is misusing dma_map_single()?? Or maybe the driver needs to do something at init time to request dma mappings? Any thoughts? Steve. 
---- hack to use __pa() instead of dma_map_single() ----

diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c b/drivers/infiniband/hw/amso1100/c2_alloc.c
index 028a60b..adf7fb3 100644
--- a/drivers/infiniband/hw/amso1100/c2_alloc.c
+++ b/drivers/infiniband/hw/amso1100/c2_alloc.c
@@ -47,8 +47,12 @@ static int c2_alloc_mqsp_chunk(struct c2
     if (new_head == NULL)
         return -ENOMEM;
+#if 0
     new_head->dma_addr = dma_map_single(c2dev->ibdev.dma_device, new_head,
         PAGE_SIZE, DMA_FROM_DEVICE);
+#else
+    new_head->dma_addr = __pa(new_head);
+#endif
     pci_unmap_addr_set(new_head, mapping, new_head->dma_addr);
     new_head->next = NULL;
diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c
index 9d7bcc5..f3452f1 100644
--- a/drivers/infiniband/hw/amso1100/c2_cq.c
+++ b/drivers/infiniband/hw/amso1100/c2_cq.c
@@ -270,10 +270,13 @@ static int c2_alloc_cq_buf(struct c2_dev
         (u8 *) pool_start,
         NULL, /* peer (currently unknown) */
         C2_MQ_HOST_TARGET);
-
+#if 0
     mq->host_dma = dma_map_single(c2dev->ibdev.dma_device,
         (void *)pool_start,
         q_size * msg_size, DMA_FROM_DEVICE);
+#else
+    mq->host_dma = __pa(pool_start);
+#endif
     pci_unmap_addr_set(mq, mapping, mq->host_dma);
     return 0;
diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c
index 30409e1..d2f9344 100644
--- a/drivers/infiniband/hw/amso1100/c2_rnic.c
+++ b/drivers/infiniband/hw/amso1100/c2_rnic.c
@@ -522,9 +522,13 @@ int c2_rnic_init(struct c2_dev *c2dev)
         err = -ENOMEM;
         goto bail1;
     }
+#if 0
     c2dev->rep_vq.host_dma = dma_map_single(c2dev->ibdev.dma_device,
         (void *)q1_pages, qsize * msgsize,
         DMA_FROM_DEVICE);
+#else
+    c2dev->rep_vq.host_dma = __pa(q1_pages);
+#endif
     pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma);
     pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages,
         (unsigned long long) c2dev->rep_vq.host_dma);
@@ -545,9 +549,13 @@ int c2_rnic_init(struct c2_dev *c2dev)
         err = -ENOMEM;
         goto bail2;
     }
+#if 0
     c2dev->aeq.host_dma = dma_map_single(c2dev->ibdev.dma_device,
         (void *)q2_pages, qsize * msgsize,
         DMA_FROM_DEVICE);
+#else
+    c2dev->aeq.host_dma = __pa(q2_pages);
+#endif
     pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma);
     pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages,
         (unsigned long long) c2dev->rep_vq.host_dma);

From rdreier at cisco.com Thu Oct 26 15:26:49 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 26 Oct 2006 15:26:49 -0700
Subject: [openib-general] problem with 2.6.19?
In-Reply-To: <1161901218.4280.55.camel@stevo-desktop> (Steve Wise's message of "Thu, 26 Oct 2006 17:20:18 -0500")
References: <1161901218.4280.55.camel@stevo-desktop>
Message-ID: 

    Steve> The adapter seems to be dma'ing into the wrong memory. The
    Steve> patch below backs the usage of dma_map_single() back to
    Steve> using __pa() for converting kernel virtual addresses (from
    Steve> kmalloc) into bus addresses, and things work ok.

Hmm. It might be interesting to hack the driver to print the result
of both dma_map_single() and __pa() and see if they're different.

Are you running on a 32-bit (i386) or 64-bit (x86_64) kernel? How
much RAM do you have? Is the kernel using swiotlb?

If so then you need to make sure your DMA_{TO,FROM} directions and
dma_unmap calls are right, since otherwise the DMAed data won't be
copied to/from the bounce buffer at the right time.

If you're not using swiotlb then I'm somewhat mystified. I guess
comparing what dma_map_single() and __pa() do might be enlightening.

Another thing to do if you're patient would be to use git-bisect and
figure out exactly which patch made amso1100 stop working.

 - R.

From sobebike at gmail.com Thu Oct 26 15:58:03 2006
From: sobebike at gmail.com (SoBeBike)
Date: Thu, 26 Oct 2006 17:58:03 -0500
Subject: [openib-general] uDAPL in OFED 1.1
Message-ID: 

Just installed OFED 1.1 on SLES 10 (2.6.16.21-0.8-smp) x86_64. I have
existing uDAPL code which runs fine on OFED 1.0 (SLES 9). It does not
work on OFED 1.1 SLES 10.
dat_evd_create fails with DAT_INSUFFICIENT_RESOURCES even if I only
create a single EVD.

In order to have a common test case, I attempted to run dapltest, but it
does not appear to be part of the OFED 1.1 package. Was dapltest removed
from the OFED package? If so, why? Is there some other common test in
the OFED package that I should run to validate basic uDAPL functionality?

thanks.

From sweitzen at cisco.com Thu Oct 26 16:01:47 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 26 Oct 2006 16:01:47 -0700
Subject: [openib-general] uDAPL in OFED 1.1
Message-ID: 

AFAIK dapltest was never part of OFED 1.0, at least it never got built
in the RPMs.

Scott

________________________________
From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of SoBeBike
Sent: Thursday, October 26, 2006 3:58 PM
To: openib-general at openib.org
Subject: [openib-general] uDAPL in OFED 1.1

Just installed OFED 1.1 on SLES 10 (2.6.16.21-0.8-smp) x86_64. I have
existing uDAPL code which runs fine on OFED 1.0 (SLES 9). It does not
work on OFED 1.1 SLES 10. dat_evd_create fails with
DAT_INSUFFICIENT_RESOURCES even if I only create a single EVD.

In order to have a common test case, I attempted to run dapltest, but it
does not appear to be part of the OFED 1.1 package. Was dapltest removed
from the OFED package? If so, why? Is there some other common test in
the OFED package that I should run to validate basic uDAPL functionality?

thanks.

From ggrundstrom at NetEffect.com Thu Oct 26 16:58:48 2006
From: ggrundstrom at NetEffect.com (Glenn Grundstrom)
Date: Thu, 26 Oct 2006 18:58:48 -0500
Subject: [openib-general] [PATCH 3/9] NetEffect 10Gb RNIC Driver: openfabrics connection manager c file
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EB8@venom2>

Kernel driver patch 3 of 9.

Signed-off-by: Glenn Grundstrom

======================================================

diff -ruNp old/drivers/infiniband/hw/nes/nes_cm.c new/drivers/infiniband/hw/nes/nes_cm.c
--- old/drivers/infiniband/hw/nes/nes_cm.c 1969-12-31 18:00:00.000000000 -0600
+++ new/drivers/infiniband/hw/nes/nes_cm.c 2006-10-25 10:36:29.000000000 -0500
@@ -0,0 +1,1204 @@
+/*
+ * Copyright (c) 2006 NetEffect, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#define TCPOPT_TIMESTAMP 8 + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include +#include + +#include "nes.h" + +#define OS_LINUX +#define OS_LINUX_26 +#include +#include + +extern unsigned int send_first; + +struct nes_v4_quad +{ + UINT32 rsvd0; + UINT32 DstIpAdrIndex; /* Only most significant 5 bits are valid */ + UINT32 SrcIpadr; + UINT32 TcpPorts; /* src is low, dest is high */ +}; + +enum ietf_mpa_flags { + IETF_MPA_FLAGS_MARKERS = 0x80, /* receive Markers */ + IETF_MPA_FLAGS_CRC = 0x40, /* receive Markers */ + IETF_MPA_FLAGS_REJECT = 0x20, /* Reject */ +}; + +#define IEFT_MPA_KEY_REQ "MPA ID Req Frame" +#define IEFT_MPA_KEY_REP "MPA ID Rep Frame" + +struct ietf_mpa_req_resp_frame { + u8 key[16]; + u8 flags; + u8 rev; + u16 private_data_size; + u8 private_data[0]; +}; + +static void connect_worker(void *); +static void listen_worker(void *); + +extern int NesAdapterAdd(struct net_device *netdev); +extern int NesInitSockets(void); +extern void set_interface( + UINT32 ip_addr, + UINT32 mask, + UINT32 bcastaddr, + UINT32 type + ); +#define ADD_ADDR 1 +#define SET_ADDR 2 +#define DELETE_ADDR 3 + +extern void bdc_cleanup(void); +extern int mpa_version; + +unsigned char DriverNamePrefix[] = "iw_nes"; + +int nes_if_count = 0; + +#define MAX_NES_IFS 4 +struct nes_dev *nes_ifs[MAX_NES_IFS]= { 0 }; + + +/** + * nes_start_cm + * + * @param nesdev + * @param new_ifa + * + * @return int + */ +int nes_start_cm(struct nes_dev *nesdev, struct in_ifaddr *new_ifa) +{ + int result = 0; + dprintk("%s:%s:%u\n", __FILE__, 
__FUNCTION__, __LINE__); + + nes_ifs[0] = nesdev; + + stack_ops_p->dhcp_control(0x00); + + // set ip and subnet mask + stack_ops_p->set_ip_info(ntohl(new_ifa->ifa_address), + ntohl(new_ifa->ifa_mask)); + stack_ops_p->set_dev_name(nesdev->netdev->name); + + if (nesdev->nes_stack_start == 0) { + stack_ops_p->stack_init(nesdev->netdev); + /* TODO: Deal with multiple IP addresses */ + nesdev->local_ipaddr = new_ifa->ifa_address; + + nesdev->nes_stack_start = 1; + } + + return result; +} + + +/** + * nes_stop_cm + * + * @param nesdev + * + * @return int + */ +int nes_stop_cm(struct nes_dev *nesdev) +{ + if (nesdev->nes_stack_start) + { + nesdev->nes_stack_start = 0; + stack_ops_p->stack_exit(nesdev->netdev); + } + return 0; +} + + +/** + * nes_update_arp + * + * @param pMacAddress + * @param u32IpAddress + * @param u32ArpTimeout + * @param u16Entry + * @param type + */ +void nes_update_arp(unsigned char *pMacAddress, u32 u32IpAddress, + u32 u32ArpTimeout, u16 u16Entry, u16 type) +{ + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_dev *nesdev; + unsigned long flags; + u32 cqp_head; + u16 arp_index; + + if (nes_ifs[0] == NULL) { + return; + } + + nesdev = nes_ifs[0]; + + dprintk("%s: pMacAddress = %p, type = %u.\n", __FUNCTION__, pMacAddress, type ); + if (NULL == pMacAddress) { + dprintk("%s: Received a Delete request for IP address 0x%08X, index %u).\n", + __FUNCTION__, u32IpAddress, u16Entry ); + nes_arp_table_update(nesdev, u32IpAddress, NES_ARP_INDEX_DELETE); + return; + } else { + dprintk("%s: Received an Update request for IP address 0x%08X, index %u, address %02X:%02X:%02X:%02X:%02X:%02X).\n", + __FUNCTION__, u32IpAddress, u16Entry, pMacAddress[0], + pMacAddress[1], pMacAddress[2], pMacAddress[3], pMacAddress[4], + pMacAddress[5]); + } + + /* Add the ARP Entry */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + 
cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_MANAGE_ARP_CACHE | NES_CQP_ARP_PERM; + arp_index = nes_arp_table_update(nesdev, u32IpAddress, NES_ARP_INDEX_ADD); + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= (u32)PCI_FUNC(nesdev->pcidev->devfn) << NES_CQP_ARP_AEQ_INDEX_SHIFT; +// cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= ((u32)arp_index) << NES_CQP_ARP_AEQ_INDEX_SHIFT; + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = (u32)arp_index; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + if (1 == type) { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_ARP_VALID; + cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX] = + (((u32)pMacAddress[2])<<24) + (((u32)pMacAddress[3])<<16) + + (((u32)pMacAddress[4])<<8) + (u32)pMacAddress[5]; + cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_HIGH_IDX] = (((u32)pMacAddress[0])<<16) + (u32)pMacAddress[1]; + } else { + cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_HIGH_IDX] = 0; + } + + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX]); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX]); + cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_HIGH_IDX] = cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_HIGH_IDX]); + cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX] = cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX]); + + barrier(); + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); +} + + +/** + * connect_worker + * @param qp + */ +static void connect_worker(void *qp) +{ + unsigned long qplockflags; + UINTPTR socket; + struct socket *ksock; + struct 
nes_qp *nesqp = qp; + struct nes_dev *nesdev = to_nesdev(nesqp->ibqp.device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct iw_cm_id *cm_id = nesqp->cm_id; + struct nes_hw_qp_wqe *wqe; + struct iw_cm_event cm_event; + struct NES_sockaddr_in inet_addr; + struct NES_sockaddr_in new_socket_name; + struct nes_v4_quad nes_quad; + struct ib_qp_attr attr; + struct ietf_mpa_req_resp_frame *req_frame = nesqp->ietf_frame; + int kaddr_length; + int socket_bytes; + int err; + u16 resp_private_data_length; + + dprintk("Attempting to connect to 0x%08X:0x%04X on local port 0x%04X.\n", + ntohl(cm_id->remote_addr.sin_addr.s_addr), + ntohs(cm_id->remote_addr.sin_port), + ntohs(cm_id->local_addr.sin_port) ); + + memset( &inet_addr, 0, sizeof(inet_addr) ); + inet_addr.sin_len = sizeof( inet_addr ); + inet_addr.sin_family = NES_AF_INET; + inet_addr.sin_port = cm_id->remote_addr.sin_port; + inet_addr.sin_addr.NES_s_addr = cm_id->remote_addr.sin_addr.s_addr; + + err = stack_ops_p->sock_ops_p->connect( + nesqp->socket, + (struct NES_sockaddr *)&inet_addr, + sizeof(inet_addr)); + + dprintk("%s: Connect request returned %d.\n", __FUNCTION__, err); + + if (err < 0) { + dprintk("nes connect call returned %d.\n", err ); + goto conn_err0; + } + + /* send the req */ + strcpy(&req_frame->key[0], IEFT_MPA_KEY_REQ); + req_frame->flags = IETF_MPA_FLAGS_CRC; + /* TODO: allow configuration of the revision */ + /* TODO: Set context and registers properly */ + req_frame->rev = mpa_version; + + /* TODO: add retry logic checking the number of bytes sent */ + err = stack_ops_p->sock_ops_p->send(nesqp->socket, (char *)req_frame, + sizeof(*req_frame)+nesqp->private_data_len,0); + + dprintk("%s: Send for MPA request returned %d.\n", __FUNCTION__, err); + + if (err < 0) { + dprintk("nes send call returned %d.\n", err); + goto conn_err0; + } + + /* receive the reply */ + socket_bytes = 0; + do + { + err = stack_ops_p->sock_ops_p->recv(nesqp->socket, (char *)req_frame, sizeof(*req_frame),0); + 
+ dprintk("%s: Recv for MPA reply returned %d.\n", __FUNCTION__, err); + + if (err < 0) { + goto conn_err0; + } + socket_bytes += err; + } while ( socket_bytes < sizeof(*req_frame) ); + + if (req_frame->flags&IETF_MPA_FLAGS_MARKERS) { + dprintk("%s: Peer specified markers in MPA reply. Aborting MPA negotiation\n", + __FUNCTION__ ); + /* TODO: Should send a reject */ + goto conn_err0; + } + if (req_frame->flags&IETF_MPA_FLAGS_CRC) { + dprintk("%s: Peer specified CRC in MPA reply. MPA version = %u.\n", + __FUNCTION__, req_frame->rev ); + } else { + dprintk("%s: Peer did not specified CRC in MPA reply. MPA version = %u.\n", + __FUNCTION__, req_frame->rev ); + } + + resp_private_data_length = be16_to_cpu(req_frame->private_data_size); + if (resp_private_data_length){ + if (resp_private_data_length>nesqp->private_data_len) + { + nesqp->ietf_frame = kzalloc(sizeof(*nesqp->ietf_frame)+resp_private_data_length, + GFP_KERNEL); + if (!nesqp->ietf_frame) + { + dprintk("%s: Error allocating response private data area.\n", + __FUNCTION__ ); + goto conn_err0; + } + *nesqp->ietf_frame = *req_frame; + kfree(req_frame); + req_frame = nesqp->ietf_frame; + } + err = stack_ops_p->sock_ops_p->recv(nesqp->socket, (char *)req_frame->private_data, resp_private_data_length,0); + + dprintk("%s: Recv for MPA response private data returned %d.\n", __FUNCTION__, err); + if (err < 0) { + goto conn_err0; + } + } + + stack_ops_p->accelerate_socket(nesqp->socket, nesqp->nesqp_context); + + nesqp->nesqp_context->tcpPorts = ntohs(cm_id->remote_addr.sin_port) << 16; + nesqp->nesqp_context->tcpPorts += ntohs(cm_id->local_addr.sin_port); + nesqp->nesqp_context->ip0 = ntohl(cm_id->remote_addr.sin_addr.s_addr); + + nesqp->nesqp_context->misc2 |= (u32)PCI_FUNC(nesdev->pcidev->devfn) << NES_QPCONTEXT_MISC2_SRC_IP_SHIFT; + nesqp->nesqp_context->arp_index_vlan |= ((u32)nes_arp_table_update(nesdev, nesqp->nesqp_context->ip0, NES_ARP_INDEX_RESOLVE))<<16; + nesqp->nesqp_context->ts_val_delta = jiffies - 
nes_read_indexed(nesdev->index_reg, NES_IDX_TCP_NOW); + nesqp->nesqp_context->ird_index = nesqp->hwqp.qp_id; + nesqp->nesqp_context->ird_ord_sizes |= (u32)1 << NES_QPCONTEXT_ORDIRD_IWARP_MODE_SHIFT; + /* Adjust tail for not having a LSMM */ + nesqp->hwqp.sq_tail = 1; + +#if defined(NES_SEND_FIRST_WRITE) + if (send_first) { + wqe = &nesqp->hwqp.sq_vbase[0]; + *((struct nes_qp **)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; + *((u64 *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) |= NES_SW_CONTEXT_ALIGN>>1; + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = cpu_to_le32(NES_IWARP_SQ_OP_RDMAW); + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = 0; + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX] = 0; + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX] = 0; + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX] = 0; + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX] = 0; + + /* use the reserved spot on the WQ for the extra first WQE */ + nesqp->nesqp_context->ird_ord_sizes &= ~(NES_QPCONTEXT_ORDIRD_LSMM_PRESENT | NES_QPCONTEXT_ORDIRD_WRPDU | NES_QPCONTEXT_ORDIRD_ALSMM); + nesqp->skip_lsmm = 1; + nesqp->hwqp.sq_tail = 0; + nes_write32(nesdev->regs + NES_WQE_ALLOC, (1 << 24) | 0x00800000 | nesqp->hwqp.qp_id); + } +#endif + + memset ( &nes_quad, 0, sizeof(nes_quad)); + + nes_quad.DstIpAdrIndex = (u32)PCI_FUNC(nesdev->pcidev->devfn) << 27; + nes_quad.SrcIpadr = cm_id->remote_addr.sin_addr.s_addr; + nes_quad.TcpPorts = cm_id->remote_addr.sin_port; + nes_quad.TcpPorts |= (u32)cm_id->local_addr.sin_port << 16; + + // Produce hash key + nesqp->hte_index = nes_crc32( TRUE, + NES_HASH_CRC_INITAL_VALUE, + NES_HASH_CRC_FINAL_XOR, + sizeof(nes_quad), + (PUINT8)&nes_quad, + ORDER, + REFIN, + REFOUT + ); + + dprintk("%s: HTE Index = 0x%08X, CRC = 0x%08X\n", __FUNCTION__, + nesqp->hte_index, nesqp->hte_index & nesadapter->hte_index_mask); + + nesqp->hte_index &= nesadapter->hte_index_mask; + nesqp->nesqp_context->hte_index = nesqp->hte_index; + + attr.qp_state = IB_QPS_RTS; + 
nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE); + + kaddr_length = sizeof(new_socket_name); + stack_ops_p->sock_ops_p->getsockname( nesqp->socket, + (struct NES_sockaddr *)&new_socket_name, + &kaddr_length); + + cm_event.event = IW_CM_EVENT_CONNECT_REPLY; + cm_event.status = IW_CM_EVENT_STATUS_ACCEPTED; + cm_event.provider_data = cm_id->provider_data; + cm_event.local_addr.sin_family = new_socket_name.sin_family; + cm_event.local_addr.sin_port = new_socket_name.sin_port; + cm_event.local_addr.sin_addr.s_addr = new_socket_name.sin_addr.NES_s_addr; + cm_event.remote_addr = cm_id->remote_addr; + cm_event.private_data = &req_frame->private_data; + cm_event.private_data_len = resp_private_data_length; + + cm_id->event_handler(cm_id, &cm_event); + // kfree(req_frame); + + dprintk("%s: Exiting connect thread for QP%u\n", + __FUNCTION__, nesqp->hwqp.qp_id ); + return; + +conn_err0: + kfree(req_frame); + if (nesqp->cm_id) + { + spin_lock_irqsave(&nesqp->lock, qplockflags); + if (nesqp->ksock) { + ksock = nesqp->ksock; + socket = nesqp->socket; + nesqp->ksock = 0; + nesqp->socket = 0; + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + stack_ops_p->sock_ops_p->close( socket ); + sock_release(ksock); + } else { + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + } + cm_id->rem_ref(cm_id); + nesqp->cm_id = NULL; + cm_id->provider_data = NULL; + cm_event.event = IW_CM_EVENT_CONNECT_REPLY; + cm_event.status = IW_CM_EVENT_STATUS_REJECTED; + cm_event.provider_data = cm_id->provider_data; + cm_event.local_addr = cm_id->local_addr; + cm_event.remote_addr = cm_id->remote_addr; + cm_event.private_data = NULL; + cm_event.private_data_len = 0; + + cm_id->event_handler(cm_id, &cm_event); + } +} + + +/** + * nes_sock_release + * + * @param nesqp + * @param qplockflags + */ +void nes_sock_release(struct nes_qp *nesqp, unsigned long *qplockflags) { + UINTPTR socket; + struct socket *ksock; + + ksock = nesqp->ksock; + socket = nesqp->socket; + nesqp->ksock = 0; + nesqp->socket = 0; + 
spin_unlock_irqrestore(&nesqp->lock, *qplockflags); + stack_ops_p->sock_ops_p->close( socket ); + sock_release(ksock); + spin_lock_irqsave(&nesqp->lock, *qplockflags); +} + + +/** + * nes_connect + * + * @param cm_id + * @param conn_param + * + * @return int + */ +int nes_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) +{ + int err; + int int_socket_opt; + u8 u8temp; + UINTPTR socket; + struct socket *ksock; + struct NES_sockaddr_in inet_addr; + struct sockaddr_in kinet_addr; + int kaddr_length; + struct nes_qp *nesqp; + struct nes_dev *nesdev = to_nesdev(cm_id->device); + struct ib_qp *ibqp; + + dprintk("%s:%s:%u: data len = %u, cm_id = %p, event handler = %p.\n", __FILE__, + __FUNCTION__, __LINE__, conn_param->private_data_len, cm_id, cm_id->event_handler); + + // update the NES stack routing table + // Unfortunately, cannot be done in interface event handler. Handler is called before routes are setup. + dprintk("call nes_update_rt\n"); + // stack_ops_p->update_route(nesdev->netdev->name); + stack_ops_p->dump_rt_table(); + + + ibqp = nes_get_qp(cm_id->device, conn_param->qpn); + if (!ibqp) + return -EINVAL; + nesqp = to_nesqp(ibqp); + + nesqp->ietf_frame = kzalloc(sizeof(*nesqp->ietf_frame)+conn_param->private_data_len, GFP_KERNEL); + if (!nesqp->ietf_frame) + return -ENOMEM; + + nesqp->active_conn = 1; + dprintk("%s: QP%u, Destination IP = 0x%08X, local = 0x%08X.\n", + __FUNCTION__, nesqp->hwqp.qp_id, + ntohl(cm_id->remote_addr.sin_addr.s_addr), + ntohl(cm_id->local_addr.sin_addr.s_addr)); + + socket = stack_ops_p->sock_ops_p->socket(NES_AF_INET, NES_SOCK_STREAM, 0); + dprintk("returned socket = %p.\n", (void *)socket); + nesqp->socket = socket; + + err = sock_create_kern(PF_INET, SOCK_STREAM, IPPROTO_TCP, &ksock); + if (err < 0) { + dprintk("kernel socket call returned %d.\n", err ); + stack_ops_p->sock_ops_p->close( socket ); + return err; + } + + dprintk("kernel socket = %p.\n", ksock ); + nesqp->ksock = ksock; + + memset( &kinet_addr, 0, 
sizeof(kinet_addr) ); + kinet_addr.sin_family = AF_INET; + kinet_addr.sin_port = cm_id->local_addr.sin_port; + kinet_addr.sin_addr.s_addr = cm_id->local_addr.sin_addr.s_addr; + err = ksock->ops->bind(ksock, (struct sockaddr *)&kinet_addr, sizeof(kinet_addr)); + if (err < 0) { + dprintk("kernel bind call returned %d.\n", err ); + sock_release(ksock); + stack_ops_p->sock_ops_p->close( socket ); + return err; + } + + memset( &kinet_addr, 0, sizeof(kinet_addr) ); + err = ksock->ops->getname(ksock, (struct sockaddr *)&kinet_addr, &kaddr_length,0); + if (err < 0) { + dprintk("kernel getname call returned %d.\n", err ); + sock_release(ksock); + stack_ops_p->sock_ops_p->close( socket ); + return err; + } + + dprintk("kernel getname call returned port = 0x%04X.\n", ntohs(kinet_addr.sin_port) ); + cm_id->local_addr.sin_port = kinet_addr.sin_port; + inet_addr.sin_len = sizeof( inet_addr ); + inet_addr.sin_family = NES_AF_INET; + inet_addr.sin_port = cm_id->local_addr.sin_port; + inet_addr.sin_addr.NES_s_addr = cm_id->local_addr.sin_addr.s_addr; + err = stack_ops_p->sock_ops_p->bind( + socket, + (struct NES_sockaddr *)&inet_addr, + sizeof(inet_addr)); + + if (err < 0) { + dprintk("nes bind call returned %d.\n", err ); + sock_release(ksock); + stack_ops_p->sock_ops_p->close( socket ); + return err; + } + + int_socket_opt = 1; + err = stack_ops_p->sock_ops_p->setsockopt( + socket, NES_SOL_SOCKET, NES_TCP_NODELAY, + (char *)&int_socket_opt, sizeof(int_socket_opt)); + + if (err < 0) { + dprintk("nes setsockopt (TCP_NODELAY) call returned %d.\n", err ); + } + + int_socket_opt = 0; + u8temp = 1 << (ntohs(cm_id->local_addr.sin_port)&7); + nesdev->apbv_table[ntohs(cm_id->local_addr.sin_port)>>3] |= u8temp; + + /* Cache the cm_id in the qp */ + nesqp->cm_id = cm_id; + cm_id->provider_data = nesqp; + /* Associate QP <--> CM_ID */ + cm_id->add_ref(cm_id); + + /* Copy the private data */ + if (conn_param->private_data_len) { + memcpy(nesqp->ietf_frame->private_data, 
conn_param->private_data, + conn_param->private_data_len); + } + nesqp->ietf_frame->private_data_size = cpu_to_be16(conn_param->private_data_len); + nesqp->private_data_len = conn_param->private_data_len; + nesqp->nesqp_context->ird_ord_sizes |= (u32)conn_param->ord; + dprintk("%s:requested ord = 0x%08X.\n", __FUNCTION__, (u32)conn_param->ord ); + + // start a worker thread + nesqp->wq = create_singlethread_workqueue("NesConnectWQ"); + INIT_WORK(&nesqp->work, connect_worker, nesqp); + queue_work(nesqp->wq, &nesqp->work); + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + return err; +} + + +/** + * nes_disconnect_worker + * + * @param qp + */ +void nes_disconnect_worker(void *qp) +{ + struct nes_qp *nesqp = qp; + // struct nes_dev *nesdev = to_nesdev(nesqp->ibqp.device); + // struct iw_cm_id *cm_id = nesqp->cm_id; + // struct iw_cm_event cm_event; + struct ib_qp_attr attr; + // u8 u8temp; + + dprintk("%s: Disconnecting qp%u after AE\n", __FUNCTION__, nesqp->hwqp.qp_id ); + + switch (nesqp->ibqp_state) { + case IB_QPS_RTS: + /* this should be a FIN received */ + attr.qp_state = IB_QPS_SQD; + nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE ); + break; + case IB_QPS_SQD: + /* this should be a Close complete */ + attr.qp_state = IB_QPS_SQD; + nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE ); + break; + case IB_QPS_SQE: + /* TODO: Add Terminate received processing */ + break; + default: + dprintk("%s: Should not be here. 
QP%u state = %u.\n", __FUNCTION__, nesqp->hwqp.qp_id, nesqp->ibqp_state ); + + } + + return; +} + + +/** + * nes_disconnect + * + * @param cm_id + * @param abrupt + * + * @return int + */ +int nes_disconnect(struct iw_cm_id *cm_id, int abrupt) +{ + struct ib_qp_attr attr; + struct ib_qp *ibqp; + struct nes_qp *nesqp; + struct nes_dev *nesdev = to_nesdev(cm_id->device); + int err = 0; + u8 u8temp; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + /* If the qp was already destroyed, then there's no QP */ + if (cm_id->provider_data == 0) + return 0; + + nesqp = (struct nes_qp *)cm_id->provider_data; + ibqp = &nesqp->ibqp; + + /* Disassociate the QP from this cm_id */ + cm_id->provider_data = 0; + cm_id->rem_ref(cm_id); + nesqp->cm_id = 0; + + stack_ops_p->decelerate_socket(nesqp->socket, + (struct nes_uploaded_qp_context *) + nesqp->nesqp_context); + + if (nesqp->active_conn) { + u8temp = 1 << (ntohs(cm_id->local_addr.sin_port)&7); + nesdev->apbv_table[ntohs(cm_id->local_addr.sin_port)>>3] &= ~(u8temp); + } else { + dev_put(nesdev->netdev); + /* Need to free the Last Streaming Mode Message */ + pci_free_consistent(nesdev->pcidev, + nesqp->private_data_len+sizeof(*nesqp->ietf_frame), + nesqp->ietf_frame, + nesqp->ietf_frame_pbase); + } + + if (nesqp->ksock) sock_release(nesqp->ksock); + stack_ops_p->sock_ops_p->close( nesqp->socket ); + nesqp->ksock = 0; + nesqp->socket = 0; + if (nesqp->wq) { + destroy_workqueue(nesqp->wq); + nesqp->wq = NULL; + } + + memset(&attr, 0, sizeof(struct ib_qp_attr)); + if (abrupt) + attr.qp_state = IB_QPS_ERR; + else + attr.qp_state = IB_QPS_SQD; + + return err; +} + + +/** + * nes_accept + * + * @param cm_id + * @param conn_param + * + * @return int + */ +int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) +{ + struct nes_qp *nesqp; + struct nes_dev *nesdev; + struct nes_adapter *nesadapter; + struct ib_qp 
*ibqp; + struct nes_hw_qp_wqe *wqe; + struct nes_v4_quad nes_quad; + struct ib_qp_attr attr; + struct iw_cm_event cm_event; + + dprintk("%s:%s:%u: data len = %u\n", + __FILE__, __FUNCTION__, __LINE__, conn_param->private_data_len); + + ibqp = nes_get_qp(cm_id->device, conn_param->qpn); + if (!ibqp) + return -EINVAL; + nesqp = to_nesqp(ibqp); + nesdev = to_nesdev(nesqp->ibqp.device); + nesadapter = nesdev->nesadapter; + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + nesqp->ietf_frame = pci_alloc_consistent(nesdev->pcidev, + sizeof(*nesqp->ietf_frame)+conn_param->private_data_len, + &nesqp->ietf_frame_pbase); + if (!nesqp->ietf_frame) { + dprintk(KERN_ERR PFX "%s: Unable to allocate memory for private data\n", __FUNCTION__); + return -ENOMEM; + } + dprintk(PFX "%s: PCI consistent memory for " + "private data located @ %p (pa = 0x%08lX.) size = %u.\n", + __FUNCTION__, nesqp->ietf_frame, (unsigned long)nesqp->ietf_frame_pbase, + conn_param->private_data_len+sizeof(*nesqp->ietf_frame)); + nesqp->private_data_len = conn_param->private_data_len; + + strcpy(&nesqp->ietf_frame->key[0], IEFT_MPA_KEY_REP); + memcpy(&nesqp->ietf_frame->private_data, conn_param->private_data, conn_param->private_data_len); + nesqp->ietf_frame->private_data_size = cpu_to_be16(conn_param->private_data_len); + nesqp->ietf_frame->rev = mpa_version; + nesqp->ietf_frame->flags = IETF_MPA_FLAGS_CRC; + + wqe = &nesqp->hwqp.sq_vbase[0]; + *((struct nes_qp **)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; + *((u64 *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) |= NES_SW_CONTEXT_ALIGN>>1; + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = cpu_to_le32(NES_IWARP_SQ_WQE_STREAMING); + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = cpu_to_le32(conn_param->private_data_len+sizeof(*nesqp->ietf_frame)); + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)nesqp->ietf_frame_pbase); + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX] 
= cpu_to_le32((u32)((u64)nesqp->ietf_frame_pbase>>32)); + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX] = cpu_to_le32(conn_param->private_data_len+sizeof(*nesqp->ietf_frame)); + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX] = 0; + + nesqp->nesqp_context->ird_ord_sizes |= NES_QPCONTEXT_ORDIRD_LSMM_PRESENT | NES_QPCONTEXT_ORDIRD_WRPDU; + nesqp->skip_lsmm = 1; + + /* Cache the cm_id in the qp */ + nesqp->cm_id = cm_id; + nesqp->socket = (u32)cm_id->provider_data; + nesqp->ksock = 0; + cm_id->provider_data = nesqp; + nesqp->active_conn = 0; + /* Just save the private data here and set context bits */ + stack_ops_p->accelerate_socket(nesqp->socket, nesqp->nesqp_context); + nesqp->nesqp_context->tcpPorts = ntohs(cm_id->remote_addr.sin_port) << 16; + nesqp->nesqp_context->tcpPorts += ntohs(cm_id->local_addr.sin_port); + nesqp->nesqp_context->ip0 = ntohl(cm_id->remote_addr.sin_addr.s_addr); + nesqp->nesqp_context->misc2 |= (u32)PCI_FUNC(nesdev->pcidev->devfn) << NES_QPCONTEXT_MISC2_SRC_IP_SHIFT; + nesqp->nesqp_context->arp_index_vlan |= ((u32)nes_arp_table_update(nesdev, nesqp->nesqp_context->ip0, NES_ARP_INDEX_RESOLVE))<<16; + nesqp->nesqp_context->ts_val_delta = jiffies - nes_read_indexed(nesdev->index_reg, NES_IDX_TCP_NOW); + nesqp->nesqp_context->ird_index = nesqp->hwqp.qp_id; + nesqp->nesqp_context->ird_ord_sizes |= (u32)1 << NES_QPCONTEXT_ORDIRD_IWARP_MODE_SHIFT; + nesqp->nesqp_context->ird_ord_sizes |= (u32)conn_param->ord; + + memset ( &nes_quad, 0, sizeof(nes_quad)); + + nes_quad.DstIpAdrIndex = (u32)PCI_FUNC(nesdev->pcidev->devfn) << 27; + nes_quad.SrcIpadr = cm_id->remote_addr.sin_addr.s_addr; + nes_quad.TcpPorts = cm_id->remote_addr.sin_port; + nes_quad.TcpPorts |= (u32)cm_id->local_addr.sin_port << 16; + + // Produce hash key + nesqp->hte_index = nes_crc32( TRUE, + NES_HASH_CRC_INITAL_VALUE, + NES_HASH_CRC_FINAL_XOR, + sizeof(nes_quad), + (PUINT8)&nes_quad, + ORDER, + REFIN, + REFOUT + ); + + dprintk("%s: HTE Index = 0x%08X, CRC = 0x%08X\n", + __FUNCTION__, 
nesqp->hte_index, + nesqp->hte_index & nesadapter->hte_index_mask); + + nesqp->hte_index &= nesadapter->hte_index_mask; + nesqp->nesqp_context->hte_index = nesqp->hte_index; + + attr.qp_state = IB_QPS_RTS; + nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE ); + cm_id->add_ref(cm_id); + + cm_event.event = IW_CM_EVENT_ESTABLISHED; + cm_event.status = IW_CM_EVENT_STATUS_ACCEPTED; + cm_event.provider_data = (void *)nesqp; + cm_event.local_addr = cm_id->local_addr; + cm_event.remote_addr = cm_id->remote_addr; + cm_event.private_data = NULL; + cm_event.private_data_len = 0; + + cm_id->event_handler(cm_id, &cm_event); + + return 0; +} + + +/** + * nes_reject + * + * @param cm_id + * @param pdata + * @param pdata_len + * + * @return int + */ +int nes_reject(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + stack_ops_p->sock_ops_p->close( (UINTPTR)cm_id->provider_data ); + return 0; +} + + +/** + * listen_worker + * + * @param listener + */ +static void listen_worker( void *listener ) +{ + struct nes_listener *nes_listener = listener; + struct nes_dev *nesdev = nes_listener->nesdev; + struct iw_cm_id *cm_id = nes_listener->cm_id; + struct iw_cm_event cm_event; + struct NES_sockaddr_in inet_addr; + struct NES_sockaddr_in new_socket_name; + struct ietf_mpa_req_resp_frame req_frame; + char *private_data = NULL; + UINTPTR new_socket; + int kaddr_length; + int err; + int socket_bytes; + u16 req_private_data_length; + + cm_id->add_ref(cm_id); + do { + dprintk("Issuing Accept on 0x%08X:0x%04X (socket 0x%0lX), netdev->refcnt = %u.\n", + ntohl(cm_id->local_addr.sin_addr.s_addr), + ntohs(cm_id->local_addr.sin_port), nes_listener->socket, atomic_read(&nesdev->netdev->refcnt)); + + kaddr_length = sizeof(inet_addr); + new_socket = stack_ops_p->sock_ops_p->accept(nes_listener->socket, + (struct NES_sockaddr *)&inet_addr, &kaddr_length); + + dprintk("%s: Accept request returned %d.\n", __FUNCTION__, + (int)new_socket 
); + if ((int)new_socket < 0) { + if (-NES_ECONNABORTED != (int)new_socket) + { + cm_event.event = IW_CM_EVENT_CONNECT_REQUEST; + cm_event.status = IW_CM_EVENT_STATUS_EINVAL; + cm_event.provider_data = (void *)new_socket; + cm_event.local_addr = nes_listener->cm_id->local_addr; + cm_event.remote_addr.sin_family = AF_INET; + cm_event.remote_addr.sin_port = inet_addr.sin_port; + cm_event.remote_addr.sin_addr.s_addr = inet_addr.sin_addr.NES_s_addr; + cm_event.private_data = NULL; + cm_event.private_data_len = 0; + continue; + } + break; + } + + dprintk("Accept address info: 0x%08X:0x%04X.netdev->refcnt = %u\n", + ntohl(inet_addr.sin_addr.NES_s_addr), + ntohs(inet_addr.sin_port), atomic_read(&nesdev->netdev->refcnt)); + + /* Issue receive for IETF mode request */ + socket_bytes = 0; + do + { + err = stack_ops_p->sock_ops_p->recv(new_socket, (char *)&req_frame, sizeof(req_frame),0); + + dprintk("%s: Recv for MPA request returned %d.\n", __FUNCTION__, err ); + + if (err < 0) { + goto accept_err0; + } + socket_bytes += err; + } while (socket_bytes < sizeof(req_frame)); + + if (req_frame.flags&IETF_MPA_FLAGS_MARKERS) + { + dprintk("%s: Peer specified Markers in MPA request. Aborting MPA negotiation \n", + __FUNCTION__ ); + goto accept_err0; + } + if (req_frame.flags&IETF_MPA_FLAGS_CRC) { + dprintk("%s: Peer specified CRC in MPA reply. MPA version = %u.\n", + __FUNCTION__, req_frame.rev ); + } else { + dprintk("%s: Peer did not specified CRC in MPA reply. 
MPA version = %u.\n", + __FUNCTION__, req_frame.rev ); + } + + req_private_data_length = be16_to_cpu(req_frame.private_data_size); + if (req_private_data_length) { + private_data = kzalloc(req_private_data_length, GFP_KERNEL); + if (!private_data) + { + dprintk("%s: Error allocating req private data area.\n", __FUNCTION__ ); + goto accept_err0; + } + err = stack_ops_p->sock_ops_p->recv(new_socket, private_data, req_private_data_length,0); + + dprintk("%s: Recv for MPA request private data returned %d.\n", __FUNCTION__, err ); + if (err < 0) { + goto accept_err0; + } + } + + kaddr_length = sizeof(new_socket_name); + stack_ops_p->sock_ops_p->getsockname( new_socket, + (struct NES_sockaddr *)&new_socket_name, + &kaddr_length); + + cm_event.event = IW_CM_EVENT_CONNECT_REQUEST; + cm_event.status = IW_CM_EVENT_STATUS_OK; + cm_event.provider_data = (void *)new_socket; + cm_event.local_addr.sin_family = new_socket_name.sin_family; + cm_event.local_addr.sin_port = new_socket_name.sin_port; + cm_event.local_addr.sin_addr.s_addr = new_socket_name.sin_addr.NES_s_addr; + cm_event.remote_addr.sin_family = AF_INET; + cm_event.remote_addr.sin_port = inet_addr.sin_port; + cm_event.remote_addr.sin_addr.s_addr = inet_addr.sin_addr.NES_s_addr; + cm_event.private_data = private_data; + cm_event.private_data_len = req_private_data_length; + + cm_id->event_handler(cm_id, &cm_event); + + if (private_data) + { + } + + private_data = NULL; + continue; + +accept_err0: + if (private_data) + kfree(private_data); + private_data = NULL; + stack_ops_p->sock_ops_p->close( new_socket ); + + } while (1); + + dprintk("Exiting Listener worker thread \n" ); + nes_listener->accept_failed = 1; + return; +} + + +/** + * nes_create_listen + * + * @param cm_id + * @param backlog + * + * @return int + */ +int nes_create_listen(struct iw_cm_id *cm_id, int backlog) +{ + int err; + int int_socket_opt; + u8 u8temp; + UINTPTR socket; + struct socket *ksock; + struct nes_listener *nes_listener; + struct 
sockaddr_in kinet_addr; + struct NES_sockaddr_in inet_addr; + int kaddr_length; + struct nes_dev *nesdev = to_nesdev(cm_id->device); + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + // Allocate a listener + nes_listener = kzalloc(sizeof *nes_listener, GFP_KERNEL); + if (NULL == nes_listener) { + dprintk("%s:%s: Error allocating listener.\n", __FILE__, __FUNCTION__ ); + return -ENOMEM; + } + + dprintk("%s: socket function pointer = %p, listener = %p, cm_id = %p, event_handler = %p, netdev->refcnt = %u.\n", + __FUNCTION__, stack_ops_p->sock_ops_p->socket, nes_listener, cm_id, cm_id->event_handler, + atomic_read(&nesdev->netdev->refcnt) ); + nes_listener->nesdev = nesdev; + socket = stack_ops_p->sock_ops_p->socket( NES_AF_INET, NES_SOCK_STREAM, 0 ); + dprintk("returned socket = %p.\n", (void *)socket ); + if ((long)socket < 0) { + dprintk("NES socket call returned %d.\n", (int)socket ); + return (int)socket; + } + nes_listener->socket = socket; + + err = sock_create_kern(PF_INET, SOCK_STREAM, IPPROTO_TCP, &ksock); + if (err < 0) { + dprintk("kernel socket call returned %d.\n", err ); + stack_ops_p->sock_ops_p->close( socket ); + return err; + } + + dprintk("kernel socket = %p.\n", ksock ); + nes_listener->ksock = ksock; + + memset( &kinet_addr, 0, sizeof(kinet_addr) ); + kinet_addr.sin_family = AF_INET; + kinet_addr.sin_port = cm_id->local_addr.sin_port; + kinet_addr.sin_addr.s_addr = cm_id->local_addr.sin_addr.s_addr; + err = ksock->ops->bind(ksock, (struct sockaddr *)&kinet_addr, sizeof(kinet_addr)); + if (err < 0) { + dprintk("kernel bind call returned %d.\n", err ); + goto release_sockets0; + } + + memset( &kinet_addr, 0, sizeof(kinet_addr) ); + err = ksock->ops->getname(ksock, (struct sockaddr *)&kinet_addr, &kaddr_length,0); + if (err < 0) { + dprintk("kernel getname call returned %d.\n", err ); + goto release_sockets0; + } + + dprintk("kernel getname call returned port = 0x%04X.\n", kinet_addr.sin_port ); + cm_id->local_addr.sin_port = 
kinet_addr.sin_port; + inet_addr.sin_len = sizeof( inet_addr ); + inet_addr.sin_family = NES_AF_INET; + inet_addr.sin_port = cm_id->local_addr.sin_port; + inet_addr.sin_addr.NES_s_addr = cm_id->local_addr.sin_addr.s_addr; + err = stack_ops_p->sock_ops_p->bind(socket, (struct NES_sockaddr *)&inet_addr, + sizeof(inet_addr)); + if (err < 0) { + dprintk("NES Socket bind call returned %d.\n", err ); + goto release_sockets0; + } + + int_socket_opt = 1; + err = stack_ops_p->sock_ops_p->setsockopt(socket, NES_SOL_SOCKET, NES_TCP_NODELAY, + (char *)&int_socket_opt, sizeof(int_socket_opt)); + + if (err < 0) { + dprintk("%s: nes setsockopt (TCP_NODELAY) call returned %d.\n", __FUNCTION__, err ); + } + + int_socket_opt = 0; + err = stack_ops_p->sock_ops_p->setsockopt(socket, NES_SOL_SOCKET, TCPOPT_TIMESTAMP, + (char *)&int_socket_opt, sizeof(int_socket_opt)); + + if (err < 0) { + dprintk("%s: nes setsockopt (TCPOPT_TIMESTAMP) call returned %d.\n", __FUNCTION__, err ); + } + + int_socket_opt = (496*1024)-8; + err = stack_ops_p->sock_ops_p->setsockopt(socket, NES_SOL_SOCKET, NES_SO_RCVBUF, + (char *)&int_socket_opt, sizeof(int_socket_opt)); + + if (err < 0) { + dprintk("%s: nes setsockopt (NES_SO_RECVBUF) call returned %d.\n", __FUNCTION__, err); + } + + u8temp = 1 << (ntohs(cm_id->local_addr.sin_port)&7); + nesdev->apbv_table[ntohs(cm_id->local_addr.sin_port)>>3] |= u8temp; + + dprintk("Attempting to listen on 0x%08X:0x%04X.\n", + ntohl(cm_id->local_addr.sin_addr.s_addr), + ntohs(cm_id->local_addr.sin_port) ); + + err = stack_ops_p->sock_ops_p->listen(socket, backlog ); + + dprintk("Listen request returned %X.\n", err ); + + if (err < 0) { + dprintk("NES Socket listen call returned %d.\n", err ); + goto release_sockets0; + } + + dprintk("Setting cm_id->provider_data for listen to %p.\n", nes_listener ); + nes_listener->cm_id = cm_id; + cm_id->provider_data = nes_listener; + + // start a worker thread + nes_listener->wq = create_singlethread_workqueue("NesListenerWQ"); + + 
INIT_WORK(&nes_listener->work, listen_worker, nes_listener); + queue_work(nes_listener->wq, &nes_listener->work); + dprintk("%s: Exiting create listen, netdev->refcnt = %u.\n", __FUNCTION__, + atomic_read(&nesdev->netdev->refcnt) ); + return 0; + +release_sockets0: + sock_release(ksock); + stack_ops_p->sock_ops_p->close( socket ); + return err; +} + + +/** + * nes_destroy_listen + * + * @param cm_id + * + * @return int + */ +int nes_destroy_listen(struct iw_cm_id *cm_id) +{ + struct nes_listener *nes_listener = (struct nes_listener *)(unsigned long)cm_id->provider_data; + struct nes_dev *nesdev = to_nesdev(cm_id->device); + int err; + u8 u8temp; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + err = 0; + + u8temp = 1 << (ntohs(cm_id->local_addr.sin_port)&7); + nesdev->apbv_table[ntohs(cm_id->local_addr.sin_port)>>3] &= ~(u8temp); + + sock_release(nes_listener->ksock); + stack_ops_p->sock_ops_p->close( nes_listener->socket ); + + do { + msleep(1); + } while( 0 == nes_listener->accept_failed ); + + // dprintk("%s: Accept failed.\n", __FUNCTION__ ); + destroy_workqueue(nes_listener->wq); + + cm_id->rem_ref(cm_id); + kfree(nes_listener); + + return err; +} + From rdreier at cisco.com Thu Oct 26 16:58:41 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 16:58:41 -0700 Subject: [openib-general] [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2> (Glenn Grundstrom's message of "Thu, 26 Oct 2006 18:45:39 -0500") References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2> Message-ID: > +source "drivers/infiniband/hw/nes/Kconfig" > + > source "drivers/infiniband/hw/cxgb3/Kconfig" This patch seems to be against some non-standard tree, since cxgb3 isn't upstream yet. 
And if cxgb3 were already upstream, it might be polite to add yourself
after it rather than before ;)

> +config INFINIBAND_NES_DEBUG
> +        bool "Verbose debugging output"
> +        depends on INFINIBAND_NES
> +        default n
> +        ---help---
> +          This option causes the NetEffect RNIC driver to produce debug
> +          messages. Select this if you are developing the driver
> +          or trying to diagnose a problem.

I recommend making this option invisible unless EMBEDDED is set, and
having the default be 'y', and making your debugging level changeable
at run-time. That way everyone (in particular distros) will have this
turned on and you'll be able to figure out problems without making
end-users rebuild a kernel.

> +EXTRA_CFLAGS += -Idrivers/infiniband/include

Not needed in the kernel tree.

> -Idrivers/infiniband/hw/nes/nes_tcpip/include

I guess this is the mysterious TCP stack module. Anyway if you need
this in the end, I would suggest removing the C flag and using

#include "nes_tcpip/blah.h"

in your source.

> +ifdef CONFIG_INFINIBAND_NES_DEBUG
> +EXTRA_CFLAGS += -DNES_DEBUG
> +endif

There's no point to this -- just test CONFIG_INFINIBAND_NES_DEBUG
directly.

> +ifneq ($(KERNELRELEASE),)
> +  obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o
> +
> +  iw_nes-objs := \
> +    nes.o \
> +    nes_hw.o \
> +    nes_nic.o \
> +    nes_cm.o \
> +    nes_utils.o \
> +    nes_verbs.o
> +else

This should be your whole Makefile -- we're not going to merge stuff
into the kernel tree to build your module out of the kernel tree.
Also it's more idiomatic to put all your component objects onto one
(or a few) lines.

 - R.
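
The review point above about a debugging level that is changeable at
run-time, rather than a compile-time -DNES_DEBUG switch, can be sketched
roughly as follows. This is a userspace model under stated assumptions,
not code from the NetEffect driver: nes_debug_level, nes_debug() and the
NES_DBG_* categories are hypothetical names, and vprintf() stands in for
the kernel's printk(). In a real module the mask would typically be
exposed with module_param() so that it can be changed without a rebuild.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug categories; the names are illustrative only. */
enum nes_dbg_flag {
	NES_DBG_INIT = 1u << 0,
	NES_DBG_CM   = 1u << 1,
	NES_DBG_HW   = 1u << 2,
};

/*
 * Run-time debug mask (assumed name). In a kernel module this would be
 * declared as a module parameter, e.g.
 *   module_param(nes_debug_level, uint, 0644);
 * so it can be set at load time or later through sysfs.
 */
static unsigned int nes_debug_level;

/*
 * Print a message only if its category is enabled in the mask.
 * Returns the number of characters printed, or 0 if the category is off.
 */
static int nes_debug(unsigned int flag, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!(nes_debug_level & flag))
		return 0;

	va_start(ap, fmt);
	n = vprintf(fmt, ap);	/* kernel code would use printk() here */
	va_end(ap);
	return n;
}
```

With nes_debug_level set to NES_DBG_CM, connection-manager messages are
printed while hardware messages stay silent; each call site costs one
mask test when its category is disabled.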

From ggrundstrom at NetEffect.com Thu Oct 26 16:45:39 2006
From: ggrundstrom at NetEffect.com (Glenn Grundstrom)
Date: Thu, 26 Oct 2006 18:45:39 -0500
Subject: [openib-general] [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2>

The following set of patches contain the source code for the NetEffect
NE010 iWarp adapter running under the OpenFabrics Alliance software
stack. This is a repost.

Signed-off-by: Glenn Grundstrom
======================================================
diff -ruNp old/drivers/infiniband/Kconfig new/drivers/infiniband/Kconfig
--- old/drivers/infiniband/Kconfig 2006-10-25 09:57:43.000000000 -0500
+++ new/drivers/infiniband/Kconfig 2006-10-25 10:48:40.000000000 -0500
@@ -41,6 +41,8 @@ source "drivers/infiniband/hw/ehca/Kconf
 
 source "drivers/infiniband/hw/amso1100/Kconfig"
 
+source "drivers/infiniband/hw/nes/Kconfig"
+
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
diff -ruNp old/drivers/infiniband/hw/nes/Kconfig new/drivers/infiniband/hw/nes/Kconfig
--- old/drivers/infiniband/hw/nes/Kconfig 1969-12-31 18:00:00.000000000 -0600
+++ new/drivers/infiniband/hw/nes/Kconfig 2006-10-25 10:50:18.000000000 -0500
@@ -0,0 +1,15 @@
+config INFINIBAND_NES
+        tristate "NetEffect RNIC support"
+        depends on PCI && INET && INFINIBAND
+        ---help---
+          This is a low-level driver for NetEffect RDMA enabled
+          Network Interface Cards (RNIC).
+
+config INFINIBAND_NES_DEBUG
+        bool "Verbose debugging output"
+        depends on INFINIBAND_NES
+        default n
+        ---help---
+          This option causes the NetEffect RNIC driver to produce debug
+          messages. Select this if you are developing the driver
+          or trying to diagnose a problem.
diff -ruNp old/drivers/infiniband/hw/nes/Makefile new/drivers/infiniband/hw/nes/Makefile
--- old/drivers/infiniband/hw/nes/Makefile 1969-12-31 18:00:00.000000000 -0600
+++ new/drivers/infiniband/hw/nes/Makefile 2006-10-25 11:10:26.000000000 -0500
@@ -0,0 +1,27 @@
+EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/hw/nes/nes_tcpip/include
+
+ifdef CONFIG_INFINIBAND_NES_DEBUG
+EXTRA_CFLAGS += -DNES_DEBUG
+endif
+
+ifneq ($(KERNELRELEASE),)
+  obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o
+
+  iw_nes-objs := \
+    nes.o \
+    nes_hw.o \
+    nes_nic.o \
+    nes_cm.o \
+    nes_utils.o \
+    nes_verbs.o
+else
+  KERNELDIR ?= /usr/src/linux
+  PWD := $(shell pwd)
+
+default:
+	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
+
+clean:
+	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean
+
+endif

From davem at davemloft.net Thu Oct 26 17:09:45 2006
From: davem at davemloft.net (David Miller)
Date: Thu, 26 Oct 2006 17:09:45 -0700 (PDT)
Subject: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBB@venom2>
References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBB@venom2>
Message-ID: <20061026.170945.116353454.davem@davemloft.net>

From: "Glenn Grundstrom"
Date: Thu, 26 Oct 2006 19:06:23 -0500

> +#include "nes_tcpip/include/nes_sockets.h"

I want to know what in the world this nes_tcpip thing is?

From ggrundstrom at NetEffect.com Thu Oct 26 16:54:18 2006
From: ggrundstrom at NetEffect.com (Glenn Grundstrom)
Date: Thu, 26 Oct 2006 18:54:18 -0500
Subject: [openib-general] [PATCH 2/9] NetEffect 10Gb RNIC Driver: main kernel driver c file
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EB6@venom2>

Kernel driver patch 2 of 9.

Signed-off-by: Glenn Grundstrom
======================================================
diff -ruNp old/drivers/infiniband/hw/nes/nes.c new/drivers/infiniband/hw/nes/nes.c
--- old/drivers/infiniband/hw/nes/nes.c 1969-12-31 18:00:00.000000000 -0600
+++ new/drivers/infiniband/hw/nes/nes.c 2006-10-25 10:15:49.000000000 -0500
@@ -0,0 +1,653 @@
+/*
+ * Copyright (c) 2006 NetEffect, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include "nes.h" + +MODULE_AUTHOR("NetEffect"); +MODULE_DESCRIPTION("NetEffect RNIC Low-level iWARP Driver"); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_VERSION(DRV_VERSION); + +int max_mtu = ETH_DATA_LEN; + + +/* Interoperability */ +int mpa_version = 1; +module_param(mpa_version, int, 0); +MODULE_PARM_DESC(mpa_version, "MPA version to be used int MPA Req/Resp (0 or 1)"); + +/* Interoperability */ +int disable_mpa_crc = 0; +module_param(disable_mpa_crc, int, 0); +MODULE_PARM_DESC(disable_mpa_crc, "Disable checking of MPA CRC"); + + +unsigned int send_first = 0; +module_param(send_first, int, 0); +MODULE_PARM_DESC(send_first, "Send RDMA Message First on Active Connection"); + + +LIST_HEAD(nes_adapter_list); +LIST_HEAD(nes_dev_list); + +static int nes_device_event(struct notifier_block *notifier, unsigned long event, void *ptr); +static int nes_inetaddr_event(struct notifier_block *notifier, unsigned long event, void *ptr); +static void nes_print_macaddr(struct net_device *netdev); +static irqreturn_t nes_interrupt(int, void *, struct pt_regs *); +static int __devinit nes_probe(struct pci_dev *, const struct pci_device_id *); +static int nes_suspend(struct pci_dev *, pm_message_t); +static int nes_resume(struct pci_dev *); +static void __devexit nes_remove(struct pci_dev *); +static int __init nes_init_module(void); +static void __exit nes_exit_module(void); + +extern struct nes_dev *nes_ifs[]; + +// _the_ function interface handle to nes_tcpip module +struct nes_stack_ops *stack_ops_p; + +static struct pci_device_id nes_pci_table[] = { + {PCI_VENDOR_ID_NETEFFECT, PCI_DEVICE_ID_NETEFFECT_NE010, PCI_ANY_ID, PCI_ANY_ID}, + {0} +}; + +MODULE_DEVICE_TABLE(pci, nes_pci_table); + + +static struct notifier_block nes_dev_notifier = { + notifier_call: nes_device_event +}; + +static struct 
notifier_block nes_inetaddr_notifier = { + notifier_call: nes_inetaddr_event +}; + + +/** + * nes_device_event + * + * @param notifier + * @param event + * @param ptr + * + * @return int + */ +static int nes_device_event(struct notifier_block *notifier, + unsigned long event, void *ptr) +{ + struct net_device *netdev = (struct net_device *)ptr; + struct nes_dev *nesdev; + + dprintk("nes_device_event: notifier %p event=%ld netdev=%p, interface name = %s.\n", + notifier, event, netdev, netdev->name); + + list_for_each_entry(nesdev, &nes_dev_list, list) { + dprintk("Nesdev list entry = 0x%p.\n", nesdev); + if (nesdev->netdev == netdev) { + switch (event) { + case NETDEV_DOWN: + { + nes_ifs[0] = NULL; + dprintk("event:DOWN \n"); + nes_stop_cm(nesdev); + } + break; + case NETDEV_REGISTER: + { + dprintk("event:Register \n"); + nes_ifs[0] = nesdev; + if (nesdev->nes_stack_start == 0) + { + // initialize tcpip stack + if (!nes_register_stack_client(0, &stack_ops_p, nes_update_arp)) + { + // initialize the stack + stack_ops_p->stack_init((void *)nesdev->netdev); + // disable dhcp + stack_ops_p->dhcp_control(0x00); + + nesdev->nes_stack_start = 1; + } + } + } + break; + default: + break; + } + } + } + + return (NOTIFY_DONE); +} + + +/** + * nes_inetaddr_event + * + * @param notifier + * @param event + * @param ptr + * + * @return int + */ +static int nes_inetaddr_event(struct notifier_block *notifier, + unsigned long event, void *ptr) +{ + struct in_ifaddr *ifa = ptr; + struct net_device *netdev = ifa->ifa_dev->dev; + struct nes_dev *nesdev; + unsigned int addr; + unsigned int mask; + int ret; + + dprintk("nes_inetaddr_event: notifier %p event=%ld netdev=%p, interface name = %s.\n", + notifier, event, netdev, netdev->name); + dprintk("nes_inetaddr_event: ifa_address=%08X, ifa_mask=%08X\n", ifa->ifa_address, ifa->ifa_mask); + + addr = ntohl(ifa->ifa_address); + mask = ntohl(ifa->ifa_mask); + dprintk("nes_inetaddr_event: ip address %08X, netmask %08X.\n", addr, mask); + 
list_for_each_entry(nesdev, &nes_dev_list, list) { + dprintk("Nesdev list entry = 0x%p.\n", nesdev); + if (nesdev->netdev == netdev) { + // we have ifa->ifa_address/mask here if we need it + switch (event) { + case NETDEV_DOWN: + { + dprintk("event:DOWN \n"); + nes_write_indexed(nesdev->index_reg, + NES_IDX_DST_IP_ADDR+(0x10*PCI_FUNC(nesdev->pcidev->devfn)), 0); + // del the IP addr + stack_ops_p->del_ipaddr(addr, mask); + nesdev->local_ipaddr = 0; + stack_ops_p->dump_rt_table(); + } + break; + case NETDEV_UP: + { + dprintk("event:UP \n"); + // Add the address to the IP table + stack_ops_p->add_ipaddr(addr, mask); + nesdev->local_ipaddr = ifa->ifa_address; + + nes_write_indexed(nesdev->index_reg, + NES_IDX_DST_IP_ADDR+(0x10*PCI_FUNC(nesdev->pcidev->devfn)), ntohl(ifa->ifa_address)); + + stack_ops_p->dump_rt_table(); + if (0 == nesdev->of_device_registered) + { + ret = nes_register_device(nesdev); + if (ret) { + printk(KERN_ERR PFX "Unable to register RDMA device, ret = %d\n", ret); + } else { + nesdev->of_device_registered = 1; + } + } + } + break; + default: + break; + } + } + } + return (NOTIFY_DONE); +} + + +/** + * nes_print_macaddr + * + * @param netdev + */ +static void nes_print_macaddr(struct net_device *netdev) +{ + dprintk("%s: MAC %02X:%02X:%02X:%02X:%02X:%02X, " + "IRQ %u\n", netdev->name, + netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], + netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5], + netdev->irq); +} + + +/** + * nes_add_ref + * + * @param ibqp + */ +void nes_add_ref(struct ib_qp *ibqp) +{ + struct nes_qp *nesqp; + dprintk("%s: Bumping refcount for QP%u.\n", __FUNCTION__, ibqp->qp_num ); + + nesqp = to_nesqp(ibqp); + atomic_inc(&nesqp->refcount); +} + + +/** + * nes_rem_ref + * + * @param ibqp + */ +void nes_rem_ref(struct ib_qp *ibqp) +{ + struct nes_qp *nesqp; + dprintk("%s: Decing refcount for QP%u.\n", __FUNCTION__, ibqp->qp_num ); + + nesqp = to_nesqp(ibqp); + if (atomic_dec_and_test(&nesqp->refcount)) { + 
dprintk("%s: Refcount for QP%u is 0. Freeing QP structure\n", __FUNCTION__, ibqp->qp_num ); + kfree(nesqp->allocated_buffer); + } +} + + +/** + * nes_get_qp + * + * @param device + * @param qpn + * + * @return struct ib_qp* + */ +struct ib_qp *nes_get_qp(struct ib_device *device, int qpn) +{ + struct nes_dev *nesdev = to_nesdev(device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + + if ((qpn<NES_FIRST_QPN) || (qpn>=(NES_FIRST_QPN+nesadapter->max_qp))) + return NULL; + + return (&nesadapter->qp_table[qpn-NES_FIRST_QPN]->ibqp); +} + + +/** + * nes_interrupt - handle interrupts + */ +static irqreturn_t nes_interrupt(int irq, void *dev_id, struct pt_regs *regs) +{ + struct nes_dev *nesdev = (struct nes_dev *) dev_id; + int handled = 0; + + handled = nes_read32(nesdev->regs+NES_INT_PENDING ); + // dprintk("Interrupt Pending value = 0x%08X\n", handled ); + + if (handled) { + tasklet_schedule(&nesdev->dpc_tasklet); + return IRQ_HANDLED; + } else { + return IRQ_NONE; + } +} + + +/** + * nes_probe - + * + * @param pcidev + * @param ent + * + * @return int __devinit + */ +static int __devinit nes_probe(struct pci_dev *pcidev, const struct pci_device_id *ent) +{ + int ret = 0; + unsigned long reg0_start, reg0_flags, reg0_len; + unsigned long reg1_start, reg1_flags, reg1_len; + struct net_device *netdev = NULL; + struct nes_dev *nesdev = NULL; + void __iomem *mmio_regs = NULL; + + assert(pcidev != NULL); + assert(ent != NULL); + + printk(KERN_INFO PFX "%s: NetEffect RNIC driver v%s loading\n", pci_name(pcidev), DRV_VERSION); + + /* Enable PCI device */ + ret = pci_enable_device(pcidev); + if (ret){ + printk(KERN_ERR PFX "%s: Unable to enable PCI device\n", pci_name(pcidev)); + goto bail0; + } + + reg0_start = pci_resource_start(pcidev, BAR_0); + reg0_len = pci_resource_len(pcidev, BAR_0); + reg0_flags = pci_resource_flags(pcidev, BAR_0); + + reg1_start = pci_resource_start(pcidev, BAR_1); + reg1_len = pci_resource_len(pcidev, BAR_1); + reg1_flags = pci_resource_flags(pcidev, BAR_1); +
dprintk("BAR0 (@0x%08lX) size = 0x%lX bytes\n", reg0_start, reg0_len); + dprintk("BAR1 (@0x%08lX) size = 0x%lX bytes\n", reg1_start, reg1_len); + + /* Make sure PCI base addr are MMIO */ + if (!(reg0_flags & IORESOURCE_MEM) || !(reg1_flags & IORESOURCE_MEM)) { + printk(KERN_ERR PFX "PCI regions not an MMIO resource\n"); + ret = -ENODEV; + goto bail1; + } + + /* Reserve PCI I/O and memory resources */ + ret = pci_request_regions(pcidev, DRV_NAME); + if (ret) { + printk(KERN_ERR PFX "%s: Unable to request regions\n", pci_name(pcidev)); + goto bail1; + } + + if ((sizeof(dma_addr_t) > 4)) { + ret = pci_set_dma_mask(pcidev, DMA_64BIT_MASK); + if (ret < 0) { + printk(KERN_ERR PFX "64b DMA configuration failed\n"); + goto bail2; + } + } else { + ret = pci_set_dma_mask(pcidev, DMA_32BIT_MASK); + if (ret < 0){ + printk(KERN_ERR PFX "32b DMA configuration failed\n"); + goto bail2; + } + } + + /* Enables bus-mastering on the device */ + pci_set_master(pcidev); + + /* pci tweaks */ + pci_write_config_word(pcidev, 0x000c, 0xfc10); + pci_write_config_dword(pcidev, 0x0048, 0x00480007); + + /* Allocate hardware structure */ + nesdev = (struct nes_dev *)ib_alloc_device(sizeof *nesdev); + if (!nesdev) { + printk(KERN_ERR PFX "%s: Unable to alloc hardware struct\n", pci_name(pcidev)); + ret = -ENOMEM; + iounmap(mmio_regs); + goto bail2; + } + + memset(nesdev, 0, sizeof(*nesdev)); + spin_lock_init(&nesdev->lock); + nesdev->pcidev = pcidev; + nesdev->cur_tx = 0; + + /* Remap the PCI registers in adapter BAR0 to kernel VA space */ + mmio_regs = ioremap_nocache(reg0_start, sizeof(mmio_regs)); + if (mmio_regs == 0UL){ + printk(KERN_ERR PFX "Unable to remap BAR0\n"); + ret = -EIO; + goto bail35; + } + nesdev->regs = mmio_regs; + nesdev->index_reg = 0x50 + (PCI_FUNC(pcidev->devfn)*8) + mmio_regs; + + // Ensure interrupts are disabled + nes_write32(nesdev->regs+NES_INT_MASK, 0x7fffffff ); + + tasklet_init(&nesdev->dpc_tasklet, nes_dpc, (unsigned long) nesdev); + + /* Request an interrupt 
line for the driver */ + ret = request_irq(pcidev->irq, nes_interrupt, SA_SHIRQ, DRV_NAME, nesdev); + if (ret) { + printk(KERN_ERR PFX "%s: requested IRQ %u is busy\n", pci_name(pcidev), pcidev->irq); + iounmap(mmio_regs); + goto bail3; + } + + // Init the adapter + nesdev->nesadapter = nes_adapter_init(nesdev, (reg1_len / 4096)); + if (!nesdev->nesadapter) { + printk(KERN_ERR PFX "Unable to initialize adapter.\n" ); + ret = -ENOMEM; + goto bail37; + } + + nesdev->nesadapter->csr_start = reg0_start; + nesdev->nesadapter->doorbell_start = reg1_start; + + if (nes_read_eeprom_values(nesdev)) { + printk(KERN_ERR PFX "Unable to read EEPROM data.\n"); + ret = -ENODEV; + goto bail4; + } + + /* Set driver specific data */ + pci_set_drvdata(pcidev, nesdev); + + /* Initialize network device */ + if ((netdev = nes_netdev_init(nesdev, mmio_regs)) == NULL) { + goto bail4; + } + + /* Register network device */ + ret = register_netdev(netdev); + if (ret) { + printk(KERN_ERR PFX "Unable to register netdev, ret = %d\n", ret); + goto bail5; + } + + /* Disable network packets */ + netif_stop_queue(netdev); + + dprintk("CQP Status = 0x%08X bytes\n", nes_read_indexed(nesdev->index_reg, 0xa0)); + dprintk("PCI Function # = %u.\n", PCI_FUNC(pcidev->devfn) ); + + /* Print out the MAC address */ + nes_print_macaddr(netdev); + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + register_netdevice_notifier(&nes_dev_notifier); + register_inetaddr_notifier(&nes_inetaddr_notifier); + + printk(KERN_ERR PFX "%s: NetEffect RNIC driver successfully loaded.\n", pci_name(pcidev)); + + return 0; + +bail5: + printk(KERN_ERR PFX "bail5\n"); + nes_netdev_exit(nesdev); + free_netdev(netdev); + +bail4: + printk(KERN_ERR PFX "bail4\n"); + /* Deallocate the Adapter Structure */ + nes_adapter_free(nesdev->nesadapter); + +bail37: + printk(KERN_ERR PFX "bail37\n"); + /* Unmap adapter PA space */ + iounmap(mmio_regs); + +bail35: + printk(KERN_ERR PFX "bail3.5\n"); + free_irq(pcidev->irq, nesdev); + 
+bail3: + printk(KERN_ERR PFX "bail3\n"); + ib_dealloc_device(&nesdev->ibdev); + +bail2: + pci_release_regions(pcidev); + +bail1: + pci_disable_device(pcidev); + +bail0: + return (ret); +} + + +/** + * nes_suspend - power management + */ +static int nes_suspend(struct pci_dev *pcidev, pm_message_t state) +{ + dprintk("pcidev=%p\n", pcidev); + + return (0); +} + + +/** + * nes_resume - power management + */ +static int nes_resume(struct pci_dev *pcidev) +{ + dprintk("pcidev=%p\n", pcidev); + + return (0); +} + + +/** + * nes_remove - unload from kernel + * + * @param pcidev + * + * @return void __devexit + */ +static void __devexit nes_remove(struct pci_dev *pcidev) +{ + struct nes_dev *nesdev = pci_get_drvdata(pcidev); + struct net_device *netdev = nesdev->netdev; + + dprintk("nes_remove called.\n"); + assert(netdev != NULL); + + unregister_inetaddr_notifier(&nes_inetaddr_notifier); + unregister_netdevice_notifier(&nes_dev_notifier); + + /* Clean up the RNIC resources */ + dprintk("nes_remove: calling unregister_netdev.\n"); + /* Remove network device from the kernel */ + unregister_netdev(netdev); + + dprintk("nes_remove: calling nes_netdev_exit.\n"); + /* Free network device */ + nes_netdev_exit(nesdev); + free_netdev(netdev); + + dprintk("nes_remove: calling free_irq.\n"); + /* Free the interrupt line */ + free_irq(pcidev->irq, nesdev); + + /* missing: Turn LEDs off here */ + + dprintk("nes_remove: calling nes_adapter_free(%p).\n", nesdev->nesadapter); + /* Deallocate the Adapter Structure */ + nes_adapter_free(nesdev->nesadapter); + + dprintk("nes_remove: calling iounmap.\n"); + /* Unmap adapter PA space */ + iounmap(nesdev->regs); + + /* Unregister with OpenFabrics */ + if (nesdev->of_device_registered) { + dprintk("nes_remove: calling nes_unregister_device.\n"); + nes_unregister_device(nesdev); + } + + dprintk("nes_remove: calling ib_dealloc_device.\n"); + /* Free the hardware structure */ + ib_dealloc_device(&nesdev->ibdev); + + dprintk("nes_remove: calling 
pci_release_regions.\n"); + /* Release reserved PCI I/O and memory resources */ + pci_release_regions(pcidev); + + dprintk("nes_remove: calling pci_disable_device.\n"); + /* Disable PCI device */ + pci_disable_device(pcidev); + + dprintk("nes_remove: calling pci_set_drvdata.\n"); + /* Clear driver specific data */ + pci_set_drvdata(pcidev, NULL); +} + + +static struct pci_driver nes_pci_driver = { + .name = DRV_NAME, + .id_table = nes_pci_table, + .probe = nes_probe, + .remove = __devexit_p(nes_remove), +#ifdef CONFIG_PM + .suspend = nes_suspend, + .resume = nes_resume, +#endif +}; + + +/** + * nes_init_module - module initialization entry point + * + * @return int __init + */ +static int __init nes_init_module(void) +{ + return (pci_module_init(&nes_pci_driver)); +} + + +/** + * nes_exit_module - module unload entry point + * + * @return void __exit + */ +static void __exit nes_exit_module(void) +{ + pci_unregister_driver(&nes_pci_driver); +} + + +module_init(nes_init_module); +module_exit(nes_exit_module); + From ggrundstrom at NetEffect.com Thu Oct 26 17:14:19 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:14:19 -0500 Subject: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBD@venom2> It is part of our connection manager and assists with connection setup and teardown only. Glenn. -----Original Message----- From: David Miller [mailto:davem at davemloft.net] Sent: Thursday, October 26, 2006 7:10 PM To: Glenn Grundstrom; Glenn Grundstrom Cc: openib-general at openib.org; netdev at vger.kernel.org Subject: Re: [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files From: "Glenn Grundstrom" Date: Thu, 26 Oct 2006 19:06:23 -0500 > +#include "nes_tcpip/include/nes_sockets.h" I want to know what in the world this nes_tcpip thing is? 

From davem at davemloft.net Thu Oct 26 17:16:57 2006
From: davem at davemloft.net (David Miller)
Date: Thu, 26 Oct 2006 17:16:57 -0700 (PDT)
Subject: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBD@venom2>
References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBD@venom2>
Message-ID: <20061026.171657.43852974.davem@davemloft.net>

From: "Glenn Grundstrom"
Date: Thu, 26 Oct 2006 19:14:19 -0500

> It is part of our connection manager and assists with connection setup
> and teardown only.

I fear this is exactly the kind of stuff that we didn't want to see
start going into the kernel, and we've resisted the TCP/IP stack
offload stuff in the infiniband layer exactly for this reason.

From ggrundstrom at NetEffect.com Thu Oct 26 17:06:23 2006
From: ggrundstrom at NetEffect.com (Glenn Grundstrom)
Date: Thu, 26 Oct 2006 19:06:23 -0500
Subject: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBB@venom2>

Kernel driver patch 4 of 9.

Signed-off-by: Glenn Grundstrom
======================================================
diff -ruNp old/drivers/infiniband/hw/nes/nes_context.h new/drivers/infiniband/hw/nes/nes_context.h
--- old/drivers/infiniband/hw/nes/nes_context.h 1969-12-31 18:00:00.000000000 -0600
+++ new/drivers/infiniband/hw/nes/nes_context.h 2006-10-25 10:15:50.000000000 -0500
@@ -0,0 +1,218 @@
+/*
+ * Copyright (c) 2006 NetEffect, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef NES_CONTEXT_H +#define NES_CONTEXT_H + +struct nes_qp_context { + u32 misc; + u32 cqs; + u32 sq_addr_low; + u32 sq_addr_high; + u32 rq_addr_low; + u32 rq_addr_high; + u32 misc2; + u32 tcpPorts; + u32 ip0; + u32 ip1; + u32 ip2; + u32 ip3; + u32 mss; + u32 arp_index_vlan; + u32 tcp_state_flow_label; + u32 pd_index_wscale; + u32 keepalive; + u32 ts_recent; + u32 ts_age; + u32 snd_nxt; + u32 snd_wnd; + u32 rcv_nxt; + u32 rcv_wnd; + u32 snd_max; + u32 snd_una; + u32 srtt; + u32 rttvar; + u32 ssthresh; + u32 cwnd; + u32 snd_wl1; + u32 snd_wl2; + u32 max_snd_wnd; + u32 ts_val_delta; + u32 retransmit; + u32 probe_cnt; + u32 hte_index; + u32 q2_addr_low; + u32 q2_addr_high; + u32 ird_index; + u32 Rsvd3; + u32 ird_ord_sizes; + u32 mrkr_offset; + u32 aeq_token_low; + u32 aeq_token_high; +}; + +/* QP Context Misc Field */ + +#define NES_QPCONTEXT_MISC_IWARP_VER_MASK 0x00000003 +#define NES_QPCONTEXT_MISC_IWARP_VER_SHIFT 0 +#define NES_QPCONTEXT_MISC_EFB_SIZE_MASK 0x000000C0 +#define NES_QPCONTEXT_MISC_EFB_SIZE_SHIFT 6 +#define NES_QPCONTEXT_MISC_RQ_SIZE_MASK 0x00000300 +#define NES_QPCONTEXT_MISC_RQ_SIZE_SHIFT 8 +#define NES_QPCONTEXT_MISC_SQ_SIZE_MASK 0x00000c00 +#define NES_QPCONTEXT_MISC_SQ_SIZE_SHIFT 10 +#define NES_QPCONTEXT_MISC_PCI_FCN_MASK 0x00007000 +#define NES_QPCONTEXT_MISC_PCI_FCN_SHIFT 12 +#define NES_QPCONTEXT_MISC_DUP_ACKS_MASK 0x00070000 +#define NES_QPCONTEXT_MISC_DUP_ACKS_SHIFT 16 + +enum nes_qp_context_misc_bits { + NES_QPCONTEXT_MISC_RX_WQE_SIZE = 0x00000004, + NES_QPCONTEXT_MISC_IPV4 = 0x00000008, + NES_QPCONTEXT_MISC_DO_NOT_FRAG = 0x00000010, + NES_QPCONTEXT_MISC_INSERT_VLAN = 0x00000020, + NES_QPCONTEXT_MISC_DROS = 0x00008000, + NES_QPCONTEXT_MISC_WSCALE = 0x00080000, + NES_QPCONTEXT_MISC_KEEPALIVE = 0x00100000, + NES_QPCONTEXT_MISC_TIMESTAMP = 0x00200000, + NES_QPCONTEXT_MISC_SACK = 0x00400000, + NES_QPCONTEXT_MISC_RDMA_WRITE_EN = 0x00800000, + NES_QPCONTEXT_MISC_RDMA_READ_EN = 0x01000000, + NES_QPCONTEXT_MISC_WBIND_EN = 0x10000000, + 
NES_QPCONTEXT_MISC_FAST_REGISTER_EN = 0x20000000, + NES_QPCONTEXT_MISC_PRIV_EN = 0x40000000, + NES_QPCONTEXT_MISC_NO_NAGLE = 0x80000000 +}; + +enum nes_qp_acc_wq_sizes { + HCONTEXT_TOE_WQ_SIZE_4 = 0, + HCONTEXT_TOE_WQ_SIZE_32 = 1, + HCONTEXT_TOE_WQ_SIZE_128 = 2, + HCONTEXT_TOE_WQ_SIZE_512 = 3 +}; + +/* QP Context Misc2 Fields */ +#define NES_QPCONTEXT_MISC2_TTL_MASK 0x000000ff +#define NES_QPCONTEXT_MISC2_TTL_SHIFT 0 +#define NES_QPCONTEXT_MISC2_HOP_LIMIT_MASK 0x000000ff +#define NES_QPCONTEXT_MISC2_HOP_LIMIT_SHIFT 0 +#define NES_QPCONTEXT_MISC2_LIMIT_MASK 0x00000300 +#define NES_QPCONTEXT_MISC2_LIMIT_SHIFT 8 +#define NES_QPCONTEXT_MISC2_NIC_INDEX_MASK 0x0000fc00 +#define NES_QPCONTEXT_MISC2_NIC_INDEX_SHIFT 10 +#define NES_QPCONTEXT_MISC2_SRC_IP_MASK 0x001f0000 +#define NES_QPCONTEXT_MISC2_SRC_IP_SHIFT 16 +#define NES_QPCONTEXT_MISC2_TOS_MASK 0xff000000 +#define NES_QPCONTEXT_MISC2_TOS_SHIFT 24 +#define NES_QPCONTEXT_MISC2_TRAFFIC_CLASS_MASK 0xff000000 +#define NES_QPCONTEXT_MISC2_TRAFFIC_CLASS_SHIFT 24 + +/* QP Context Tcp State/Flow Label Fields */ +#define NES_QPCONTEXT_TCPFLOW_FLOW_LABEL_MASK 0x000fffff +#define NES_QPCONTEXT_TCPFLOW_FLOW_LABEL_SHIFT 0 +#define NES_QPCONTEXT_TCPFLOW_TCP_STATE_MASK 0xf0000000 +#define NES_QPCONTEXT_TCPFLOW_TCP_STATE_SHIFT 28 + +enum nes_qp_tcp_state { + NES_QPCONTEXT_TCPSTATE_CLOSED = 1, + NES_QPCONTEXT_TCPSTATE_EST = 5, + NES_QPCONTEXT_TCPSTATE_TIME_WAIT = 11, +}; + +/* QP Context PD Index/wscale Fields */ +#define NES_QPCONTEXT_PDWSCALE_RCV_WSCALE_MASK 0x0000000f +#define NES_QPCONTEXT_PDWSCALE_RCV_WSCALE_SHIFT 0 +#define NES_QPCONTEXT_PDWSCALE_SND_WSCALE_MASK 0x00000f00 +#define NES_QPCONTEXT_PDWSCALE_SND_WSCALE_SHIFT 8 +#define NES_QPCONTEXT_PDWSCALE_PDINDEX_MASK 0xffff0000 +#define NES_QPCONTEXT_PDWSCALE_PDINDEX_SHIFT 16 + +/* QP Context Keepalive Fields */ +#define NES_QPCONTEXT_KEEPALIVE_DELTA_MASK 0x0000ffff +#define NES_QPCONTEXT_KEEPALIVE_DELTA_SHIFT 0 +#define NES_QPCONTEXT_KEEPALIVE_PROBE_CNT_MASK 0x00ff0000 +#define 
NES_QPCONTEXT_KEEPALIVE_PROBE_CNT_SHIFT 16 +#define NES_QPCONTEXT_KEEPALIVE_INTV_MASK 0xff000000 +#define NES_QPCONTEXT_KEEPALIVE_INTV_SHIFT 24 + +/* QP Context ORD/IRD Fields */ +#define NES_QPCONTEXT_ORDIRD_ORDSIZE_MASK 0x0000007f +#define NES_QPCONTEXT_ORDIRD_ORDSIZE_SHIFT 0 +#define NES_QPCONTEXT_ORDIRD_IRDSIZE_MASK 0x00030000 +#define NES_QPCONTEXT_ORDIRD_IRDSIZE_SHIFT 16 +#define NES_QPCONTEXT_ORDIRD_IWARP_MODE_MASK 0x30000000 +#define NES_QPCONTEXT_ORDIRD_IWARP_MODE_SHIFT 28 + +enum nes_ord_ird_bits { + NES_QPCONTEXT_ORDIRD_WRPDU = 0x02000000, + NES_QPCONTEXT_ORDIRD_LSMM_PRESENT = 0x04000000, + NES_QPCONTEXT_ORDIRD_ALSMM = 0x08000000, + NES_QPCONTEXT_ORDIRD_AAH = 0x40000000, + NES_QPCONTEXT_ORDIRD_RNMC = 0x80000000 +}; + +enum nes_iwarp_qp_state { + NES_QPCONTEXT_IWARP_STATE_NONEXIST = 0, + NES_QPCONTEXT_IWARP_STATE_IDLE = 1, + NES_QPCONTEXT_IWARP_STATE_RTS = 2, + NES_QPCONTEXT_IWARP_STATE_CLOSING = 3, + NES_QPCONTEXT_IWARP_STATE_TERMINATE = 5, + NES_QPCONTEXT_IWARP_STATE_ERROR = 6 +}; + +struct nes_uploaded_qp_context { + u32 qp_state; + u32 keepalive; + u32 ts_recent; + u32 ts_age; + u32 snd_nxt; + u32 snd_wnd; + u32 rcv_nxt; + u32 rcv_wnd; + u32 snd_max; + u32 snd_una; + u32 srtt; + u32 rttvar; + u32 ssthresh; + u32 cwnd; + u32 snd_wl1; + u32 snd_wl2; + u32 max_snd_wnd; + u32 retransmit; + u32 probe_cnt; + u32 rsvd0; + u32 sq_ptrs; + u32 rq_ptrs; + u32 rsvd1; +}; + +#endif /* NES_CONTEXT_H */ diff -ruNp old/drivers/infiniband/hw/nes/nes.h new/drivers/infiniband/hw/nes/nes.h --- old/drivers/infiniband/hw/nes/nes.h 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes.h 2006-10-25 10:15:50.000000000 -0500 @@ -0,0 +1,362 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __NES_H +#define __NES_H + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#define PHY_10G +#define SA1 + +#include "nes_hw.h" +#include "nes_verbs.h" +#include "nes_context.h" +#include "nes_user.h" +#include "nes_stack.h" + +#define DRV_NAME "iw_nes" +#define DRV_VERSION "0.2" +#define PFX DRV_NAME ": " + +/* + * NetEffect PCI vendor id and NE010 PCI device id. 
+ */ +#ifndef PCI_VENDOR_ID_NETEFFECT /* not in pci.ids yet */ + #define PCI_VENDOR_ID_NETEFFECT 0x1678 + #define PCI_DEVICE_ID_NETEFFECT_NE010 0x0100 +#endif + +#define BAR_0 0 +#define BAR_1 2 + +#define RX_BUF_SIZE (1536 + 8) + +#define NES_REG0_SIZE (4 * 1024) +#define NES_TX_TIMEOUT (6*HZ) +#define NES_FIRST_QPN 40 +#define NES_SEND_FIRST_WRITE 1 +#define NES_SW_CONTEXT_ALIGN 1024 + +#define NES_MAX_SQ_PAYLOAD_SIZE 0x40000000 + + +#ifdef NES_DEBUG + #define assert(expr) \ + if(!(expr)) { \ + printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n", \ + #expr, __FILE__, __FUNCTION__, __LINE__); \ + } + #define dprintk(fmt, args...) do { printk(KERN_ERR PFX fmt, ##args); } while (0) +#else + #define assert(expr) do {} while (0) + #define dprintk(fmt, args...) do {} while (0) +#endif + +extern struct list_head nes_adapter_list; +extern struct list_head nes_dev_list; + +extern int max_mtu; +#define max_frame_len (max_mtu+ETH_HLEN) + +extern struct nes_stack_ops *stack_ops_p; + + +struct nes_dev { + /* MUST BE THE FIRST ENTRY IN THE STRUCTURE */ + struct ib_device ibdev; + struct nes_adapter *nesadapter; +// struct workqueue_struct *reg_wq; + void __iomem *regs; + void __iomem *index_reg; + struct pci_dev *pcidev; + struct net_device *netdev; + unsigned char *apbv_table; + struct tasklet_struct dpc_tasklet; + spinlock_t lock; +// struct work_struct reg_work; + __be32 local_ipaddr; + unsigned int cur_tx; + unsigned int cur_rx; + unsigned int mac_index; + unsigned int nes_stack_start; + unsigned int of_device_registered; + + // Control Structures + struct nes_hw_cqp cqp; + struct nes_hw_cq ccq; + struct nes_hw_nic hnic; + struct nes_hw_nic_cq hnic_cq; + struct nes_hw_nic nesnic; + struct nes_hw_nic_cq nesnic_cq; + u32 cqp_mem_size; + u32 nic_mem_size; + u32 base_pd; + + u32 int_req; + u32 intf_int_req; + + struct list_head list; + + u16 base_doorbell_index; +}; + + +#ifndef readq +static inline u64 readq(const void __iomem * addr) +{ + u64 ret = readl(addr + 4); 
+ ret <<= 32; + ret |= readl(addr); + + return ret; +} +#endif + +#ifndef writeq +static inline void writeq(u64 val, void __iomem * addr) +{ + writel((u32) (val), addr); + writel((u32) (val >> 32), (addr + 4)); +} +#endif + +/* Read from memory-mapped device */ +static inline u32 nes_read_indexed(void __iomem * addr, u32 reg_index) +{ + writel(cpu_to_le32(reg_index), addr); + return le32_to_cpu(readl(addr + 4)); +} + +static inline u32 nes_read32(const void __iomem * addr) +{ + return le32_to_cpu(readl(addr)); +} + +static inline u16 nes_read16(const void __iomem * addr) +{ + return le16_to_cpu(readw(addr)); +} + +static inline u8 nes_read8(const void __iomem * addr) +{ + return readb(addr); +} + +/* Write to memory-mapped device */ +static inline void nes_write_indexed(void __iomem * addr, u32 reg_index, u32 val) +{ +// dprintk("Writing %08X, to indexed offset %08X using address %p and %p.\n", val, reg_index, addr, addr+4 ); + writel(cpu_to_le32(reg_index), addr); + writel(cpu_to_le32(val), addr + 4); +} + +static inline void nes_write32(void __iomem * addr, u32 val) +{ +// dprintk("Writing %08X, to address %p.\n", val, addr ); + writel(cpu_to_le32(val), addr); +} + +static inline void nes_write16(void __iomem * addr, u16 val) +{ + writew(cpu_to_le16(val), addr); +} + +static inline void nes_write8(void __iomem * addr, u8 val) +{ + writeb(val, addr); +} + +static inline u16 nes_read16_eeprom(void __iomem * addr, u16 offset) +{ + writel(cpu_to_le32(NES_EEPROM_READ_REQUEST + (offset >> 1)), addr + NES_EEPROM_COMMAND); + do { + } while ((le32_to_cpu(readl(addr + NES_EEPROM_COMMAND)) & NES_EEPROM_READ_REQUEST)); + return le16_to_cpu(readw(addr + NES_EEPROM_DATA)); +} + +static inline int nes_alloc_resource(struct nes_adapter *nesadapter, + unsigned long *resource_array, u32 max_resources, + u32 *req_resource_num, u32 *next) +{ + unsigned long flags; + u32 resource_num; + + spin_lock_irqsave(&nesadapter->resource_lock, flags); + dprintk("nes_alloc_resource: 
resource_array=%p, max_resources=%u, req_resource=%u, next=%u\n",resource_array, max_resources, *req_resource_num, *next); + + resource_num = find_next_zero_bit(resource_array, max_resources, *next ); + dprintk("nes_alloc_resource: resource_num=%u\n", resource_num); + if (resource_num >= max_resources) { + resource_num = find_first_zero_bit(resource_array, max_resources ); + if (resource_num >= max_resources) { + printk(KERN_ERR PFX "%s: No available resources.\n", __FUNCTION__); + spin_unlock_irqrestore(&nesadapter->resource_lock, flags); + return -EMFILE; + } + } + dprintk("%s: find_next_zero_bit returned = %u (max = %u).\n", __FUNCTION__, resource_num, max_resources); + set_bit(resource_num, resource_array); + *next = resource_num+1; + if (*next == max_resources) { + *next = 0; + } + spin_unlock_irqrestore(&nesadapter->resource_lock, flags); + *req_resource_num = resource_num; + return 0; +} + +static inline void nes_free_resource(struct nes_adapter *nesadapter, + unsigned long *resource_array, u32 resource_num) +{ + unsigned long flags; + + spin_lock_irqsave(&nesadapter->resource_lock, flags); + clear_bit(resource_num, resource_array); + spin_unlock_irqrestore(&nesadapter->resource_lock, flags); +} + +static inline struct nes_dev *to_nesdev(struct ib_device *ibdev) { + return container_of(ibdev, struct nes_dev, ibdev); +} + +static inline struct nes_pd *to_nespd(struct ib_pd *ibpd) { + return container_of(ibpd, struct nes_pd, ibpd); +} + +static inline struct nes_ucontext *to_nesucontext(struct ib_ucontext *ibucontext) { + return container_of(ibucontext, struct nes_ucontext, ibucontext); +} + +static inline struct nes_mr *to_nesmr(struct ib_mr *ibmr) { + return container_of(ibmr, struct nes_mr, ibmr); +} + +static inline struct nes_cq *to_nescq(struct ib_cq *ibcq) { + return container_of(ibcq, struct nes_cq, ibcq); +} + +static inline struct nes_qp *to_nesqp(struct ib_qp *ibqp) { + return container_of(ibqp, struct nes_qp, ibqp); +} + +static inline int 
is_rnic_addr(struct net_device *netdev, u32 addr) +{ + struct in_device *ind; + int ret = 0; + + ind = in_dev_get(netdev); + if (!ind) + return 0; + + for_ifa(ind) { + if (ifa->ifa_address == addr) { + ret = 1; + break; + } + } + endfor_ifa(ind); + in_dev_put(ind); + return ret; +} + +/* Utils */ +#define CRC32C_POLY 0x1EDC6F41 +#define ORDER 32 +#define REFIN 1 +#define REFOUT 1 +#define NES_HASH_CRC_INITAL_VALUE 0xFFFFFFFF +#define NES_HASH_CRC_FINAL_XOR 0xFFFFFFFF + + +/* nes.c */ +void nes_add_ref(struct ib_qp *); +void nes_rem_ref(struct ib_qp *); +struct ib_qp *nes_get_qp(struct ib_device *, int); + +/* nes_hw.c */ +struct nes_adapter *nes_adapter_init(struct nes_dev *, unsigned long); +int nes_cqp_init(struct nes_dev *); +int nes_phy_init(struct nes_dev *); +int nes_nic_qp_init(struct nes_dev *, struct net_device *); +void nes_dpc(unsigned long); +void nes_process_ceq(struct nes_dev *, struct nes_hw_ceq *); +void nes_process_aeq(struct nes_dev *, struct nes_hw_aeq *); +void nes_process_mac_intr(struct nes_dev *, u32); +void nes_hnic_ce_handler(struct nes_dev *, struct nes_hw_nic_cq *); +void cqp_ce_handler(struct nes_dev *, struct nes_hw_cq *); +void nes_process_iwarp_aeqe(struct nes_dev *, struct nes_hw_aeqe *); +void iwarp_ce_handler(struct nes_dev *, struct nes_hw_cq *); + +/* nes_nic.c */ +struct net_device *nes_netdev_init(struct nes_dev *, void __iomem *); +void nes_netdev_exit(struct nes_dev *); +void nes_adapter_free(struct nes_adapter *); + +/* nes_cm.c */ +int nes_start_cm(struct nes_dev *, struct in_ifaddr *); +int nes_stop_cm(struct nes_dev *); +void nes_update_arp(unsigned char *pMacAddress, u32 u32IpAddress, u32 u32ArpTimeout, u16 u16Entry, u16 type); +void nes_sock_release(struct nes_qp*, unsigned long *); +int nes_connect(struct iw_cm_id *, struct iw_cm_conn_param *); +void nes_disconnect_worker(void *); +int nes_disconnect(struct iw_cm_id *, int); +int nes_accept(struct iw_cm_id *, struct iw_cm_conn_param *); +int nes_reject(struct iw_cm_id 
*, const void *, u8); +int nes_create_listen(struct iw_cm_id *, int); +int nes_destroy_listen(struct iw_cm_id *); + +/* nes_verbs.c */ +int nes_modify_qp(struct ib_qp *, struct ib_qp_attr *, int); +int nes_register_device(struct nes_dev *); +void nes_unregister_device(struct nes_dev *); + +/* nes_util.c */ +int nes_read_eeprom_values(struct nes_dev *); +void nes_write_10G_phy_reg(void __iomem *, u16, u8, u16); +void nes_read_10G_phy_reg(void __iomem *, u16, u8); +u16 nes_arp_table_update(struct nes_dev *, u32, u32); +u32 nes_crc32(u32, u32, u32, u32, u8 *, u32, u32, u32); + +#endif /* __NES_H */ diff -ruNp old/drivers/infiniband/hw/nes/nes_hw.h new/drivers/infiniband/hw/nes/nes_hw.h --- old/drivers/infiniband/hw/nes/nes_hw.h 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_hw.h 2006-10-25 10:15:50.000000000 -0500 @@ -0,0 +1,839 @@ + /* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __NES_HW_H +#define __NES_HW_H + +enum pci_regs { + NES_INT_STAT = 0x0000, + NES_INT_MASK = 0x0004, + NES_INT_PENDING = 0x0008, + NES_INTF_INT_STAT = 0x000C, + NES_INTF_INT_MASK = 0x0010, + NES_EEPROM_COMMAND = 0x0020, + NES_EEPROM_DATA = 0x0024, + NES_SOFTWARE_RESET = 0x0030, + NES_CQ_ACK = 0x0034, + NES_WQE_ALLOC = 0x0040, + NES_CQE_ALLOC = 0x0044, +}; + +enum indexed_regs { + NES_IDX_CREATE_CQP_LOW = 0x0000, + NES_IDX_CREATE_CQP_HIGH = 0x0004, + NES_IDX_QP_CONTROL = 0x0040, + NES_IDX_FLM_CONTROL = 0x0080, + NES_IDX_INT_CPU_STATUS = 0x00A0, + NES_IDX_GPIO_CONTROL = 0x00F0, + NES_IDX_GPIO_DATA = 0x00F4, + NES_IDX_TCP_CONFIG0 = 0x01E4, + NES_IDX_TCP_TIMER_CONFIG = 0x01EC, + NES_IDX_TCP_NOW = 0x01F0, + NES_IDX_QP_MAX_CFG_SIZES = 0x0200, + NES_IDX_QP_CTX_SIZE = 0x0218, + NES_IDX_TCP_TIMER_SIZE0 = 0x0238, + NES_IDX_TCP_TIMER_SIZE1 = 0x0240, + NES_IDX_CQ_CTX_SIZE = 0x0260, + NES_IDX_MRT_SIZE = 0x0278, + NES_IDX_PBL_REGION_SIZE = 0x0280, + NES_IDX_IRRQ_COUNT = 0x02B0, + NES_IDX_DST_IP_ADDR = 0x0400, + NES_IDX_PCIX_DIAG = 0x08E8, + NES_IDX_MPP_DEBUG = 0x0A00, + NES_IDX_MPP_LB_DEBUG = 0x0B00, + NES_IDX_DENALI_CTL_22 = 0x1058, + NES_IDX_MAC_TX_CONTROL = 0x2000, + NES_IDX_MAC_TX_CONFIG = 0x2004, + NES_IDX_MAC_RX_CONTROL = 0x200C, + NES_IDX_MAC_RX_CONFIG = 0x2010, + NES_IDX_MAC_MDIO_CONTROL = 0x2084, + NES_IDX_MAC_TX_OCTETS_LOW = 0x2100, + NES_IDX_MAC_TX_OCTETS_HIGH = 0x2104, + NES_IDX_MAC_TX_FRAMES_LOW = 0x2108, + NES_IDX_MAC_TX_FRAMES_HIGH = 0x210C, + NES_IDX_MAC_RX_OCTETS_LOW = 0x213C, + NES_IDX_MAC_RX_OCTETS_HIGH = 0x2140, + NES_IDX_MAC_RX_FRAMES_LOW = 0x2144, + NES_IDX_MAC_RX_FRAMES_HIGH = 0x2148, + NES_IDX_MAC_RX_BC_FRAMES_LOW = 0x214C, + NES_IDX_MAC_RX_MC_FRAMES_HIGH = 0x2150, + 
NES_IDX_MAC_INT_STATUS = 0x21f0, + NES_IDX_MAC_INT_MASK = 0x21f4, + NES_IDX_CM_CONFIG = 0x5100, + NES_IDX_NIC_ACTIVE = 0x6010, + NES_IDX_NIC_UNICAST_ALL = 0x6018, + NES_IDX_NIC_MULTICAST_ALL = 0x6020, + NES_IDX_NIC_MULTICAST_ENABLE = 0x6028, + NES_IDX_NIC_BROADCAST_ON = 0x6030, + NES_IDX_QUAD_HASH_TABLE_SIZE = 0x6148, + NES_IDX_PERFECT_FILTER_LOW = 0x6200, + NES_IDX_PERFECT_FILTER_HIGH = 0x6204, + NES_IDX_DEBUG_ERROR_CONTROL_STATUS = 0x913C, +}; + +enum nes_cqp_opcodes { + NES_CQP_CREATE_QP = 0x00, + NES_CQP_MODIFY_QP = 0x01, + NES_CQP_DESTROY_QP = 0x02, + NES_CQP_CREATE_CQ = 0x03, + NES_CQP_MODIFY_CQ = 0x04, + NES_CQP_DESTROY_CQ = 0x05, + NES_CQP_ALLOCATE_STAG = 0x09, + NES_CQP_REGISTER_STAG = 0x0a, + NES_CQP_QUERY_STAG = 0x0b, + NES_CQP_REGISTER_SHARED_STAG = 0x0c, + NES_CQP_DEALLOCATE_STAG = 0x0d, + NES_CQP_MANAGE_ARP_CACHE = 0x0f, + NES_CQP_SUSPEND_QPS = 0x11, + NES_CQP_UPLOAD_CONTEXT = 0x80000013, + NES_CQP_CREATE_CEQ = 0x16, + NES_CQP_DESTROY_CEQ = 0x18, + NES_CQP_CREATE_AEQ = 0x19, + NES_CQP_DESTROY_AEQ = 0x1b, + NES_CQP_LMI_ACCESS = 0x20, + NES_CQP_FLUSH_WQES = 0x22 +}; + +enum nes_cqp_wqe_word_idx { + NES_CQP_WQE_OPCODE_IDX = 0, + NES_CQP_WQE_ID_IDX = 1, + NES_CQP_WQE_COMP_CTX_LOW_IDX = 2, + NES_CQP_WQE_COMP_CTX_HIGH_IDX = 3, + NES_CQP_WQE_COMP_SCRATCH_LOW_IDX = 4, + NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX = 5, +}; + +enum nes_cqp_cq_wqeword_idx { + NES_CQP_CQ_WQE_PBL_LOW_IDX = 6, + NES_CQP_CQ_WQE_PBL_HIGH_IDX = 7, + NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX = 8, + NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX = 9, + NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX = 10, +}; + +enum nes_cqp_stag_wqeword_idx { + NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX = 1, + NES_CQP_STAG_WQE_LEN_HIGH_PD_IDX = 6, + NES_CQP_STAG_WQE_LEN_LOW_IDX = 7, + NES_CQP_STAG_WQE_STAG_IDX = 8, + NES_CQP_STAG_WQE_VA_LOW_IDX = 10, + NES_CQP_STAG_WQE_VA_HIGH_IDX = 11, + NES_CQP_STAG_WQE_PA_LOW_IDX = 12, + NES_CQP_STAG_WQE_PA_HIGH_IDX = 13, + NES_CQP_STAG_WQE_PBL_LEN_IDX = 14 +}; + +enum nes_cqp_qp_bits { + NES_CQP_QP_ARP_VALID = 
(1<<8), + NES_CQP_QP_WINBUF_VALID = (1<<9), + NES_CQP_QP_CONTEXT_VALID = (1<<10), + NES_CQP_QP_ORD_VALID = (1<<11), + NES_CQP_QP_VIRT_WQS = (1<<13), + NES_CQP_QP_DEL_HTE = (1<<14), + NES_CQP_QP_CQS_VALID = (1<<15), + NES_CQP_QP_TYPE_TOE = 0, + NES_CQP_QP_TYPE_IWARP = (1<<16), + NES_CQP_QP_TYPE_CQP = (4<<16), + NES_CQP_QP_TYPE_NIC = (5<<16), + NES_CQP_QP_MSS_CHG = (1<<20), + NES_CQP_QP_STATIC_RESOURCES = (1<<21), + NES_CQP_QP_IGNORE_MW_BOUND = (1<<22), + NES_CQP_QP_VWQ_USE_LMI = (1<<23), + NES_CQP_QP_IWARP_STATE_IDLE = (1<<28), + NES_CQP_QP_IWARP_STATE_RTS = (2<<28), + NES_CQP_QP_IWARP_STATE_CLOSING = (3<<28), + NES_CQP_QP_IWARP_STATE_TERMINATE = (5<<28), + NES_CQP_QP_IWARP_STATE_ERROR = (6<<28), + NES_CQP_QP_IWARP_STATE_MASK = (7<<28), + NES_CQP_QP_RESET = (1<<31), +}; + +enum nes_cqp_qp_wqe_word_idx { + NES_CQP_QP_WQE_CONTEXT_LOW_IDX = 6, + NES_CQP_QP_WQE_CONTEXT_HIGH_IDX = 7, + NES_CQP_QP_WQE_NEW_MSS_IDX = 15, +}; + +enum nes_nic_ctx_bits { + NES_NIC_CTX_RQ_SIZE_32 = (3<<8), + NES_NIC_CTX_RQ_SIZE_512 = (3<<8), + NES_NIC_CTX_SQ_SIZE_32 = (1<<10), + NES_NIC_CTX_SQ_SIZE_512 = (3<<10), +}; + +enum nes_nic_qp_ctx_word_idx { + NES_NIC_CTX_MISC_IDX = 0, + NES_NIC_CTX_SQ_LOW_IDX = 2, + NES_NIC_CTX_SQ_HIGH_IDX = 3, + NES_NIC_CTX_RQ_LOW_IDX = 4, + NES_NIC_CTX_RQ_HIGH_IDX = 5, +}; + +enum nes_cqp_cq_bits { + NES_CQP_CQ_CEQE_MASK = (1<<9), + NES_CQP_CQ_CEQ_VALID = (1<<10), + NES_CQP_CQ_RESIZE = (1<<11), + NES_CQP_CQ_CHK_OVERFLOW = (1<<12), + NES_CQP_CQ_4KB_CHUNK = (1<<14), + NES_CQP_CQ_VIRT = (1<<15), +}; + +enum nes_cqp_stag_bits { + NES_CQP_STAG_VA_TO = (1<<9), + NES_CQP_STAG_DEALLOC_PBLS = (1<<10), + NES_CQP_STAG_PBL_BLK_SIZE = (1<<11), + NES_CQP_STAG_MR = (1<<13), + NES_CQP_STAG_RIGHTS_LOCAL_READ = (1<<16), + NES_CQP_STAG_RIGHTS_LOCAL_WRITE = (1<<17), + NES_CQP_STAG_RIGHTS_REMOTE_READ = (1<<18), + NES_CQP_STAG_RIGHTS_REMOTE_WRITE = (1<<19), + NES_CQP_STAG_RIGHTS_WINDOW_BIND = (1<<20), + NES_CQP_STAG_REM_ACC_EN = (1<<21), + NES_CQP_STAG_LEAVE_PENDING = (1<<31), +}; + 
+enum nes_cqp_ceq_wqeword_idx { + NES_CQP_CEQ_WQE_ELEMENT_COUNT_IDX = 1, + NES_CQP_CEQ_WQE_PBL_LOW_IDX = 6, + NES_CQP_CEQ_WQE_PBL_HIGH_IDX = 7, +}; + +enum nes_cqp_ceq_bits { + NES_CQP_CEQ_4KB_CHUNK = (1<<14), + NES_CQP_CEQ_VIRT = (1<<15), +}; + +enum nes_cqp_aeq_wqeword_idx { + NES_CQP_AEQ_WQE_ELEMENT_COUNT_IDX = 1, + NES_CQP_AEQ_WQE_PBL_LOW_IDX = 6, + NES_CQP_AEQ_WQE_PBL_HIGH_IDX = 7, +}; + +enum nes_cqp_aeq_bits { + NES_CQP_AEQ_4KB_CHUNK = (1<<14), + NES_CQP_AEQ_VIRT = (1<<15), +}; + +enum nes_cqp_lmi_wqeword_idx { + NES_CQP_LMI_WQE_LMI_OFFSET_IDX = 1, + NES_CQP_LMI_WQE_FRAG_LOW_IDX = 8, + NES_CQP_LMI_WQE_FRAG_HIGH_IDX = 9, + NES_CQP_LMI_WQE_FRAG_LEN_IDX = 10, +}; + +enum nes_cqp_arp_wqeword_idx { + NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX = 6, + NES_CQP_ARP_WQE_MAC_HIGH_IDX = 7, + NES_CQP_ARP_WQE_REACHABILITY_MAX_IDX = 1, +}; + +enum nes_cqp_upload_wqeword_idx { + NES_CQP_UPLOAD_WQE_CTXT_LOW_IDX = 6, + NES_CQP_UPLOAD_WQE_CTXT_HIGH_IDX = 7, + NES_CQP_UPLOAD_WQE_HTE_IDX = 8, +}; + +enum nes_cqp_arp_bits { + NES_CQP_ARP_VALID = (1<<8), + NES_CQP_ARP_PERM = (1<<9), +}; + +#define NES_CQP_ARP_AEQ_INDEX_MASK 0x000f0000 +#define NES_CQP_ARP_AEQ_INDEX_SHIFT 16 + +#define NES_ARP_TABLE_SIZE 4096 +#define NES_ARP_INDEX_ADD 1 +#define NES_ARP_INDEX_DELETE 2 +#define NES_ARP_INDEX_RESOLVE 3 + +struct _nes_arp_table { + u32 ip_address; +}; + +enum nes_cqe_opcode_bits { + NES_CQE_STAG_VALID = (1<<6), + NES_CQE_ERROR = (1<<7), + NES_CQE_SQ = (1<<8), + NES_CQE_SE = (1<<9), + NES_CQE_PSH = (1<<29), + NES_CQE_FIN = (1<<30), + NES_CQE_VALID = (1<<31), +}; + +enum nes_cqe_word_idx { + NES_CQE_PAYLOAD_LENGTH_IDX = 0, + NES_CQE_COMP_COMP_CTX_LOW_IDX = 2, + NES_CQE_COMP_COMP_CTX_HIGH_IDX = 3, + NES_CQE_INV_STAG_IDX = 4, + NES_CQE_QP_ID_IDX = 5, + NES_CQE_ERROR_CODE_IDX = 6, + NES_CQE_OPCODE_IDX = 7, +}; + +enum nes_ceqe_word_idx { + NES_CEQE_CQ_CTX_LOW_IDX = 0, + NES_CEQE_CQ_CTX_HIGH_IDX = 1, +}; + +enum nes_ceqe_status_bit { + NES_CEQE_VALID = (1<<31), +}; + +enum nes_int_bits { + 
NES_INT_CEQ0 = (1<<0), + NES_INT_CEQ1 = (1<<1), + NES_INT_CEQ2 = (1<<2), + NES_INT_CEQ3 = (1<<3), + NES_INT_CEQ4 = (1<<4), + NES_INT_CEQ5 = (1<<5), + NES_INT_CEQ6 = (1<<6), + NES_INT_CEQ7 = (1<<7), + NES_INT_CEQ8 = (1<<8), + NES_INT_CEQ9 = (1<<9), + NES_INT_CEQ10 = (1<<10), + NES_INT_CEQ11 = (1<<11), + NES_INT_CEQ12 = (1<<12), + NES_INT_CEQ13 = (1<<13), + NES_INT_CEQ14 = (1<<14), + NES_INT_CEQ15 = (1<<15), + NES_INT_AEQ0 = (1<<16), + NES_INT_AEQ1 = (1<<17), + NES_INT_AEQ2 = (1<<18), + NES_INT_AEQ3 = (1<<19), + NES_INT_AEQ4 = (1<<20), + NES_INT_AEQ5 = (1<<21), + NES_INT_AEQ6 = (1<<22), + NES_INT_AEQ7 = (1<<23), + NES_INT_MAC0 = (1<<24), + NES_INT_MAC1 = (1<<25), + NES_INT_MAC2 = (1<<26), + NES_INT_MAC3 = (1<<27), + NES_INT_TSW = (1<<28), + NES_INT_TIMER = (1<<29), + NES_INT_INTF = (1<<30), +}; + +enum nes_intf_int_bits { + NES_INTF_INT_PCIERR = (1<<0), + NES_INTF_INT_CRITERR = (1<<14), + NES_INTF_INT_AEQ0_OFLOW = (1<<16), + NES_INTF_INT_AEQ1_OFLOW = (1<<17), + NES_INTF_INT_AEQ2_OFLOW = (1<<18), + NES_INTF_INT_AEQ3_OFLOW = (1<<19), + NES_INTF_INT_AEQ4_OFLOW = (1<<20), + NES_INTF_INT_AEQ5_OFLOW = (1<<21), + NES_INTF_INT_AEQ6_OFLOW = (1<<22), + NES_INTF_INT_AEQ7_OFLOW = (1<<23), + NES_INTF_INT_AEQ_OFLOW = (0xff<<16), +}; + +enum nes_mac_int_bits { + NES_MAC_INT_LINK_STAT_CHG = (1<<1), + NES_MAC_INT_XGMII_EXT = (1<<2), + NES_MAC_INT_TX_UNDERFLOW = (1<<6), + NES_MAC_INT_TX_ERROR = (1<<7), +}; + +enum nes_cqe_allocate_bits { + NES_CQE_ALLOC_INC_SELECT = (1<<28), + NES_CQE_ALLOC_NOTIFY_NEXT = (1<<29), + NES_CQE_ALLOC_NOTIFY_SE = (1<<30), + NES_CQE_ALLOC_RESET = (1<<31), +}; + +enum nes_nic_rq_wqe_word_idx { + NES_NIC_RQ_WQE_LENGTH_1_0_IDX = 0, + NES_NIC_RQ_WQE_LENGTH_3_2_IDX = 1, + NES_NIC_RQ_WQE_FRAG0_LOW_IDX = 2, + NES_NIC_RQ_WQE_FRAG0_HIGH_IDX = 3, + NES_NIC_RQ_WQE_FRAG1_LOW_IDX = 4, + NES_NIC_RQ_WQE_FRAG1_HIGH_IDX = 5, + NES_NIC_RQ_WQE_FRAG2_LOW_IDX = 6, + NES_NIC_RQ_WQE_FRAG2_HIGH_IDX = 7, + NES_NIC_RQ_WQE_FRAG3_LOW_IDX = 8, + NES_NIC_RQ_WQE_FRAG3_HIGH_IDX = 9, +}; + 
+enum nes_nic_sq_wqe_word_idx { + NES_NIC_SQ_WQE_MISC_IDX = 0, + NES_NIC_SQ_WQE_TOTAL_LENGTH_IDX = 1, + NES_NIC_SQ_WQE_LENGTH_0_TAG_IDX = 3, + NES_NIC_SQ_WQE_LENGTH_2_1_IDX = 4, + NES_NIC_SQ_WQE_LENGTH_4_3_IDX = 5, + NES_NIC_SQ_WQE_FRAG0_LOW_IDX = 6, + NES_NIC_SQ_WQE_FRAG0_HIGH_IDX = 7, + NES_NIC_SQ_WQE_FRAG1_LOW_IDX = 8, + NES_NIC_SQ_WQE_FRAG1_HIGH_IDX = 9, + NES_NIC_SQ_WQE_FRAG2_LOW_IDX = 10, + NES_NIC_SQ_WQE_FRAG2_HIGH_IDX = 11, + NES_NIC_SQ_WQE_FRAG3_LOW_IDX = 12, + NES_NIC_SQ_WQE_FRAG3_HIGH_IDX = 13, + NES_NIC_SQ_WQE_FRAG4_LOW_IDX = 14, + NES_NIC_SQ_WQE_FRAG4_HIGH_IDX = 15, +}; + +enum nes_iwarp_sq_wqe_word_idx { + NES_IWARP_SQ_WQE_MISC_IDX = 0, + NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX = 1, + NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX = 2, + NES_IWARP_SQ_WQE_COMP_CTX_HIGH_IDX = 3, + NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX = 4, + NES_IWARP_SQ_WQE_COMP_SCRATCH_HIGH_IDX = 5, + NES_IWARP_SQ_WQE_INV_STAG_LOW_IDX = 7, + NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX = 8, + NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX = 9, + NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX = 10, + NES_IWARP_SQ_WQE_RDMA_STAG_IDX = 11, + NES_IWARP_SQ_WQE_FRAG0_LOW_IDX = 16, + NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX = 17, + NES_IWARP_SQ_WQE_LENGTH0_IDX = 18, + NES_IWARP_SQ_WQE_STAG0_IDX = 19, + NES_IWARP_SQ_WQE_FRAG1_LOW_IDX = 20, + NES_IWARP_SQ_WQE_FRAG1_HIGH_IDX = 21, + NES_IWARP_SQ_WQE_LENGTH1_IDX = 22, + NES_IWARP_SQ_WQE_STAG1_IDX = 23, + NES_IWARP_SQ_WQE_FRAG2_LOW_IDX = 24, + NES_IWARP_SQ_WQE_FRAG2_HIGH_IDX = 25, + NES_IWARP_SQ_WQE_LENGTH2_IDX = 26, + NES_IWARP_SQ_WQE_STAG2_IDX = 27, + NES_IWARP_SQ_WQE_FRAG3_LOW_IDX = 28, + NES_IWARP_SQ_WQE_FRAG3_HIGH_IDX = 29, + NES_IWARP_SQ_WQE_LENGTH3_IDX = 30, + NES_IWARP_SQ_WQE_STAG3_IDX = 31, +}; + +enum nes_iwarp_rq_wqe_word_idx { + NES_IWARP_RQ_WQE_TOTAL_PAYLOAD_IDX = 1, + NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX = 2, + NES_IWARP_RQ_WQE_COMP_CTX_HIGH_IDX = 3, + NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX = 4, + NES_IWARP_RQ_WQE_COMP_SCRATCH_HIGH_IDX = 5, + NES_IWARP_RQ_WQE_FRAG0_LOW_IDX = 8, + 
NES_IWARP_RQ_WQE_FRAG0_HIGH_IDX = 9, + NES_IWARP_RQ_WQE_LENGTH0_IDX = 10, + NES_IWARP_RQ_WQE_STAG0_IDX = 11, + NES_IWARP_RQ_WQE_FRAG1_LOW_IDX = 12, + NES_IWARP_RQ_WQE_FRAG1_HIGH_IDX = 13, + NES_IWARP_RQ_WQE_LENGTH1_IDX = 14, + NES_IWARP_RQ_WQE_STAG1_IDX = 15, + NES_IWARP_RQ_WQE_FRAG2_LOW_IDX = 16, + NES_IWARP_RQ_WQE_FRAG2_HIGH_IDX = 17, + NES_IWARP_RQ_WQE_LENGTH2_IDX = 18, + NES_IWARP_RQ_WQE_STAG2_IDX = 19, + NES_IWARP_RQ_WQE_FRAG3_LOW_IDX = 20, + NES_IWARP_RQ_WQE_FRAG3_HIGH_IDX = 21, + NES_IWARP_RQ_WQE_LENGTH3_IDX = 22, + NES_IWARP_RQ_WQE_STAG3_IDX = 23, +}; + +enum nes_nic_sq_wqe_bits { + NES_NIC_SQ_WQE_DISABLE_CHKSUM = (1<<30), + NES_NIC_SQ_WQE_COMPLETION = (1<<31), +}; + +enum nes_nic_cqe_word_idx { + NES_NIC_CQE_ACCQP_ID_IDX = 0, + NES_NIC_CQE_TAG_PKT_TYPE_IDX = 2, + NES_NIC_CQE_MISC_IDX = 3, +}; + +enum nes_nic_cqe_bits { + NES_NIC_CQE_SQ = (1<<24), + NES_NIC_CQE_ACCQP_PORT = (1<<28), + NES_NIC_CQE_ACCQP_VALID = (1<<29), + NES_NIC_CQE_TAG_VALID = (1<<30), + NES_NIC_CQE_VALID = (1<<31), +}; + +enum nes_aeqe_word_idx { + NES_AEQE_COMP_CTXT_LOW_IDX = 0, + NES_AEQE_COMP_CTXT_HIGH_IDX = 1, + NES_AEQE_COMP_QP_CQ_ID_IDX = 2, + NES_AEQE_MISC_IDX = 3, +}; + +enum nes_aeqe_bits { + NES_AEQE_QP = (1<<16), + NES_AEQE_CQ = (1<<17), + NES_AEQE_SQ = (1<<18), + NES_AEQE_INBOUND_RDMA = (1<<19), + NES_AEQE_IWARP_STATE_MASK = (7<<20), + NES_AEQE_TCP_STATE_MASK = (0xf<<24), + NES_AEQE_VALID = (1<<31), +}; +#define NES_AEQE_IWARP_STATE_SHIFT 20 +#define NES_AEQE_TCP_STATE_SHIFT 24 + +enum nes_aeqe_aeid { + NES_AEQE_AEID_AMP_UNALLOCATED_STAG = 0x0102, + NES_AEQE_AEID_AMP_INVALID_STAG = 0x0103, + NES_AEQE_AEID_AMP_BAD_QP = 0x0104, + NES_AEQE_AEID_AMP_BAD_PD = 0x0105, + NES_AEQE_AEID_AMP_BAD_STAG_KEY = 0x0106, + NES_AEQE_AEID_AMP_BAD_STAG_INDEX = 0x0107, + NES_AEQE_AEID_AMP_BOUNDS_VIOLATION = 0x0108, + NES_AEQE_AEID_AMP_RIGHTS_VIOLATION = 0x0109, + NES_AEQE_AEID_AMP_TO_WRAP = 0x010a, + NES_AEQE_AEID_AMP_FASTREG_SHARED = 0x010b, + NES_AEQE_AEID_AMP_FASTREG_VALID_STAG = 0x010c, + 
NES_AEQE_AEID_AMP_FASTREG_MW_STAG = 0x010d, + NES_AEQE_AEID_AMP_FASTREG_INVALID_RIGHTS = 0x010e, + NES_AEQE_AEID_AMP_FASTREG_PBL_TABLE_OVERFLOW = 0x010f, + NES_AEQE_AEID_AMP_FASTREG_INVALID_LENGTH = 0x0110, + NES_AEQE_AEID_AMP_INVALIDATE_SHARED = 0x0111, + NES_AEQE_AEID_AMP_INVALIDATE_NO_REMOTE_ACCESS_RIGHTS = 0x0112, + NES_AEQE_AEID_AMP_INVALIDATE_MR_WITH_BOUND_WINDOWS = 0x0113, + NES_AEQE_AEID_AMP_MWBIND_VALID_STAG = 0x0114, + NES_AEQE_AEID_AMP_MWBIND_OF_MR_STAG = 0x0115, + NES_AEQE_AEID_AMP_MWBIND_TO_ZERO_BASED_STAG = 0x0116, + NES_AEQE_AEID_AMP_MWBIND_TO_MW_STAG = 0x0117, + NES_AEQE_AEID_AMP_MWBIND_INVALID_RIGHTS = 0x0118, + NES_AEQE_AEID_AMP_MWBIND_INVALID_BOUNDS = 0x0119, + NES_AEQE_AEID_AMP_MWBIND_TO_INVALID_PARENT = 0x011a, + NES_AEQE_AEID_AMP_MWBIND_BIND_DISABLED = 0x011b, + NES_AEQE_AEID_BAD_CLOSE = 0x0201, + NES_AEQE_AEID_RDMAP_ROE_BAD_LLP_CLOSE = 0x0202, + NES_AEQE_AEID_CQ_OPERATION_ERROR = 0x0203, + NES_AEQE_AEID_PRIV_OPERATION_DENIED = 0x0204, + NES_AEQE_AEID_RDMA_READ_WHILE_ORD_ZERO = 0x0205, + NES_AEQE_AEID_STAG_ZERO_INVALID = 0x0206, + NES_AEQE_AEID_DDP_INVALID_MSN_GAP_IN_MSN = 0x0301, + NES_AEQE_AEID_DDP_INVALID_MSN_RANGE_IS_NOT_VALID = 0x0302, + NES_AEQE_AEID_DDP_UBE_DDP_MESSAGE_TOO_LONG_FOR_AVAILABLE_BUFFER = 0x0303, + NES_AEQE_AEID_DDP_UBE_INVALID_DDP_VERSION = 0x0304, + NES_AEQE_AEID_DDP_UBE_INVALID_MO = 0x0305, + NES_AEQE_AEID_DDP_UBE_INVALID_MSN_NO_BUFFER_AVAILABLE = 0x0306, + NES_AEQE_AEID_DDP_UBE_INVALID_QN = 0x0307, + NES_AEQE_AEID_DDP_NO_L_BIT = 0x0308, + NES_AEQE_AEID_RDMAP_ROE_INVALID_RDMAP_VERSION = 0x0311, + NES_AEQE_AEID_RDMAP_ROE_UNEXPECTED_OPCODE = 0x0312, + NES_AEQE_AEID_ROE_INVALID_RDMA_READ_REQUEST = 0x0313, + NES_AEQE_AEID_ROE_INVALID_RDMA_WRITE_OR_READ_RESP = 0x0314, + NES_AEQE_AEID_INVALID_ARP_ENTRY = 0x0401, + NES_AEQE_AEID_INVALID_TCP_OPTION_RCVD = 0x0402, + NES_AEQE_AEID_STALE_ARP_ENTRY = 0x0403, + NES_AEQE_AEID_LLP_CLOSE_COMPLETE = 0x0501, + NES_AEQE_AEID_LLP_CONNECTION_RESET = 0x0502, + NES_AEQE_AEID_LLP_FIN_RECEIVED = 
0x0503, + NES_AEQE_AEID_LLP_RECEIVED_MARKER_AND_LENGTH_FIELDS_DONT_MATCH = 0x0504, + NES_AEQE_AEID_LLP_RECEIVED_MPA_CRC_ERROR = 0x0505, + NES_AEQE_AEID_LLP_SEGMENT_TOO_LARGE = 0x0506, + NES_AEQE_AEID_LLP_SEGMENT_TOO_SMALL = 0x0507, + NES_AEQE_AEID_LLP_SYN_RECEIVED = 0x0508, + NES_AEQE_AEID_LLP_TERMINATE_RECEIVED = 0x0509, + NES_AEQE_AEID_LLP_TOO_MANY_RETRIES = 0x050a, + NES_AEQE_AEID_LLP_TOO_MANY_KEEPALIVE_RETRIES = 0x050b, + NES_AEQE_AEID_RESET_SENT = 0x0601, + NES_AEQE_AEID_TERMINATE_SENT = 0x0602, + NES_AEQE_AEID_DDP_LCE_LOCAL_CATASTROPHIC = 0x0700 +}; + +enum nes_iwarp_sq_opcodes { + NES_IWARP_SQ_WQE_STREAMING = (1<<23), + NES_IWARP_SQ_WQE_READ_FENCE = (1<<29), + NES_IWARP_SQ_WQE_LOCAL_FENCE = (1<<30), + NES_IWARP_SQ_WQE_SIGNALED_COMPL = (1<<31), +}; + +enum nes_iwarp_sq_wqe_bits { + NES_IWARP_SQ_OP_RDMAW = 0, + NES_IWARP_SQ_OP_RDMAR = 1, + NES_IWARP_SQ_OP_SEND = 3, + NES_IWARP_SQ_OP_SENDINV = 4, + NES_IWARP_SQ_OP_SENDSE = 5, + NES_IWARP_SQ_OP_SENDSEINV = 6, + NES_IWARP_SQ_OP_BIND = 8, + NES_IWARP_SQ_OP_FAST_REG = 9, + NES_IWARP_SQ_OP_LOCINV = 10, + NES_IWARP_SQ_OP_RDMAR_LOCINV = 11, + NES_IWARP_SQ_OP_NOP = 12, +}; + +#define NES_EEPROM_READ_REQUEST (1<<16) +#define NES_MAC_ADDR_VALID (1<<20) + +/* + * NES010 index registers init values. + */ +struct nes_init_values { + u32 index; + u32 data; +}; + +/* + * NES010 registers in BAR0. 
+ */ +struct nes_pci_regs { + u32 int_status; + u32 int_mask; + u32 int_pending; + u32 intf_int_status; + u32 intf_int_mask; + u32 other_regs[59]; //pad out to 256 bytes for now +}; + +#define NES_CQP_SQ_SIZE 32 +#define NES_CCQ_SIZE 32 +#define NES_NIC_WQ_SIZE 512 +#define NES_NIC_CTX_SIZE ((NES_NIC_CTX_RQ_SIZE_512) | (NES_NIC_CTX_SQ_SIZE_512)) + +struct nes_dev; + +struct nes_hw_nic_qp_context { + u32 context_words[6]; +}; + +struct nes_hw_nic_sq_wqe { + u32 wqe_words[16]; +}; + +struct nes_hw_nic_rq_wqe { + u32 wqe_words[16]; +}; + +struct nes_hw_nic_cqe { + u32 cqe_words[4]; +}; + +struct nes_hw_cqp_qp_context { + u32 context_words[4]; +}; + +struct nes_hw_cqp_wqe { + u32 wqe_words[16]; +}; + +struct nes_hw_qp_wqe { + u32 wqe_words[32]; +}; + +struct nes_hw_cqe { + u32 cqe_words[8]; +}; + +struct nes_hw_ceqe { + u32 ceqe_words[2]; +}; + +struct nes_hw_aeqe { + u32 aeqe_words[4]; +}; + +struct nes_hw_cqp { + struct nes_hw_cqp_wqe *sq_vbase; + dma_addr_t sq_pbase; + spinlock_t lock; + wait_queue_head_t waitq; + u16 qp_id; + u16 sq_head; + u16 sq_tail; + u16 sq_size; +}; + +#define NES_FIRST_FRAG_SIZE 64 +struct nes_first_frag { + u8 buffer[NES_FIRST_FRAG_SIZE]; +}; + +struct nes_hw_nic { + struct nes_hw_nic_sq_wqe *sq_vbase; /* virtual address of the PCI memory for sq */ + struct nes_hw_nic_rq_wqe *rq_vbase; /* virtual address of the PCI memory for rq */ + struct nes_first_frag *first_frag_vbase; /* virtual address of the PCI memory for first frags */ + struct sk_buff *tx_skb[NES_NIC_WQ_SIZE]; + struct sk_buff *rx_skb[NES_NIC_WQ_SIZE]; + dma_addr_t sq_pbase; /* PCI memory for host rings */ + dma_addr_t rq_pbase; /* PCI memory for host rings */ + dma_addr_t frag_paddr[NES_NIC_WQ_SIZE]; + + u16 qp_id; + u16 sq_head; + u16 sq_tail; + u16 sq_size; + u16 rq_head; + u16 rq_tail; + u16 rq_size; + + spinlock_t sq_lock; +}; + +struct nes_hw_nic_cq { + struct nes_hw_nic_cqe volatile *cq_vbase; /* PCI memory for host rings */ + void (*ce_handler)(struct nes_dev *nesdev, 
struct nes_hw_nic_cq *cq); + dma_addr_t cq_pbase; /* PCI memory for host rings */ + u16 cq_head; + u16 cq_size; + u16 cq_number; +}; + +struct nes_hw_qp { + struct nes_hw_qp_wqe *sq_vbase; /* PCI memory for host rings */ + struct nes_hw_qp_wqe *rq_vbase; /* PCI memory for host rings */ + void *q2_vbase; /* PCI memory for host rings */ + dma_addr_t sq_pbase; /* PCI memory for host rings */ + dma_addr_t rq_pbase; /* PCI memory for host rings */ + dma_addr_t q2_pbase; /* PCI memory for host rings */ + u32 qp_id; + u16 sq_head; + u16 sq_tail; + u16 sq_size; + u16 rq_head; + u16 rq_tail; + u16 rq_size; + u8 rq_encoded_size; + u8 sq_encoded_size; +}; + +struct nes_hw_cq { + struct nes_hw_cqe volatile *cq_vbase; /* PCI memory for host rings */ + void (*ce_handler)(struct nes_dev *nesdev, struct nes_hw_cq *cq); + dma_addr_t cq_pbase; /* PCI memory for host rings */ + u16 cq_head; + u16 cq_size; + u16 cq_number; +}; + +struct nes_hw_ceq { + struct nes_hw_ceqe volatile *ceq_vbase; /* PCI memory for host rings */ + dma_addr_t ceq_pbase; /* PCI memory for host rings */ + u16 ceq_head; + u16 ceq_size; +}; + +struct nes_hw_aeq { + struct nes_hw_aeqe volatile *aeq_vbase; /* PCI memory for host rings */ + dma_addr_t aeq_pbase; /* PCI memory for host rings */ + u16 aeq_head; + u16 aeq_size; +}; + +struct nes_adapter { + u64 fw_ver; + struct nes_qp **qp_table; + unsigned long doorbell_start; + unsigned long csr_start; + u32 hw_rev; + u32 device_cap_flags; + u32 vendor_id; + u32 vendor_part_id; + u32 tick_delta; + + /* RNIC Resource Lists */ + unsigned long *allocated_qps; + unsigned long *allocated_cqs; + unsigned long *allocated_mrs; + unsigned long *allocated_pds; + struct list_head active_listeners; + spinlock_t resource_lock; + + /* arp table */ + unsigned long *allocated_arps; + unsigned long arp_table_size; + struct _nes_arp_table arp_table[NES_ARP_TABLE_SIZE]; + u32 arp_index; + + /* Adapter CEQ and AEQs */ + struct nes_hw_ceq ceq[16]; + struct nes_hw_aeq aeq[8]; + + /* RNIC 
Limits */ + u32 max_mr; + u32 max_256pbl; + u32 max_4kpbl; + u32 free_256pbl; + u32 free_4kpbl; + u32 max_mr_size; + u32 max_qp; + u32 next_qp; + u32 max_irrq; + u32 max_qp_wr; + u32 max_sge; + u32 max_cq; + u32 next_cq; + u32 max_cqe; + u32 max_pd; + u32 base_pd; + u32 next_pd; + u32 hte_index_mask; + + /* Adapter base MAC address */ + u32 mac_addr_low; + u16 mac_addr_high; + + u16 firmware_eeprom_offset; + u16 software_eeprom_offset; + + u16 max_irrq_wr; + + /* PCI information */ + unsigned int devfn; + unsigned char bus_number; + + struct list_head list; + unsigned char ref_count; + + /* the 1G phy index for each port */ + u8 phy_index[4]; +}; + +struct nes_pbl { + u64 *pbl_vbase; + dma_addr_t pbl_pbase; + unsigned long user_base; + u32 pbl_size; + struct list_head list; + /* TODO: need to add list for two level tables */ +}; + +struct nes_listener { + struct work_struct work; + struct workqueue_struct *wq; + struct nes_dev *nesdev; + struct iw_cm_id *cm_id; + struct list_head list; + unsigned long socket; + struct socket *ksock; + u8 accept_failed; +}; + +struct nes_port { + u32 msg_enable; + u32 linkup; + struct nes_dev *nesdev; + struct net_device *netdev; + + spinlock_t tx_lock; + u32 tx_avail; + void *mem; /* PCI memory for host rings */ + struct net_device_stats netstats; +}; + +#endif /* __NES_HW_H */ diff -ruNp old/drivers/infiniband/hw/nes/nes_stack.h new/drivers/infiniband/hw/nes/nes_stack.h --- old/drivers/infiniband/hw/nes/nes_stack.h 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_stack.h 2006-10-25 10:15:51.000000000 -0500 @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef _NES_STACK_H_ +#define _NES_STACK_H_ + +#include +#include +#include "nes_tcpip/include/nes_sockets.h" + +struct nes_stack_ops +{ + int (*stack_init)(void *netdev); + int (*stack_exit)(void *netdev); + void (*set_dev_name)(char *dev_name); + void (*nesif_rx)(void *skb_p); + void (*add_ipaddr)(uint ipaddr, uint mask); + void (*del_ipaddr)(uint ipaddr, uint mask); + void (*add_route)(uint dest_addr, uint gw_addr, uint gw_mask, int flags); +// int (*update_route) (char *if_name); + void (*accelerate_socket)(ulong socket, void *qp_context_in); + void (*decelerate_socket)(ulong socket, void *qp_context_in); + + void (*dhcp_control)(unsigned char dhcp_enable); + void (*get_ip_info)(ulong *addr, u32 *mask); + void (*set_ip_info)(ulong addr, u32 mask); + void (*dump_rt_table)(void); + + // pointer to socket operations ops structure + SOCKET_CALLS *sock_ops_p; + + // this function is set during stack registration, for use by stack + void (*update_arp)(unsigned char *pMacAddress, u32 u32IpAddress, + u32 u32ArpTimeout, u16 u16Entry, u16 type); +}; + + +void set_dev_name(char *dev_name); +void accelerate_socket(ulong socket, void *qp_context); +void decelerate_socket(UINTPTR socket, void *uploaded_qp_context); +void nesif_rx(void *skb); +void get_ip_info(ulong *addr, u32 *mask); +void set_ip_info(ulong addr, u32 mask); +void dhcp_control(unsigned char dhcp_enable); + +// exported function from nes_tcpip module, call to init stack_ops +int nes_register_stack_client(ulong, struct nes_stack_ops **, void (*)); +#endif diff -ruNp old/drivers/infiniband/hw/nes/nes_user.h new/drivers/infiniband/hw/nes/nes_user.h --- old/drivers/infiniband/hw/nes/nes_user.h 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_user.h 2006-10-25 10:15:51.000000000 -0500 @@ -0,0 +1,93 @@ +/* + * Copyright (c) 2006 NetEffect. All rights reserved. + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Cisco Systems. 
All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#ifndef NES_USER_H +#define NES_USER_H + +#include + +/* + * Make sure that all structs defined in this file remain laid out so + * that they pack the same way on 32-bit and 64-bit architectures (to + * avoid incompatibility between 32-bit userspace and 64-bit kernels). + * In particular do not use pointer types -- pass pointers in __u64 + * instead. 
+ */ + +struct nes_alloc_ucontext_resp { + __u32 max_pds; /* maximum pds allowed for this user process */ + __u32 max_qps; /* maximum qps allowed for this user process */ + __u32 wq_size; /* defines the size of the WQs (sq+rq) allocated to the mmaped area */ + __u32 reserved; +}; + +struct nes_alloc_pd_resp { + __u32 pd_id; + __u32 mmap_db_index; +}; + +struct nes_create_cq_req { + __u64 user_cq_buffer; +}; + +enum iwnes_memreg_type { + IWNES_MEMREG_TYPE_MEM = 0x0000, + IWNES_MEMREG_TYPE_QP = 0x0001, + IWNES_MEMREG_TYPE_CQ = 0x0002, +}; + +struct nes_mem_reg_req { + __u32 reg_type; /* indicates if id is memory, QP or CQ */ + __u32 reserved; +}; + +struct nes_create_cq_resp { + __u32 cq_id; + __u32 cq_size; + __u32 mmap_db_index; + __u32 reserved; +}; + +struct nes_create_qp_resp { + __u32 qp_id; + __u32 actual_sq_size; + __u32 actual_rq_size; + __u32 mmap_sq_db_index; + __u32 mmap_rq_db_index; + __u32 reserved; +}; + +#endif /* NES_USER_H */ From ggrundstrom at NetEffect.com Thu Oct 26 17:17:21 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:17:21 -0500 Subject: [openib-general] [PATCH 6/9] NetEffect 10Gb RNIC Driver: kernel network interface c file Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBE@venom2> Kernel driver patch 6 of 9. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/drivers/infiniband/hw/nes/nes_nic.c new/drivers/infiniband/hw/nes/nes_nic.c --- old/drivers/infiniband/hw/nes/nes_nic.c 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_nic.c 2006-10-25 10:15:50.000000000 -0500 @@ -0,0 +1,567 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include +#include +#include +#include +#include +#include + +#include "nes.h" + +static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK + | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; +static int debug = -1; + +static int nes_netdev_open(struct net_device *); +static int nes_netdev_stop(struct net_device *); +static int nes_netdev_start_xmit(struct sk_buff *, struct net_device *); +static struct net_device_stats *nes_netdev_get_stats(struct net_device *); +static void nes_netdev_tx_timeout(struct net_device *); +static int nes_netdev_set_mac_address(struct net_device *, void *); +static int nes_netdev_change_mtu(struct net_device *, int); + + +/** + * nes_netdev_open + * + * @param netdev + * + * @return int + */ +static int nes_netdev_open(struct net_device *netdev) +{ + struct nes_port *nes_port = netdev_priv(netdev); + struct nes_dev *nesdev = nes_port->nesdev; + u32 u32temp; + u32 nic_active_bit; + u32 nic_active; + u16 link_up = 0; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + assert(nesdev != NULL); + + if (netif_msg_ifup(nes_port)) + dprintk(KERN_INFO PFX "%s: enabling interface\n", netdev->name); + + /* clear the MAC interrupt status */ + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_INT_STATUS ); + dprintk("Phy interrupt status = 0x%X.\n", u32temp); + nes_write_indexed(nesdev->index_reg, NES_IDX_MAC_INT_STATUS, u32temp); + + nes_phy_init(nesdev); + + nes_nic_qp_init(nesdev, netdev); + + // Set packet filters + nic_active_bit = 1<<PCI_FUNC(nesdev->pcidev->devfn); + nic_active = nes_read_indexed(nesdev->index_reg, NES_IDX_NIC_ACTIVE); + nic_active |= nic_active_bit; + nic_active |= 2; + nes_write_indexed(nesdev->index_reg, NES_IDX_NIC_ACTIVE, nic_active); + nic_active = nes_read_indexed(nesdev->index_reg, NES_IDX_NIC_MULTICAST_ALL); + nic_active |= nic_active_bit; + nes_write_indexed(nesdev->index_reg, NES_IDX_NIC_MULTICAST_ALL, nic_active); + nic_active = nes_read_indexed(nesdev->index_reg, 
NES_IDX_NIC_BROADCAST_ON); + nic_active |= nic_active_bit; + nes_write_indexed(nesdev->index_reg, NES_IDX_NIC_BROADCAST_ON, nic_active); + + + nes_write32(nesdev->regs+NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT | + nesdev->hnic_cq.cq_number ); + + // TODO: add proper way to setup packet filters + // TODO: move some of the code from init_netdev? + + if ( link_up ) { + /* Enable network packets */ + nes_port->linkup = 1; + netif_start_queue(netdev); + } else { + nes_port->linkup = 0; + netif_carrier_off(netdev); + } + + nes_write_indexed(nesdev->index_reg, NES_IDX_MAC_INT_MASK, + ~(NES_MAC_INT_LINK_STAT_CHG | NES_MAC_INT_XGMII_EXT | + NES_MAC_INT_TX_UNDERFLOW | NES_MAC_INT_TX_ERROR) ); + + return 0; +} + + +/** + * nes_netdev_stop + * + * @param netdev + * + * @return int + */ +static int nes_netdev_stop(struct net_device *netdev) +{ + struct nes_port *nes_port = netdev_priv(netdev); + struct nes_dev *nesdev = nes_port->nesdev; + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_hw_nic_rq_wqe *nic_rqe; + u64 wqe_frag; + u32 cqp_head; + u32 nic_active_mask; + u32 nic_active; + unsigned long flags; + int ret; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + nes_stop_cm(nesdev); + + if (netif_msg_ifdown(nes_port)) + dprintk(KERN_INFO PFX "%s: disabling interface\n", netdev->name); + + nes_write_indexed(nesdev->index_reg, NES_IDX_MAC_INT_MASK, 0xffffffff ); + + nic_active_mask = ~((u32)(1<<PCI_FUNC(nesdev->pcidev->devfn))); + nic_active_mask = ~((u32)(1 << (PCI_FUNC(nesdev->pcidev->devfn) + 1))); + nes_write_indexed(nesdev->index_reg, + NES_IDX_PERFECT_FILTER_HIGH+((PCI_FUNC(nesdev->pcidev->devfn)+1)*8), 0); + nes_write_indexed(nesdev->index_reg, + NES_IDX_PERFECT_FILTER_HIGH+(PCI_FUNC(nesdev->pcidev->devfn)*8), 0); + nic_active = nes_read_indexed(nesdev->index_reg, NES_IDX_NIC_ACTIVE); + nic_active &= nic_active_mask; + nes_write_indexed(nesdev->index_reg, NES_IDX_NIC_ACTIVE, nic_active); + nic_active = nes_read_indexed(nesdev->index_reg, NES_IDX_NIC_MULTICAST_ALL); + nic_active 
&= nic_active_mask; + nes_write_indexed(nesdev->index_reg, NES_IDX_NIC_MULTICAST_ALL, nic_active); + nic_active = nes_read_indexed(nesdev->index_reg, NES_IDX_NIC_BROADCAST_ON); + nic_active &= nic_active_mask; + nes_write_indexed(nesdev->index_reg, NES_IDX_NIC_BROADCAST_ON, nic_active); + + + /* Disable network packets */ + netif_stop_queue(netdev); + + // Free remaining NIC receive buffers + while (nesdev->hnic.rq_head != nesdev->hnic.rq_tail) { + nic_rqe = &nesdev->hnic.rq_vbase[nesdev->hnic.rq_tail]; + wqe_frag = nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]; + wqe_frag += ((u64)nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX])<<32; + pci_unmap_single(nesdev->pcidev, (dma_addr_t)wqe_frag, + max_frame_len, PCI_DMA_FROMDEVICE); + dev_kfree_skb(nesdev->hnic.rx_skb[nesdev->hnic.rq_tail++]); + nesdev->hnic.rq_tail &= nesdev->hnic.rq_size - 1; + } + + // Destroy NIC QP + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_QP | NES_CQP_QP_TYPE_NIC); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(nesdev->hnic_cq.cq_number); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + if (++cqp_head >= nesdev->cqp.sq_size) cqp_head = 0; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + + // Destroy NIC CQ + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_CQ | (nesdev->hnic_cq.cq_size<<16)); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(nesdev->hnic_cq.cq_number | ((u32)PCI_FUNC(nesdev->pcidev->devfn)<<16)); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + 
cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + if (++cqp_head >= nesdev->cqp.sq_size) cqp_head = 0; + + nesdev->cqp.sq_head = cqp_head; + barrier(); + + // Ring doorbell (2 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id ); + + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("Waiting for destroy NIC QP to complete.\n"); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + + dprintk("Destroy NIC QP completed, wait_event_timeout ret = %u.\n", ret); + + // Free the NIC memory + pci_free_consistent(nesdev->pcidev, nesdev->nic_mem_size, nesdev->hnic.first_frag_vbase, + nesdev->hnic.frag_paddr[0]); + + return 0; +} + + +/** + * nes_netdev_start_xmit + * + * @param skb + * @param netdev + * + * @return int + */ +static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev) +{ + struct nes_port *nes_port = netdev_priv(netdev); + struct nes_dev *nesdev = nes_port->nesdev; + struct nes_hw_nic *nesnic = &nesdev->hnic; + struct nes_hw_nic_sq_wqe *nic_sqe; + unsigned long flags; + dma_addr_t bus_address; + +// printk(KERN_ERR PFX "%s: Request to transmit a NIC packet length %u (%u frags), first address = %p...\n", +// pci_name(nesdev->pcidev), skb_headlen(skb), skb_shinfo(skb)->nr_frags, skb->data); + local_irq_save(flags); + if (!spin_trylock(&nesnic->sq_lock)) { + local_irq_restore(flags); + return NETDEV_TX_LOCKED; + } + + /* Check if SQ is full */ + if ((((nesnic->sq_tail+(nesnic->sq_size*2))-nesnic->sq_head) & (nesnic->sq_size - 1)) == 1 ) { + netif_stop_queue(netdev); + spin_unlock_irqrestore(&nesnic->sq_lock, flags); + dprintk(KERN_WARNING PFX "%s: HNIC SQ full when queue awake!\n", netdev->name); + return NETDEV_TX_BUSY; + } + + /* Check if too many fragments */ + if (skb_shinfo(skb)->nr_frags) { + /* TODO: if too many fragments copy the data or 
enable EFBs */ + spin_unlock_irqrestore(&nesnic->sq_lock, flags); + kfree_skb(skb); + dprintk(KERN_WARNING PFX "%s: HNIC TODO: Need support for more 4 fragments! Packet with %u fragments not sent.\n", + netdev->name, skb_shinfo(skb)->nr_frags+1); + return NETDEV_TX_OK; + } + + memcpy(&nesnic->first_frag_vbase[nesnic->sq_head].buffer, + skb->data, min(((unsigned int)NES_FIRST_FRAG_SIZE), (skb_headlen(skb)))); + + nic_sqe = &nesnic->sq_vbase[nesnic->sq_head]; + + nic_sqe->wqe_words[NES_NIC_SQ_WQE_MISC_IDX] = cpu_to_le32(NES_NIC_SQ_WQE_DISABLE_CHKSUM | NES_NIC_SQ_WQE_COMPLETION); + nic_sqe->wqe_words[NES_NIC_SQ_WQE_TOTAL_LENGTH_IDX] = cpu_to_le32(skb_headlen(skb)); + if (skb_headlen(skb)>NES_FIRST_FRAG_SIZE) { + bus_address = pci_map_single(nesdev->pcidev, skb->data+NES_FIRST_FRAG_SIZE, + skb_headlen(skb)-NES_FIRST_FRAG_SIZE, PCI_DMA_TODEVICE); + nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_2_1_IDX] = cpu_to_le32(skb_headlen(skb)-NES_FIRST_FRAG_SIZE); + nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG1_LOW_IDX] = cpu_to_le32((u32)bus_address); + nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG1_HIGH_IDX] = cpu_to_le32((u32)((u64)bus_address>>32)); +// dprintk("Mapping sq fragment 0x%08X%08X, length = %u.\n", +// nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG1_HIGH_IDX], +// nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG1_LOW_IDX], +// nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_2_1_IDX]); + nesnic->tx_skb[nesnic->sq_head] = skb; + } else { + nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_2_1_IDX] = 0; + nesnic->tx_skb[nesnic->sq_head] = 0; + dev_kfree_skb(skb); + } + + nesnic->sq_head++; + nesnic->sq_head &= nesnic->sq_size-1; + + barrier(); + + nes_write32(nesdev->regs+NES_WQE_ALLOC, (1<<24) | (1<<23) | nesdev->hnic.qp_id ); + + netdev->trans_start = jiffies; + nes_port->netstats.tx_packets++; + nes_port->netstats.tx_bytes += skb_headlen(skb); + spin_unlock_irqrestore(&nesnic->sq_lock, flags); + + return NETDEV_TX_OK; +} + + +/** + * nes_netdev_get_stats + * + * @param netdev + * + * @return struct 
net_device_stats* + */ +static struct net_device_stats *nes_netdev_get_stats(struct net_device *netdev) +{ + struct nes_port *nes_port = netdev_priv(netdev); + + return (&nes_port->netstats); +} + + +/** + * nes_netdev_tx_timeout + * + * @param netdev + */ +static void nes_netdev_tx_timeout(struct net_device *netdev) +{ + struct nes_port *nes_port = netdev_priv(netdev); + + if (netif_msg_timer(nes_port)) + dprintk(KERN_DEBUG PFX "%s: tx timeout\n", netdev->name); +} + + +/** + * nes_netdev_set_mac_address + * + * @param netdev + * @param p + * + * @return int + */ +static int nes_netdev_set_mac_address(struct net_device *netdev, void *p) +{ + return -1; +} + + +/** + * nes_netdev_change_mtu + * + * @param netdev + * @param new_mtu + * + * @return int + */ +static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) +{ + int ret = 0; + + if ( (new_mtu < ETH_ZLEN) || (new_mtu > max_mtu) ) + return -EINVAL; + + netdev->mtu = new_mtu; + + if (netif_running(netdev)) { + nes_netdev_stop(netdev); + nes_netdev_open(netdev); + } + + return ret; +} + + +/** + * nes_netdev_init - initialize network device + * + * @param nesdev + * @param mmio_addr + * + * @return struct net_device* + */ +struct net_device *nes_netdev_init(struct nes_dev *nesdev, void __iomem *mmio_addr) +{ + struct nes_port *nes_port = NULL; + struct net_device *netdev = alloc_etherdev(sizeof(*nes_port)); + u32 count=0; + + if (!netdev) { + dprintk(KERN_ERR PFX "nes_port etherdev alloc failed"); + return NULL; + } + + SET_MODULE_OWNER(netdev); + SET_NETDEV_DEV(netdev, &nesdev->pcidev->dev); + + netdev->open = nes_netdev_open; + netdev->stop = nes_netdev_stop; + netdev->hard_start_xmit = nes_netdev_start_xmit; + netdev->get_stats = nes_netdev_get_stats; + netdev->tx_timeout = nes_netdev_tx_timeout; + netdev->set_mac_address = nes_netdev_set_mac_address; + netdev->change_mtu = nes_netdev_change_mtu; + netdev->watchdog_timeo = NES_TX_TIMEOUT; + netdev->irq = nesdev->pcidev->irq; + netdev->mtu = 
max_mtu; + netdev->hard_header_len = ETH_HLEN; + netdev->addr_len = ETH_ALEN; + netdev->type = ARPHRD_ETHER; + + /* Setup the burned in MAC address */ + netdev->dev_addr[0] = (u8)(nesdev->nesadapter->mac_addr_high>>8); + netdev->dev_addr[1] = (u8)nesdev->nesadapter->mac_addr_high; + netdev->dev_addr[2] = (u8)(nesdev->nesadapter->mac_addr_low>>24); + netdev->dev_addr[3] = (u8)(nesdev->nesadapter->mac_addr_low>>16); + netdev->dev_addr[4] = (u8)(nesdev->nesadapter->mac_addr_low>>8); + netdev->dev_addr[5] = (u8)nesdev->nesadapter->mac_addr_low; + + /* Program the various MAC regs */ + nes_write_indexed(nesdev->index_reg, + NES_IDX_PERFECT_FILTER_LOW+(PCI_FUNC(nesdev->pcidev->devfn)*8), + nesdev->nesadapter->mac_addr_low+PCI_FUNC(nesdev->pcidev->devfn)); + nes_write_indexed(nesdev->index_reg, + NES_IDX_PERFECT_FILTER_HIGH+(PCI_FUNC(nesdev->pcidev->devfn)*8), + (u32)nesdev->nesadapter->mac_addr_high | NES_MAC_ADDR_VALID | + ((((u32)PCI_FUNC(nesdev->pcidev->devfn))<<16))); + nes_write_indexed(nesdev->index_reg, + NES_IDX_PERFECT_FILTER_LOW+((PCI_FUNC(nesdev->pcidev->devfn)+1)*8), + nesdev->nesadapter->mac_addr_low+PCI_FUNC(nesdev->pcidev->devfn)); + nes_write_indexed(nesdev->index_reg, + NES_IDX_PERFECT_FILTER_HIGH+((PCI_FUNC(nesdev->pcidev->devfn)+1)*8), + (u32)nesdev->nesadapter->mac_addr_high | NES_MAC_ADDR_VALID | + ((((u32)PCI_FUNC(nesdev->pcidev->devfn))<<16))); + + /* Fill in the port structure */ + nes_port = netdev_priv(netdev); + nes_port->netdev = netdev; + nes_port->nesdev = nesdev; + nes_port->msg_enable = netif_msg_init(debug, default_msg); + + spin_lock_init(&nes_port->tx_lock); + + nes_cqp_init(nesdev); + + // Arm the CCQ + nes_write32(nesdev->regs+NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT | + PCI_FUNC(nesdev->pcidev->devfn) ); + + // Enable the interrupts + nesdev->int_req = (1<<PCI_FUNC(nesdev->pcidev->devfn)) | + (1<<(PCI_FUNC(nesdev->pcidev->devfn)+16)) | + (1<<(PCI_FUNC(nesdev->pcidev->devfn)+24)); + nesdev->intf_int_req &= ~NES_INTF_INT_CRITERR; + 
nes_write32(nesdev->regs+NES_INTF_INT_MASK, ~(nesdev->intf_int_req)); + + nes_write32(nesdev->regs+NES_INT_MASK, ~nesdev->int_req); + dprintk("Waiting for create CQP init to complete.\n"); + do { + if (count++ > 1000) + break; + udelay(10); + } while (!(nesdev->cqp.sq_head == nesdev->cqp.sq_tail)); + + nesdev->netdev = netdev; + + list_add_tail(&nesdev->list, &nes_dev_list); + + return netdev; +} + + +/** + * nes_netdev_exit + * + * @param nesdev + */ +void nes_netdev_exit(struct nes_dev *nesdev) +{ + struct nes_hw_cqp_wqe *cqp_wqe; + u32 count=0; + u32 cqp_head; + unsigned long flags; + int ret; + + dprintk("Waiting for CQP work to complete.\n"); + do { + if (count++ > 1000) break; + udelay(10); + } while ( !(nesdev->cqp.sq_head == nesdev->cqp.sq_tail) ); + + // Reset CCQ + nes_write32(nesdev->regs+NES_CQE_ALLOC, NES_CQE_ALLOC_RESET | + nesdev->ccq.cq_number ); + // Disable device interrupts + nes_write32(nesdev->regs+NES_INT_MASK, 0x7fffffff ); + // Destroy the AEQ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_AEQ + (PCI_FUNC(nesdev->pcidev->devfn)<<8)); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + // Destroy the CEQ + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_CEQ + (PCI_FUNC(nesdev->pcidev->devfn)<<8)); + // Destroy the CCQ + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_CQ); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32( PCI_FUNC(nesdev->pcidev->devfn) || (PCI_FUNC(nesdev->pcidev->devfn)<<16)); + // Destroy CQP + cqp_head = 
nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_QP | NES_CQP_QP_TYPE_CQP); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(nesdev->cqp.qp_id); + + barrier(); + // Ring doorbell (4 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x04800000 | nesdev->cqp.qp_id); + + // Wait for the destroy to complete + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("%s:Waiting for DestroyQP.\n",__FUNCTION__); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("%s:Done waiting for DestroyQP. wait_event_timeout ret = %d.\n",__FUNCTION__, ret); + // Free the control structures + pci_free_consistent(nesdev->pcidev, nesdev->cqp_mem_size, nesdev->cqp.sq_vbase, + nesdev->cqp.sq_pbase); + list_del(&nesdev->list); +} + + +/** + * nes_adapter_free - free network adapter + * + * @param nesadapter + */ +void nes_adapter_free(struct nes_adapter *nesadapter) +{ + struct nes_adapter *tmp_adapter; + list_for_each_entry(tmp_adapter, &nes_adapter_list, list) { + dprintk("%s: Nes Adapter list entry = 0x%p.\n", __FUNCTION__, tmp_adapter); + } + + nesadapter->ref_count--; + if (!nesadapter->ref_count) { + dprintk("nes_adapter_free: Deleting adapter from adapter list.\n" ); + list_del(&nesadapter->list); + /* TODO: free the resources from the resource list */ + dprintk("nes_adapter_free: Freeing adapter structure.\n"); + kfree(nesadapter); + } + dprintk("nes_adapter_free: Done.\n"); +} + + From davem at davemloft.net Thu Oct 26 17:04:09 2006 From: davem at davemloft.net (David Miller) Date: Thu, 26 Oct 2006 17:04:09 -0700 (PDT) Subject: [openib-general] [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles In-Reply-To: References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2> Message-ID: <20061026.170409.18289073.davem@davemloft.net> 
From: Roland Dreier
Date: Thu, 26 Oct 2006 16:58:41 -0700

> > -Idrivers/infiniband/hw/nes/nes_tcpip/include
>
> I guess this is the mysterious TCP stack module.

What is this thing?

From rdreier at cisco.com Thu Oct 26 17:19:17 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 26 Oct 2006 17:19:17 -0700
Subject: [openib-general] [PATCH 2/9] NetEffect 10Gb RNIC Driver: main kernel driver c file
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EB6@venom2> (Glenn Grundstrom's message of "Thu, 26 Oct 2006 18:54:18 -0500")
References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EB6@venom2>
Message-ID:

> +static int nes_device_event(struct notifier_block *notifier, unsigned
> long event, void *ptr);
> +static int nes_inetaddr_event(struct notifier_block *notifier, unsigned
> long event, void *ptr);
> +static void nes_print_macaddr(struct net_device *netdev);
> +static irqreturn_t nes_interrupt(int, void *, struct pt_regs *);
> +static int __devinit nes_probe(struct pci_dev *, const struct
> pci_device_id *);
> +static int nes_suspend(struct pci_dev *, pm_message_t);
> +static int nes_resume(struct pci_dev *);
> +static void __devexit nes_remove(struct pci_dev *);
> +static int __init nes_init_module(void);
> +static void __exit nes_exit_module(void);

Some of these declarations are already unneeded (e.g. at least
nes_init_module and nes_exit_module), and it would be good to
rearrange your code so that the rest can be removed too.

> +// _the_ function interface handle to nes_tcpip module

We prefer /* */ style comments.

> +static struct notifier_block nes_dev_notifier = {
> + notifier_call: nes_device_event
> +};

Standard C syntax (rather than the gcc extension) is preferred, like:

static struct notifier_block nes_dev_notifier = {
	.notifier_call = nes_device_event
};

> +/**
> + * nes_device_event
> + *
> + * @param notifier
> + * @param event
> + * @param ptr
> + *
> + * @return int
> + */

There's no point to comments like this.
I can read the function declaration just fine, so save the screen real estate unless you have something more to say. > + unsigned long reg0_start, reg0_flags, reg0_len; > + unsigned long reg1_start, reg1_flags, reg1_len; PCI bars are type resource_size_t, which can be bigger than long... > + assert(pcidev != NULL); > + assert(ent != NULL); BUG_ON() is more idiomatic. But this looks kind of useless anyway -- you'll get a nice enough oops if they are NULL. > + /* Enable PCI device */ > + ret = pci_enable_device(pcidev); This isn't major, but comments like this just waste screen space. I mean, someone who can't guess what pci_enable_device() does is probably not going to be helped by the comment either. > + /* pci tweaks */ > + pci_write_config_word(pcidev, 0x000c, 0xfc10); > + pci_write_config_dword(pcidev, 0x0048, 0x00480007); Looks rather magic and fragile. Register 0xc is the cacheline size and latency, right? Why are you tweaking that? And I assume 0x48 is somewhere in a capability structure. It's much better to use pci_find_capability() in that case. That way when the hardware guys tell you they have to rearrange the PCI header in the next rev of the chip, you don't have to touch the chip. However this tweaking probably needs to be justified too. > +/** > + * nes_suspend - power management > + */ > +static int nes_suspend(struct pci_dev *pcidev, pm_message_t state) > +{ > + dprintk("pcidev=%p\n", pcidev); > + > + return (0); > +} > Umm, just don't have suspend/resume methods if you don't support it. 
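An illustrative sketch of two of the points above (standard designated initializers instead of the old gcc "field:" form, and simply leaving out callbacks the driver does not support), written as plain user-space C so it can be compiled outside the kernel. struct demo_driver, demo_probe() and demo_supports_pm() are hypothetical stand-ins, not the real struct pci_driver or struct notifier_block:

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel ops structure such as struct pci_driver. */
struct demo_driver {
	const char *name;
	int (*probe)(void);
	int (*suspend)(void);	/* optional callback */
	int (*resume)(void);	/* optional callback */
};

/* Hypothetical probe routine; always reports success. */
static int demo_probe(void)
{
	return 0;
}

/*
 * Standard C99 designated initializers.  Members that are not named
 * (suspend and resume here) are zero-initialized, i.e. NULL, so no
 * empty stub functions are needed for unsupported operations.
 */
static struct demo_driver demo = {
	.name  = "demo",
	.probe = demo_probe,
};

/* Returns 1 only if both power-management callbacks are provided. */
static int demo_supports_pm(const struct demo_driver *drv)
{
	return drv->suspend != NULL && drv->resume != NULL;
}
```

With this layout a caller can test for a NULL callback rather than invoke a stub that only returns 0, which is the assumption behind dropping the empty suspend/resume methods.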
> + nes_adapter_free(nesdev->nesadapter); > + > + dprintk("nes_remove: calling iounmap.\n"); > + /* Unmap adapter PA space */ > + iounmap(nesdev->regs); > + > + /* Unregister with OpenFabrics */ > + if (nesdev->of_device_registered) { > + dprintk("nes_remove: calling nes_unregister_device.\n"); > + nes_unregister_device(nesdev); > + } You can still have upper layers calling into you until ib_unregister_device() returns, so it looks bogus to do things like iounmap before then. I think your cleanup needs to be reordered. And I don't think you're unregistering with OpenFabrics -- you're just unregistering with the RDMA midlayer. > + return (pci_module_init(&nes_pci_driver)); Just use pci_register_driver(). - R. From rdreier at cisco.com Thu Oct 26 17:21:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 17:21:08 -0700 Subject: [openib-general] [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles In-Reply-To: <20061026.170409.18289073.davem@davemloft.net> (David Miller's message of "Thu, 26 Oct 2006 17:04:09 -0700 (PDT)") References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2> <20061026.170409.18289073.davem@davemloft.net> Message-ID: David> What is this thing? Good question. I haven't gotten a straight answer yet, which is why I called it "mysterious". - R. From ggrundstrom at NetEffect.com Thu Oct 26 17:25:46 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:25:46 -0500 Subject: [openib-general] [PATCH 7/9] NetEffect 10Gb RNIC Driver: utility routines c file Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EC1@venom2> Kernel driver patch 7 of 9. 
Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/drivers/infiniband/hw/nes/nes_utils.c new/drivers/infiniband/hw/nes/nes_utils.c --- old/drivers/infiniband/hw/nes/nes_utils.c 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_utils.c 2006-10-25 10:15:51.000000000 -0500 @@ -0,0 +1,488 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include +#include +#include "nes.h" +#include "nes_verbs.h" + +#define BITMASK(X) (1L << (X)) +#define NES_CRC_WID 32 + +static u32 nesCRCTable[256]; +static u32 nesCRCInitialized = 0; + +static u32 nesCRCWidMask(u32); +static u32 nes_crc_table_gen(u32 *, u32, u32, u32); +static u32 reflect(u32, u32); +static u32 byte_swap(u32, u32); + + +/** + * nes_read_eeprom_values - + * + * @param nesdev + * + * @return int + */ +int nes_read_eeprom_values(struct nes_dev *nesdev) +{ + struct nes_adapter *nesadapter = nesdev->nesadapter; + u32 mac_addr_low; + u16 mac_addr_high; + u16 eeprom_data; + u16 eeprom_offset; + + if (0 == nesadapter->firmware_eeprom_offset) { + /* Read the EEPROM Parameters */ + eeprom_data = nes_read16_eeprom(nesdev->regs, 0); + dprintk("EEPROM Offset 0 = 0x%04X\n", eeprom_data); + eeprom_offset = 2 + (((eeprom_data & 0x007f)<<3)<<((eeprom_data & 0x0080)>>7)); + dprintk("Firmware Offset = 0x%04X\n", eeprom_offset); + nesadapter->firmware_eeprom_offset = eeprom_offset; + eeprom_data = nes_read16_eeprom(nesdev->regs, eeprom_offset+4); + if (eeprom_data != 0x5746) { + dprintk("Not a valid Firmware Image = 0x%04X\n", eeprom_data); + return -1; + } + + eeprom_data = nes_read16_eeprom(nesdev->regs, eeprom_offset+2); + dprintk("EEPROM Offset %u = 0x%04X\n", eeprom_offset+2, eeprom_data); + eeprom_offset += ((eeprom_data & 0x00ff)<<3)<<((eeprom_data & 0x0100)>>8); + dprintk("Software Offset = 0x%04X\n", eeprom_offset); + nesadapter->software_eeprom_offset = eeprom_offset; + eeprom_data = nes_read16_eeprom(nesdev->regs, eeprom_offset); + dprintk("EEPROM Offset %u = 0x%04X\n", eeprom_offset, eeprom_data); + eeprom_data = nes_read16_eeprom(nesdev->regs, eeprom_offset+4); + if (eeprom_data != 0x5753) { + dprintk("Not a valid Software Image = 0x%04X\n", eeprom_data); + return -1; + } + + 
eeprom_offset = nesadapter->software_eeprom_offset; + eeprom_offset += 10; + mac_addr_high = nes_read16_eeprom(nesdev->regs, eeprom_offset); + eeprom_offset += 2; + mac_addr_low = (u32)nes_read16_eeprom(nesdev->regs, eeprom_offset); + eeprom_offset += 2; + mac_addr_low <<= 16; + mac_addr_low += (u32)nes_read16_eeprom(nesdev->regs, eeprom_offset); + dprintk("MAC Address = 0x%04X%08X\n", mac_addr_high, mac_addr_low); + + nesadapter->mac_addr_low = mac_addr_low; + nesadapter->mac_addr_high = mac_addr_high; + } + + /* TODO: Get this from EEPROM */ + nesdev->nesadapter->phy_index[0] = 6; + + /* TODO: get this from EEPROM and stop setting in init loop */ + nesdev->base_doorbell_index = 1; + + return 0; +} + + + + +/** + * nes_write_10G_phy_reg + * + * @param addr + * @param phy_reg + * @param phy_addr + * @param data + */ +void nes_write_10G_phy_reg(void __iomem *addr, u16 phy_reg, u8 phy_addr, u16 data) +{ + u32 dev_addr; + u32 port_addr; + u32 u32temp; + u32 counter; + + dev_addr = 5; + port_addr = 0; + + // set address + nes_write_indexed(addr, NES_IDX_MAC_MDIO_CONTROL, 0x00020000 | phy_reg | (dev_addr<<18) | (port_addr<<23)); + for (counter=0; counter<100 ; counter++ ) { + udelay(30); + u32temp = nes_read_indexed(addr, NES_IDX_MAC_INT_STATUS); + if (u32temp & 1) { +// dprintk("Address phase; Phy interrupt status = 0x%X.\n", u32temp); + nes_write_indexed(addr, NES_IDX_MAC_INT_STATUS, 1); + break; + } + } + if (!(u32temp & 1)) dprintk("Phy is not responding. interrupt status = 0x%X.\n", u32temp); + + // set data + nes_write_indexed(addr, NES_IDX_MAC_MDIO_CONTROL, 0x10020000 | data | (dev_addr<<18) | (port_addr<<23) ); + for (counter=0; counter<100 ; counter++ ) { + udelay(30); + u32temp = nes_read_indexed(addr, NES_IDX_MAC_INT_STATUS); + if (u32temp & 1) { +// dprintk("Write phase; Phy interrupt status = 0x%X.\n", u32temp); + nes_write_indexed(addr, NES_IDX_MAC_INT_STATUS, 1); + break; + } + } + if (!(u32temp & 1)) + dprintk("Phy is not responding. 
interrupt status = 0x%X.\n", u32temp); +} + + +/** + * nes_read_10G_phy_reg + * This routine only issues the read, the data must be read + * separately. + * + * @param addr + * @param phy_reg + * @param phy_addr + */ +void nes_read_10G_phy_reg(void __iomem *addr, u16 phy_reg, u8 phy_addr) +{ + u32 dev_addr; + u32 port_addr; + u32 u32temp; + u32 counter; + + dev_addr = 5; + port_addr = 0; + + // set address + nes_write_indexed(addr, NES_IDX_MAC_MDIO_CONTROL, 0x00020000 | phy_reg | (dev_addr<<18) | (port_addr<<23)); + for (counter=0; counter<100 ; counter++ ) { + udelay(30); + u32temp = nes_read_indexed(addr, NES_IDX_MAC_INT_STATUS); + if (u32temp & 1) { +// dprintk("Address phase; Phy interrupt status = 0x%X.\n", u32temp); + nes_write_indexed(addr, NES_IDX_MAC_INT_STATUS, 1); + break; + } + } + if (!(u32temp & 1)) dprintk("Phy is not responding. interrupt status = 0x%X.\n", u32temp); + + // issue read + nes_write_indexed(addr, NES_IDX_MAC_MDIO_CONTROL, 0x30020000 | (dev_addr<<18) | (port_addr<<23)); + for (counter=0; counter<100 ; counter++ ) { + udelay(30); + u32temp = nes_read_indexed(addr, NES_IDX_MAC_INT_STATUS); + if (u32temp & 1) { +// dprintk("Read phase; Phy interrupt status = 0x%X.\n", u32temp); + nes_write_indexed(addr, NES_IDX_MAC_INT_STATUS, 1); + break; + } + } + if (!(u32temp & 1)) + dprintk("Phy is not responding. 
interrupt status = 0x%X.\n", u32temp); +} + + +/** + * nes_arp_table_update + * + * @param nesdev + * @param ip_address + * @param action + * + * @return u16 + */ +u16 nes_arp_table_update(struct nes_dev *nesdev, u32 ip_address, u32 action) +{ + struct nes_adapter *nesadapter = nesdev->nesadapter; + u32 arp_index = 0; + int err=0; + + // dprintk("%s: ip_address=%08X, action=%d\n", __FUNCTION__, ip_address, action); + + if (action == NES_ARP_INDEX_ADD) { + err = nes_alloc_resource(nesadapter, nesadapter->allocated_arps, nesadapter->arp_table_size, &arp_index, + &nesadapter->arp_index); + if (err) { + dprintk("%s: nes_alloc_resource returned error = %u\n", __FUNCTION__, err); + return ((u16)((u32)ERR_PTR(err))); + } + dprintk("nes_arp_table_update: ADD, ip_address=%08X, arp_index=%d\n", ip_address, arp_index); + + nesadapter->arp_table[arp_index].ip_address = ip_address; + return ((u16)arp_index); + } + + // DELETE or RESOLVE + for (arp_index = 0; arp_index < nesadapter->arp_table_size; arp_index++) { + if (nesadapter->arp_table[arp_index].ip_address == ip_address) + break; + } + + if (arp_index == nesadapter->arp_table_size) + return (nesadapter->arp_table_size); // did not find ip address; return something else? + + if (action == NES_ARP_INDEX_DELETE) { + dprintk("nes_arp_table_update: DELETE, ip_address=%08X, arp_index=%d\n", ip_address, arp_index); + nesadapter->arp_table[arp_index].ip_address = 0; + nes_free_resource(nesadapter, nesadapter->allocated_arps, arp_index); + return (0); + } + + if (action == NES_ARP_INDEX_RESOLVE) { + dprintk("nes_arp_table_update: RESOLVE, ip_address=%08X, arp_index=%d\n", ip_address, arp_index); + return ((u16)arp_index); + } + + return (err); +} + + +/* +"Everything you wanted to know about CRC algorithms, but were afraid to ask + for fear that errors in your understanding might be detected." Version : 3. +Date : 19 August 1993. +Author : Ross N. Williams. +Net : ross at guest.adelaide.edu.au. 
+FTP : ftp.adelaide.edu.au/pub/rocksoft/crc_v3.txt +Company : Rocksoft(tm) Pty Ltd. +Snail : 16 Lerwick Avenue, Hazelwood Park 5066, Australia. +Fax : +61 8 373-4911 (c/- Internode Systems Pty Ltd). +Phone : +61 8 379-9217 (10am to 10pm Adelaide Australia time). +Note : "Rocksoft" is a trademark of Rocksoft Pty Ltd, Australia. +Status : Copyright (C) Ross Williams, 1993. However, permission is granted to + make and distribute verbatim copies of this document provided that this information + block and copyright notice is included. Also, the C code modules included in this + document are fully public domain. + +Thanks : Thanks to Jean-loup Gailly (jloup at chorus.fr) and Mark Adler + (me at quest.jpl.nasa.gov) who both proof read this document and picked + out lots of nits as well as some big fat bugs. + +The current web page for this seems to be http://www.ross.net/crc/crcpaper.html. + +*/ + +/********************************************************************** ******/ +/* Generate width mask */ +/********************************************************************** ******/ +/* */ +/* Returns a longword whose value is (2^p_cm->cm_width)-1. */ +/* The trick is to do this portably (e.g. without doing <<32). */ +/* */ +/* Author: Tristan Gross */ +/* Source: "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS" */ +/* Ross N. Williams */ +/* http://www.rocksoft.com */ +/* */ +/********************************************************************** ******/ + +static u32 nesCRCWidMask (u32 width) +{ + return (((1L<<(((u32)width)-1))-1L)<<1)|1L; +} + + +/********************************************************************** ******/ +/* Generate CRC table */ +/********************************************************************** ******/ +/* */ +/* Source: "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS" */ +/* Ross N. 
Williams */ +/* http://www.rocksoft.com */ +/* */ +/********************************************************************** ******/ +static u32 nes_crc_table_gen ( u32 *pCRCTable, + u32 poly, + u32 order, + u32 reflectIn) +{ + u32 i; + u32 reg; + u32 byte; + u32 topbit = BITMASK(NES_CRC_WID-1); + u32 tmp; + + for (byte=0;byte<256;byte++) { + + // If we need to create a reflected table we must reflect the index (byte) and + // reflect the final reg + tmp = (reflectIn) ? reflect(byte,8): byte; + + reg = tmp << (NES_CRC_WID-8); + + for (i=0; i<8; i++) { + if (reg & topbit) { + reg = (reg << 1) ^ poly; + } + else { + reg <<= 1; + } + } + + reg = (reflectIn) ? reflect(reg,order): reg; + pCRCTable[byte] = reg & nesCRCWidMask(NES_CRC_WID); + } + + return(0); +} + + +/********************************************************************** ******/ +/* Perform 32 bit based CRC calculation */ +/********************************************************************** ******/ +/* */ +/* Source: "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS" */ +/* Ross N. Williams */ +/* http://www.rocksoft.com */ +/* */ +/* This performs a standard 32 bit crc on an array of arbitrary length */ +/* with an arbitrary initial value and passed generator polynomial */ +/* in the form of a crc table. */ +/* */ +/********************************************************************** ******/ +static u32 reflect (u32 data, u32 num) +{ + // Reflects the lower num bits in 'data' around their center point. 
+ u32 i; + u32 j = 1; + u32 result = 0; + + for (i=(u32)1<<(num-1); i; i>>=1) { + if (data & i) result|=j; + j <<= 1; + } + return (result); +} + + +/** + * byte_swap + * + * @param data + * @param num + * + * @return u32 + */ +static u32 byte_swap (u32 data, u32 num) +{ + u32 i; + u32 result = 0; + + if (num%16) { + dprintk("\nbyte_swap: ERROR: num is not an even number of bytes\n"); +// ASSERT(0); + } + + for(i=0;i<num;i+=8) { + result |= (0xFF & (data >> i)) << (num-8-i); + } + + return (result); +} + + +/** + * nes_crc32 - + * This is a reflected table algorithm. ReflectIn basically means to reflect each incoming byte of + * the data. But to make things more complicated, we can instead reflect the initial value, + * the final crc, and shift data to the right using a reflected pCRCTable. CRC is FUN!! + * + * @param reverse + * @param initialValue + * @param finalXOR + * @param messageLength + * @param pMessage + * @param order + * @param reflectIn + * @param reflectOut + * + * @return u32 + */ +u32 nes_crc32 ( u32 reverse, + u32 initialValue, + u32 finalXOR, + u32 messageLength, + u8 *pMessage, + u32 order, + u32 reflectIn, + u32 reflectOut) + +{ + u8 *pBlockAddr = pMessage; + u32 mlen = messageLength; + u32 crc; + + if (0 == nesCRCInitialized) { + nes_crc_table_gen( &nesCRCTable[0], CRC32C_POLY, ORDER, REFIN ); + nesCRCInitialized = 1; + } + + crc = (reflectIn) ? reflect(initialValue,order): initialValue; + + while (mlen--) { + //printf("byte = %x, index = %u, crctable[index] = %x\n", *pBlockAddr, (crc & 0xffL) ^ *pBlockAddr, nesCRCTable[(crc & 0xffL) ^ *pBlockAddr]); + if (reflectIn) { + crc = nesCRCTable[(crc & 0xffL ) ^ *pBlockAddr++] ^ (crc >> 8); + } + else { + crc = nesCRCTable[((crc>>24) ^ *pBlockAddr++) & 0xFFL] ^ (crc << 8); + } + } + + // if reflectOut and reflectIn are both set, we don't + // do anything since reflecting twice effectively does nothing. + crc = ((reflectIn)^(reflectOut)) ? 
reflect(crc,order): crc; + + crc = crc^finalXOR; + + // We don't really use this, but it is here for completeness + crc = (reverse) ? byte_swap(crc,32): crc; + + return crc; +} + From ggrundstrom at NetEffect.com Thu Oct 26 17:30:43 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:30:43 -0500 Subject: [openib-general] [PATCH 8/9] NetEffect 10Gb RNIC Driver: openfabrics verbs interface c file Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EC3@venom2> Kernel driver patch 8 of 9. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/drivers/infiniband/hw/nes/nes_verbs.c new/drivers/infiniband/hw/nes/nes_verbs.c --- old/drivers/infiniband/hw/nes/nes_verbs.c 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_verbs.c 2006-10-25 10:15:51.000000000 -0500 @@ -0,0 +1,2714 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#include +#include +#include +#include + +#include +#include +#include + +#include "nes.h" + +extern int disable_mpa_crc; + + +/** + * nes_query_device + * + * @param ibdev + * @param props + * + * @return int + */ +static int nes_query_device(struct ib_device *ibdev, struct ib_device_attr *props) +{ + struct nes_dev *nesdev = to_nesdev(ibdev); + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + memset(props, 0, sizeof(*props)); + memcpy(&props->sys_image_guid, nesdev->netdev->dev_addr, 6); + + props->fw_ver = nesdev->nesadapter->fw_ver; + props->device_cap_flags = nesdev->nesadapter->device_cap_flags; + props->vendor_id = nesdev->nesadapter->vendor_id; + props->vendor_part_id = nesdev->nesadapter->vendor_part_id; + props->hw_ver = nesdev->nesadapter->hw_rev; + props->max_mr_size = 0x80000000; + props->max_qp = nesdev->nesadapter->max_qp-NES_FIRST_QPN; + props->max_qp_wr = nesdev->nesadapter->max_qp_wr - 2; + props->max_sge = nesdev->nesadapter->max_sge; + props->max_cq = nesdev->nesadapter->max_cq-NES_FIRST_QPN; + props->max_cqe = nesdev->nesadapter->max_cqe - 1; + props->max_mr = nesdev->nesadapter->max_mr; + props->max_mw = nesdev->nesadapter->max_mr; + props->max_pd = nesdev->nesadapter->max_pd; + props->max_sge_rd = 1; + switch (nesdev->nesadapter->max_irrq_wr) { + case 0: + props->max_qp_rd_atom = 1; + break; + case 1: + props->max_qp_rd_atom = 4; + break; + case 2: + props->max_qp_rd_atom = 16; + break; + case 3: + props->max_qp_rd_atom = 32; + 
break; + default: + props->max_qp_rd_atom = 0; + } + props->max_qp_init_rd_atom = props->max_qp_wr; + props->atomic_cap = IB_ATOMIC_NONE; + + return 0; +} + + +/** + * nes_query_port + * + * @param ibdev + * @param port + * @param props + * + * @return int + */ +static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr *props) +{ +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + memset(props, 0, sizeof(*props)); + + props->max_mtu = IB_MTU_2048; + props->lid = 1; + props->lmc = 0; + props->sm_lid = 0; + props->sm_sl = 0; + props->state = IB_PORT_ACTIVE; + props->phys_state = 0; + props->port_cap_flags = + IB_PORT_CM_SUP | + IB_PORT_REINIT_SUP | + IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP; + props->gid_tbl_len = 1; + props->pkey_tbl_len = 1; + props->qkey_viol_cntr = 0; + props->active_width = 1; + props->active_speed = 1; + props->max_msg_sz = 0x10000000; + + return 0; +} + + +/** + * nes_modify_port + * + * @param ibdev + * @param port + * @param port_modify_mask + * @param props + * + * @return int + */ +static int nes_modify_port(struct ib_device *ibdev, u8 port, + int port_modify_mask, struct ib_port_modify *props) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return 0; +} + + +/** + * nes_query_pkey + * + * @param ibdev + * @param port + * @param index + * @param pkey + * + * @return int + */ +static int nes_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 * pkey) +{ +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + *pkey = 0; + return 0; +} + + +/** + * nes_query_gid + * + * @param ibdev + * @param port + * @param index + * @param gid + * + * @return int + */ +static int nes_query_gid(struct ib_device *ibdev, u8 port, + int index, union ib_gid *gid) +{ + struct nes_dev *nesdev = to_nesdev(ibdev); + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + memset(&(gid->raw[0]), 0, sizeof(gid->raw)); + memcpy(&(gid->raw[0]), nesdev->netdev->dev_addr, 6); + + return 0; +} 
+ + +/** + * nes_alloc_ucontext - Allocate the user context data structure. This keeps track + * of all objects associated with a particular user-mode client. + * + * @param ibdev + * @param udata + * + * @return struct ib_ucontext* + */ +static struct ib_ucontext *nes_alloc_ucontext(struct ib_device *ibdev, + struct ib_udata *udata) { + struct nes_dev *nesdev = to_nesdev(ibdev); + struct nes_alloc_ucontext_resp uresp; + struct nes_ucontext *nes_ucontext; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + memset(&uresp, 0, sizeof uresp); + + uresp.max_qps = nesdev->nesadapter->max_qp; + uresp.max_pds = nesdev->nesadapter->max_pd; + uresp.wq_size = nesdev->nesadapter->max_qp_wr*2; + + nes_ucontext = kmalloc(sizeof *nes_ucontext, GFP_KERNEL); + if (!nes_ucontext) + return ERR_PTR(-ENOMEM); + + memset(nes_ucontext, 0, sizeof(*nes_ucontext)); + + nes_ucontext->nesdev = nesdev; + /* TODO: much better ways to manage this area */ + /* TODO: cqs should be user buffers */ + nes_ucontext->mmap_wq_offset = ((uresp.max_pds * 4096)+PAGE_SIZE-1)/PAGE_SIZE; + nes_ucontext->mmap_cq_offset = nes_ucontext->mmap_wq_offset + + ((sizeof(struct nes_hw_qp_wqe) * uresp.max_qps * 2)+PAGE_SIZE-1)/PAGE_SIZE; + + if (ib_copy_to_udata(udata, &uresp, sizeof uresp)) { + kfree(nes_ucontext); + return ERR_PTR(-EFAULT); + } + + INIT_LIST_HEAD(&nes_ucontext->cq_reg_mem_list); + return &nes_ucontext->ibucontext; +} + + +/** + * nes_dealloc_ucontext + * + * @param context + * + * @return int + */ +static int nes_dealloc_ucontext(struct ib_ucontext *context) +{ +// struct nes_dev *nesdev = to_nesdev(context->device); + struct nes_ucontext *nes_ucontext = to_nesucontext(context); + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + kfree(nes_ucontext); + return 0; +} + + +/** + * nes_mmap + * + * @param context + * @param vma + * + * @return int + */ +static int nes_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) +{ + unsigned long index; + struct nes_dev *nesdev = 
to_nesdev(context->device); +// struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_ucontext *nes_ucontext; + struct nes_qp *nesqp; + + nes_ucontext = to_nesucontext(context); + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + + if (vma->vm_pgoff >= nes_ucontext->mmap_wq_offset) { + index = (vma->vm_pgoff - nes_ucontext->mmap_wq_offset) * PAGE_SIZE; + index /= ((sizeof(struct nes_hw_qp_wqe) * nesdev->nesadapter->max_qp_wr * 2)+PAGE_SIZE-1)&(~(PAGE_SIZE-1)); + if (!test_bit(index, nes_ucontext->allocated_wqs)) { + dprintk("%s: wq %lu not allocated\n",__FUNCTION__, index); + return -EFAULT; + } + nesqp = nes_ucontext->mmap_nesqp[index]; + if (NULL == nesqp) { + dprintk("%s: wq %lu has a NULL QP base.\n",__FUNCTION__, index); + return -EFAULT; + } + if (remap_pfn_range(vma, vma->vm_start, + nesqp->hwqp.sq_pbase>>PAGE_SHIFT, + vma->vm_end-vma->vm_start, + vma->vm_page_prot)) { + return(-EAGAIN); + } + vma->vm_private_data = nesqp; + return 0; + } else { + index = vma->vm_pgoff; + if (!test_bit(index, nes_ucontext->allocated_doorbells)) + return -EFAULT; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + if ( io_remap_pfn_range(vma, vma->vm_start, + (nesdev->nesadapter->doorbell_start+ + ((nes_ucontext->mmap_db_index[index]-nesdev->base_doorbell_index)*4096)) + >> PAGE_SHIFT, PAGE_SIZE, vma->vm_page_prot)) + return -EAGAIN; + vma->vm_private_data = nes_ucontext; + return 0; + } + + return -ENOSYS; + return 0; +} + + +/** + * nes_alloc_pd + * + * @param ibdev + * @param context + * @param udata + * + * @return struct ib_pd* + */ +static struct ib_pd *nes_alloc_pd(struct ib_device *ibdev, + struct ib_ucontext *context, + struct ib_udata *udata) { + struct nes_pd *nespd; + struct nes_dev *nesdev = to_nesdev(ibdev); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_ucontext *nes_ucontext; + struct nes_alloc_pd_resp uresp; + u32 pd_num = 0; + int err; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + 
dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + err = nes_alloc_resource(nesadapter, nesadapter->allocated_pds, + nesadapter->max_pd, &pd_num, &nesadapter->next_pd); + if (err) { + return ERR_PTR(err); + } + + nespd = kmalloc(sizeof *nespd, GFP_KERNEL); + if (!nespd) { + nes_free_resource(nesadapter, nesadapter->allocated_pds, pd_num); + return ERR_PTR(-ENOMEM); + } + dprintk("Allocating PD (%p) for ib device %s\n", nespd, nesdev->ibdev.name); + + memset(nespd, 0, sizeof(*nespd)); + + /* TODO: consider per function considerations */ + nespd->pd_id = pd_num+nesadapter->base_pd; + err = 0; + if (err) { + nes_free_resource(nesadapter, nesadapter->allocated_pds, pd_num); + kfree(nespd); + return ERR_PTR(err); + } + + if (context) { + nes_ucontext = to_nesucontext(context); + nespd->mmap_db_index = find_next_zero_bit(nes_ucontext->allocated_doorbells, + NES_MAX_USER_DB_REGIONS, nes_ucontext->first_free_db ); + dprintk("find_first_zero_biton doorbells returned %u, mapping pd_id %u.\n", nespd->mmap_db_index, nespd->pd_id); + if (nespd->mmap_db_index > NES_MAX_USER_DB_REGIONS) { + nes_free_resource(nesadapter, nesadapter->allocated_pds, pd_num); + kfree(nespd); + return ERR_PTR(-ENOMEM); + } + + uresp.pd_id = nespd->pd_id; + uresp.mmap_db_index = nespd->mmap_db_index; + if (ib_copy_to_udata(udata, &uresp, sizeof uresp)) { + nes_free_resource(nesadapter, nesadapter->allocated_pds, pd_num); + kfree(nespd); + return ERR_PTR(-EFAULT); + } + set_bit(nespd->mmap_db_index, nes_ucontext->allocated_doorbells); + nes_ucontext->mmap_db_index[nespd->mmap_db_index] = nespd->pd_id; + nes_ucontext->first_free_db = nespd->mmap_db_index + 1; + } + + dprintk("%s: PD%u structure located @%p.\n", __FUNCTION__, nespd->pd_id, nespd); + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + return (&nespd->ibpd); +} + + +/** + * nes_dealloc_pd + * + * @param ibpd + * + * @return int + */ +static int 
nes_dealloc_pd(struct ib_pd *ibpd) +{ + struct nes_ucontext *nes_ucontext; + struct nes_pd *nespd = to_nespd(ibpd); + struct nes_dev *nesdev = to_nesdev(ibpd->device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + // TODO: Do work here. + if ((ibpd->uobject)&&(ibpd->uobject->context)) { + nes_ucontext = to_nesucontext(ibpd->uobject->context); + dprintk("%s: Clearing bit %u from allocated doorbells\n", __FUNCTION__, nespd->mmap_db_index); + clear_bit(nespd->mmap_db_index, nes_ucontext->allocated_doorbells); + nes_ucontext->mmap_db_index[nespd->mmap_db_index] = 0; + if (nes_ucontext->first_free_db > nespd->mmap_db_index) { + nes_ucontext->first_free_db = nespd->mmap_db_index; + } + } + + dprintk("%s: Deallocating PD%u structure located @%p.\n", __FUNCTION__, nespd->pd_id, nespd); + nes_free_resource(nesadapter, nesadapter->allocated_pds, nespd->pd_id-nesadapter->base_pd); + kfree(nespd); + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return 0; +} + + +/** + * nes_create_ah + * + * @param pd + * @param ah_attr + * + * @return struct ib_ah* + */ +static struct ib_ah *nes_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + return ERR_PTR(-ENOSYS); +} + + +/** + * nes_destroy_ah + * + * @param ah + * + * @return int + */ +static int nes_destroy_ah(struct ib_ah *ah) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return -ENOSYS; +} + + +/** + * nes_create_qp + * + * @param ib_pd + * @param init_attr + * @param udata + * + * @return struct ib_qp* + */ +static struct ib_qp *nes_create_qp(struct ib_pd *ib_pd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata) { + u64 u64temp= 0, u64nesqp = 0; + struct nes_pd *nespd = to_nespd(ib_pd); + struct nes_dev *nesdev = to_nesdev(ib_pd->device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_qp *nesqp; + struct nes_cq *nescq; + 
struct nes_ucontext *nes_ucontext; + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_create_qp_resp uresp; + u32 cqp_head = 0; + u32 qp_num = 0; +// u32 counter = 0; + void *mem; + + unsigned long flags; + int ret; + int err; + int sq_size; + int rq_size; + u8 sq_encoded_size; + u8 rq_encoded_size; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + switch (init_attr->qp_type) { + case IB_QPT_RC: + /* TODO: */ + init_attr->cap.max_inline_data = 0; + + if (init_attr->cap.max_send_wr < 32) { + sq_size = 32; + sq_encoded_size = 1; + } else if (init_attr->cap.max_send_wr < 128) { + sq_size = 128; + sq_encoded_size = 2; + } else if (init_attr->cap.max_send_wr < 512) { + sq_size = 512; + sq_encoded_size = 3; + } else { + printk(KERN_ERR PFX "%s: SQ size (%u) too large.\n", __FUNCTION__, init_attr->cap.max_send_wr); + return ERR_PTR(-EINVAL); + } + init_attr->cap.max_send_wr = sq_size - 2; + if (init_attr->cap.max_recv_wr < 32) { + rq_size = 32; + rq_encoded_size = 1; + } else if (init_attr->cap.max_recv_wr < 128) { + rq_size = 128; + rq_encoded_size = 2; + } else if (init_attr->cap.max_recv_wr < 512) { + rq_size = 512; + rq_encoded_size = 3; + } else { + printk(KERN_ERR PFX "%s: RQ size (%u) too large.\n", __FUNCTION__, init_attr->cap.max_recv_wr); + return ERR_PTR(-EINVAL); + } + init_attr->cap.max_recv_wr = rq_size -1; + dprintk("%s: RQ size = %u, SQ Size = %u.\n", __FUNCTION__, rq_size, sq_size); + + ret = nes_alloc_resource(nesadapter, nesadapter->allocated_qps, nesadapter->max_qp, &qp_num, &nesadapter->next_qp); + if (ret) { + return ERR_PTR(ret); + } + + /* Need 512 (actually now 1024) byte alignment on this structure */ + mem = kzalloc(sizeof(*nesqp)+NES_SW_CONTEXT_ALIGN-1, GFP_KERNEL); + if (!mem) { + nes_free_resource(nesadapter, nesadapter->allocated_qps, qp_num); + dprintk("%s: Unable to allocate QP\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + u64nesqp 
= (u64)mem; //u64nesqp = (u64)((uint)mem); + u64nesqp += ((u64)NES_SW_CONTEXT_ALIGN) - 1; + u64temp = ((u64)NES_SW_CONTEXT_ALIGN) - 1; + u64nesqp &= ~u64temp; + nesqp = (struct nes_qp *)u64nesqp; + dprintk("nesqp = %p, allocated buffer = %p. Rounded to closest %u\n", nesqp, mem, NES_SW_CONTEXT_ALIGN); + nesqp->allocated_buffer = mem; + + if (udata) { + if ((ib_pd->uobject)&&(ib_pd->uobject->context)) { + nesqp->user_mode = 1; + nes_ucontext = to_nesucontext(ib_pd->uobject->context); + nesqp->mmap_sq_db_index = find_next_zero_bit(nes_ucontext->allocated_wqs, + NES_MAX_USER_WQ_REGIONS, nes_ucontext->first_free_wq); + dprintk("find_first_zero_biton wqs returned %u\n", nespd->mmap_db_index); + if (nesqp->mmap_sq_db_index>NES_MAX_USER_WQ_REGIONS) { + dprintk("%s: db index is greater than max user reqions, failing create QP\n", __FUNCTION__); + nes_free_resource(nesadapter, nesadapter->allocated_qps, qp_num); + kfree(nesqp->allocated_buffer); + return ERR_PTR(-ENOMEM); + } + set_bit(nesqp->mmap_sq_db_index, nes_ucontext->allocated_wqs); + nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = nesqp; + nes_ucontext->first_free_wq = nesqp->mmap_sq_db_index + 1; + } else { + nes_free_resource(nesadapter, nesadapter->allocated_qps, qp_num); + kfree(nesqp->allocated_buffer); + return ERR_PTR(-EFAULT); + } + } + + // Allocate Memory + nesqp->qp_mem_size = (sizeof(struct nes_hw_qp_wqe)*sq_size) + /* needs 512 byte alignment */ + (sizeof(struct nes_hw_qp_wqe)*rq_size) + /* needs 512 byte alignment */ + max((u32)sizeof(struct nes_qp_context),((u32)256)) + /* needs 8 byte alignment */ + 256; /* this is Q2 */ + /* Round up to a multiple of a page */ + nesqp->qp_mem_size += PAGE_SIZE - 1; + nesqp->qp_mem_size &= ~(PAGE_SIZE - 1); + + /* TODO: Need to separate out nesqp_context at that point too!!!! 
*/ + mem = pci_alloc_consistent(nesdev->pcidev, nesqp->qp_mem_size, + &nesqp->hwqp.sq_pbase); + if (!mem) { + nes_free_resource(nesadapter, nesadapter->allocated_qps, qp_num); + dprintk(KERN_ERR PFX "Unable to allocate memory for host descriptor rings\n"); + kfree(nesqp->allocated_buffer); + return ERR_PTR(-ENOMEM); + } + dprintk(PFX "%s: PCI consistent memory for " + "host descriptor rings located @ %p (pa = 0x%08lX.) size = %u.\n", + __FUNCTION__, mem, (unsigned long)nesqp->hwqp.sq_pbase, + nesqp->qp_mem_size); + memset(mem,0, nesqp->qp_mem_size); + + nesqp->hwqp.sq_vbase = mem; + nesqp->hwqp.sq_size = sq_size; + nesqp->hwqp.sq_encoded_size = sq_encoded_size; + nesqp->hwqp.sq_head = 1; + mem += sizeof(struct nes_hw_qp_wqe)*sq_size; + + nesqp->hwqp.rq_vbase = mem; + nesqp->hwqp.rq_size = rq_size; + nesqp->hwqp.rq_encoded_size = rq_encoded_size; + nesqp->hwqp.rq_pbase = nesqp->hwqp.sq_pbase + sizeof(struct nes_hw_qp_wqe)*sq_size; + mem += sizeof(struct nes_hw_qp_wqe)*rq_size; + + nesqp->hwqp.q2_vbase = mem; + nesqp->hwqp.q2_pbase = nesqp->hwqp.rq_pbase + sizeof(struct nes_hw_qp_wqe)*rq_size; + mem += 256; + memset(nesqp->hwqp.q2_vbase, 0, 256); + + nesqp->nesqp_context = mem; + nesqp->nesqp_context_pbase = nesqp->hwqp.q2_pbase + 256; + memset(nesqp->nesqp_context, 0, sizeof(*nesqp->nesqp_context)); + + nesqp->hwqp.qp_id = qp_num; + nesqp->ibqp.qp_num = nesqp->hwqp.qp_id; + nesqp->nespd = nespd; + + nescq = to_nescq(init_attr->send_cq); + nesqp->nesscq = nescq; + nescq = to_nescq(init_attr->recv_cq); + nesqp->nesrcq = nescq; + + /* TODO: account for these things already being filled in over in the CM code */ + nesqp->nesqp_context->misc |= (u32)PCI_FUNC(nesdev->pcidev->devfn) << NES_QPCONTEXT_MISC_PCI_FCN_SHIFT; + nesqp->nesqp_context->misc |= (u32)nesqp->hwqp.rq_encoded_size << NES_QPCONTEXT_MISC_RQ_SIZE_SHIFT; + nesqp->nesqp_context->misc |= (u32)nesqp->hwqp.sq_encoded_size << NES_QPCONTEXT_MISC_SQ_SIZE_SHIFT; + if (!udata) { + nesqp->nesqp_context->misc |= 
NES_QPCONTEXT_MISC_PRIV_EN; + } + //NES_QPCONTEXT_MISC_IWARP_VER_SHIFT + nesqp->nesqp_context->cqs = nesqp->nesscq->hw_cq.cq_number + ((u32)nesqp->nesrcq->hw_cq.cq_number << 16); + u64temp = (u64)nesqp->hwqp.sq_pbase; + nesqp->nesqp_context->sq_addr_low = (u32)u64temp; + nesqp->nesqp_context->sq_addr_high = (u32)(u64temp>>32); + u64temp = (u64)nesqp->hwqp.rq_pbase; + nesqp->nesqp_context->rq_addr_low = (u32)u64temp; + nesqp->nesqp_context->rq_addr_high = (u32)(u64temp>>32); + /* TODO: create a nic index value and a ip index in nes_dev */ + if (qp_num & 1) { + nesqp->nesqp_context->misc2 |= (u32)PCI_FUNC(nesdev->pcidev->devfn+1) << NES_QPCONTEXT_MISC2_NIC_INDEX_SHIFT; + } else { + nesqp->nesqp_context->misc2 |= (u32)PCI_FUNC(nesdev->pcidev->devfn) << NES_QPCONTEXT_MISC2_NIC_INDEX_SHIFT; + } + nesqp->nesqp_context->pd_index_wscale |= (u32)nesqp->nespd->pd_id << 16; + u64temp = (u64)nesqp->hwqp.q2_pbase; + nesqp->nesqp_context->q2_addr_low = (u32)u64temp; + nesqp->nesqp_context->q2_addr_high = (u32)(u64temp>>32); + *((struct nes_qp **)&nesqp->nesqp_context->aeq_token_low) = nesqp; + nesqp->nesqp_context->ird_ord_sizes = NES_QPCONTEXT_ORDIRD_ALSMM | + ((((u32)nesadapter->max_irrq_wr)<nesqp_context->ird_ord_sizes |= NES_QPCONTEXT_ORDIRD_RNMC; + } + + /* Create the QP */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_CREATE_QP | NES_CQP_QP_TYPE_IWARP | NES_CQP_QP_IWARP_STATE_IDLE; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_QP_CQS_VALID; + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nesqp->hwqp.qp_id; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + 
u64temp = (u64)nesqp->nesqp_context_pbase; + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = (u32)u64temp; + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = (u32)(u64temp>>32); + + barrier(); + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("Waiting for create iWARP QP%u to complete.\n", nesqp->hwqp.qp_id); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Create iwarp QP completed, wait_event_timeout ret = %u.\n", ret); + /* TODO: Catch error code... */ + + if (ib_pd->uobject) { + uresp.mmap_sq_db_index = nesqp->mmap_sq_db_index; + uresp.actual_sq_size = sq_size; + uresp.actual_rq_size = rq_size; + uresp.qp_id = nesqp->hwqp.qp_id; + if (ib_copy_to_udata(udata, &uresp, sizeof uresp)) { + /* TODO: Much more clean up to do here */ + nes_free_resource(nesadapter, nesadapter->allocated_qps, qp_num); + kfree(nesqp->allocated_buffer); + return ERR_PTR(-EFAULT); + } + } + + + dprintk("%s: QP%u structure located @%p.Size = %u.\n", __FUNCTION__, nesqp->hwqp.qp_id, nesqp, (u32)sizeof(*nesqp)); + spin_lock_init(&nesqp->lock); + init_waitqueue_head( &nesqp->state_waitq ); + nes_add_ref(&nesqp->ibqp); + nesqp->aewq = create_singlethread_workqueue("NesDisconnectWQ"); + break; + default: + dprintk("%s: Invalid QP type: %d\n", __FUNCTION__, + init_attr->qp_type); + return ERR_PTR(-EINVAL); + break; + } + + /* update the QP table */ + nesdev->nesadapter->qp_table[nesqp->hwqp.qp_id-NES_FIRST_QPN] = nesqp; + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + return &nesqp->ibqp; +} + + +/** + * nes_destroy_qp + * + * @param ib_qp + * + * @return int + */ +static int nes_destroy_qp(struct ib_qp *ib_qp) +{ + u64 u64temp; + struct nes_qp *nesqp = to_nesqp(ib_qp); + struct nes_dev *nesdev = 
to_nesdev(ib_qp->device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_ucontext *nes_ucontext; + struct ib_qp_attr attr; + unsigned long flags; + int ret; + u32 cqp_head; + + dprintk("%s:%s:%u: Destroying QP%u\n", __FILE__, __FUNCTION__, __LINE__, nesqp->hwqp.qp_id); + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + /* Blow away the connection if it exists. */ + if (nesqp->cm_id && nesqp->cm_id->provider_data) { + /* TODO: Probably want to use error as the state */ + attr.qp_state = IB_QPS_SQD; + nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE ); + } + + destroy_workqueue(nesqp->aewq); + /* TODO: Add checks... MW bound count, others ? */ + + /* Destroy the QP */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_DESTROY_QP | NES_CQP_QP_TYPE_IWARP); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(nesqp->hwqp.qp_id); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + u64temp = (u64)nesqp->nesqp_context_pbase; + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + + barrier(); + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + + /* Wait for CQP */ + dprintk("Waiting for destroy iWARP QP%u to complete.\n", nesqp->hwqp.qp_id); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = 
wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Destroy iwarp QP completed, wait_event_timeout ret = %u.\n", ret); + + /* TODO: Catch error cases */ + + if (nesqp->user_mode) { + if ((ib_qp->uobject)&&(ib_qp->uobject->context)) { + nes_ucontext = to_nesucontext(ib_qp->uobject->context); + clear_bit(nesqp->mmap_sq_db_index, nes_ucontext->allocated_wqs); + nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = NULL; + if (nes_ucontext->first_free_wq > nesqp->mmap_sq_db_index) { + nes_ucontext->first_free_wq = nesqp->mmap_sq_db_index; + } + } + } + // Free the control structures + pci_free_consistent(nesdev->pcidev, nesqp->qp_mem_size, nesqp->hwqp.sq_vbase, + nesqp->hwqp.sq_pbase); + + nesadapter->qp_table[nesqp->hwqp.qp_id-NES_FIRST_QPN] = NULL; + nes_free_resource(nesadapter, nesadapter->allocated_qps, nesqp->hwqp.qp_id); + + nes_rem_ref(&nesqp->ibqp); + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + return 0; +} + + +/** + * nes_create_cq + * + * @param ibdev + * @param entries + * @param context + * @param udata + * + * @return struct ib_cq* + */ +static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int entries, + struct ib_ucontext *context, + struct ib_udata *udata) { + u64 u64temp; + struct nes_dev *nesdev = to_nesdev(ibdev); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_cq *nescq; + struct nes_ucontext *nes_ucontext = NULL; + void *mem; + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_pbl *nespbl = NULL; + struct nes_create_cq_req req; + struct nes_create_cq_resp resp; + u32 cqp_head; + u32 cq_num= 0; + u32 pbl_entries = 1; + int err = -ENOSYS; + unsigned long flags; + int ret; + + dprintk("%s:%s:%u: entries = %u\n", __FILE__, __FUNCTION__, __LINE__, entries); + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + err = nes_alloc_resource(nesadapter, nesadapter->allocated_cqs, nesadapter->max_cq, &cq_num, 
&nesadapter->next_cq); + if (err) { + return ERR_PTR(err); + } + + nescq = kmalloc(sizeof(*nescq), GFP_KERNEL); + if (!nescq) { + dprintk("%s: Unable to allocate CQ\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + + memset(nescq, 0, sizeof *nescq); + nescq->hw_cq.cq_size = max(entries+1,5); /* four usable entries seems like a reasonable min */ + nescq->hw_cq.cq_number = cq_num; + nescq->ibcq.cqe = nescq->hw_cq.cq_size - 1; + + if (context) { + nes_ucontext = to_nesucontext(context); + if (ib_copy_from_udata(&req, udata, sizeof(req))) + return ERR_PTR(-EFAULT); + dprintk("%s: CQ Virtual Address = %08lX, size = %u.\n", + __FUNCTION__, (unsigned long)req.user_cq_buffer, entries); + list_for_each_entry(nespbl, &nes_ucontext->cq_reg_mem_list, list) { + if (nespbl->user_base == (unsigned long )req.user_cq_buffer) { + list_del(&nespbl->list); + err = 0; + dprintk("%s: Found PBL for virtual CQ. nespbl=%p.\n", __FUNCTION__, nespbl); + break; + } + } + if (err) { + nes_free_resource(nesadapter, nesadapter->allocated_cqs, cq_num); + kfree(nescq); + return ERR_PTR(err); + } + pbl_entries = nespbl->pbl_size >> 3; + nescq->cq_mem_size = 0; + } else { + nescq->cq_mem_size = nescq->hw_cq.cq_size * sizeof(struct nes_hw_cqe); + dprintk("%s: Attempting to allocate pci memory (%u entries, %u bytes) for CQ%u.\n", + __FUNCTION__, entries, nescq->cq_mem_size, nescq->hw_cq.cq_number); + + /* allocate the physical buffer space */ + /* TODO: look into how to allocate this memory to be used for user space */ + mem = pci_alloc_consistent(nesdev->pcidev, nescq->cq_mem_size, + &nescq->hw_cq.cq_pbase); + if (!mem) { + nes_free_resource(nesadapter, nesadapter->allocated_cqs, cq_num); + dprintk(KERN_ERR PFX "Unable to allocate pci memory for cq\n"); + return ERR_PTR(-ENOMEM); + } + + memset(mem, 0, nescq->cq_mem_size); + nescq->hw_cq.cq_vbase = mem; + nescq->hw_cq.cq_head = 0; + dprintk("%s: CQ%u virtual address @ %p, phys = 0x%08X .\n", + __FUNCTION__, nescq->hw_cq.cq_number, 
nescq->hw_cq.cq_vbase, (u32)nescq->hw_cq.cq_pbase); + } + + nescq->hw_cq.ce_handler = iwarp_ce_handler; + spin_lock_init(&nescq->lock); + + /* Send CreateCQ request to CQP */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_CREATE_CQ | NES_CQP_CQ_CEQ_VALID | + NES_CQP_CQ_CEQE_MASK |(nescq->hw_cq.cq_size<<16); + if (1 != pbl_entries) { + if (0 == nesadapter->free_256pbl) { + /* TODO: need to backout */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + nes_free_resource(nesadapter, nesadapter->allocated_cqs, cq_num); + kfree(nescq); + return ERR_PTR(-ENOMEM); + } else { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_CQ_VIRT; + nescq->virtual_cq = 1; + nesadapter->free_256pbl--; + } + } + + /* TODO: Separate iWARP from to its own CEQ? */ + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nescq->hw_cq.cq_number | ((u32)PCI_FUNC(nesdev->pcidev->devfn)<<16); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + if (context) { + if (1 != pbl_entries) + u64temp = (u64)nespbl->pbl_pbase; + else + u64temp = nespbl->pbl_vbase[0]; + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = nes_ucontext->mmap_db_index[0]; + } else { + u64temp = (u64)nescq->hw_cq.cq_pbase; + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = 0; + } + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_LOW_IDX] = (u32)u64temp; + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_HIGH_IDX] = (u32)(u64temp>>32); + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX] = 0; + *((struct nes_hw_cq **)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) = &nescq->hw_cq; + *((u64 
*)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) >>= 1; + + barrier(); + dprintk("%s: CQ%u context = 0x%08X:0x%08X.\n", __FUNCTION__, nescq->hw_cq.cq_number, + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX], + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]); + + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("Waiting for create iWARP CQ%u to complete.\n", nescq->hw_cq.cq_number); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Create iwarp CQ completed, wait_event_timeout ret = %d.\n", ret); + /* TODO: Catch error cases */ + + if (context) { + /* free the nespbl */ + pci_free_consistent(nesdev->pcidev, nespbl->pbl_size, + nespbl->pbl_vbase, nespbl->pbl_pbase); + kfree(nespbl); + /* write back the parameters */ + resp.cq_id = nescq->hw_cq.cq_number; + resp.cq_size = nescq->hw_cq.cq_size; + resp.mmap_db_index = 0; + if (ib_copy_to_udata(udata, &resp, sizeof resp)) { + nes_free_resource(nesadapter, nesadapter->allocated_cqs, cq_num); + kfree(nescq); + return ERR_PTR(-EFAULT); + } + } + + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + return &nescq->ibcq; +} + + +/** + * nes_destroy_cq + * + * @param ib_cq + * + * @return int + */ +static int nes_destroy_cq(struct ib_cq *ib_cq) +{ + struct nes_cq *nescq; + struct nes_dev *nesdev; + struct nes_adapter *nesadapter; + struct nes_hw_cqp_wqe *cqp_wqe; + u32 cqp_head; + unsigned long flags; + int ret; + + dprintk("%s:%s:%u: %p.\n", __FILE__, __FUNCTION__, __LINE__, ib_cq); + + if (ib_cq == NULL) + return 0; + + nescq = to_nescq(ib_cq); + nesdev = to_nesdev(ib_cq->device); + nesadapter = nesdev->nesadapter; + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + + /* Send DestroyCQ request 
to CQP */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + if (nescq->virtual_cq) { + nesadapter->free_256pbl++; + if (nesadapter->free_256pbl > nesadapter->max_256pbl) { + printk(KERN_ERR PFX "%s: free 256B PBLs(%u) has exceeded the max(%u)\n", + __FUNCTION__, nesadapter->free_256pbl, nesadapter->max_256pbl); + } + } + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_DESTROY_CQ | (nescq->hw_cq.cq_size<<16); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nescq->hw_cq.cq_number | ((u32)PCI_FUNC(nesdev->pcidev->devfn)<<16); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + + barrier(); + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("Waiting for destroy iWARP CQ%u to complete.\n", nescq->hw_cq.cq_number); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Destroy iwarp CQ completed, wait_event_timeout ret = %u.\n", ret); + /* TODO: catch CQP error cases */ + + if (nescq->cq_mem_size) + pci_free_consistent(nesdev->pcidev, nescq->cq_mem_size, (void *)nescq->hw_cq.cq_vbase, + nescq->hw_cq.cq_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_cqs, nescq->hw_cq.cq_number); + kfree(nescq); + + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, atomic_read(&nesdev->netdev->refcnt)); + return 0; +} + + +/** + * nes_reg_mr + * + * @param nesdev + * @param nespd + * @param stag + * @param region_length + * @param root_vpbl + * @param single_buffer + * @param pbl_count + * @param 
residual_page_count + * @param acc + * @param iova_start + * + * @return int + */ +static int nes_reg_mr(struct nes_dev *nesdev, + struct nes_pd *nespd, + u32 stag, + u64 region_length, + struct nes_root_vpbl *root_vpbl, + dma_addr_t single_buffer, + u16 pbl_count, + u16 residual_page_count, + int acc, + u64 * iova_start) +{ + struct nes_hw_cqp_wqe *cqp_wqe; + unsigned long flags; + u32 cqp_head; + int ret; + struct nes_adapter *nesadapter = nesdev->nesadapter; +// int count; + + /* Register the region with the adapter */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + + /* track PBL resources */ + if (pbl_count != 0) { + if (pbl_count > 1) { + /* Two level PBL */ + if ((pbl_count+1) > nesadapter->free_4kpbl) { + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + return (-ENOMEM); + } else { + nesadapter->free_4kpbl -= pbl_count+1; + } + } else if (residual_page_count > 32) { + if (pbl_count > nesadapter->free_4kpbl) { + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + return -ENOMEM; + } else { + nesadapter->free_4kpbl -= pbl_count; + } + } else { + if (pbl_count > nesadapter->free_256pbl) { + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + return -ENOMEM; + } else { + nesadapter->free_256pbl -= pbl_count; + } + } + } + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_REGISTER_STAG | NES_CQP_STAG_RIGHTS_LOCAL_READ; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_STAG_VA_TO | NES_CQP_STAG_MR; + if (acc & IB_ACCESS_LOCAL_WRITE) { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_STAG_RIGHTS_LOCAL_WRITE; + } + if (acc & IB_ACCESS_REMOTE_WRITE) { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_STAG_RIGHTS_REMOTE_WRITE | NES_CQP_STAG_REM_ACC_EN; + } + if (acc & IB_ACCESS_REMOTE_READ) { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_STAG_RIGHTS_REMOTE_READ | NES_CQP_STAG_REM_ACC_EN; + } + if (acc & 
IB_ACCESS_MW_BIND) { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_STAG_RIGHTS_WINDOW_BIND | NES_CQP_STAG_REM_ACC_EN; + } + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_VA_LOW_IDX] = cpu_to_le32((u32)*iova_start); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_VA_HIGH_IDX] = cpu_to_le32((u32)((((u64)*iova_start)>>32))); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_LEN_LOW_IDX] = cpu_to_le32((u32)region_length); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_LEN_HIGH_PD_IDX] = cpu_to_le32((u32)(region_length>>8)&0xff000000); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_LEN_HIGH_PD_IDX] |= cpu_to_le32(nespd->pd_id&0x00007fff); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_STAG_IDX] = cpu_to_le32(stag); + + if (pbl_count == 0) { + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_LOW_IDX] = cpu_to_le32((u32)single_buffer); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_HIGH_IDX] = cpu_to_le32((u32)((((u64)single_buffer)>>32))); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_LEN_IDX] = 0; + } else { + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_LOW_IDX] = cpu_to_le32((u32)root_vpbl->pbl_pbase); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_HIGH_IDX] = cpu_to_le32((u32)((((u64)root_vpbl->pbl_pbase)>>32))); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX] = cpu_to_le32(pbl_count); + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_LEN_IDX] = cpu_to_le32(((pbl_count-1)*4096)+(residual_page_count*8)); + if ((pbl_count > 1)||(residual_page_count > 32)) { + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= NES_CQP_STAG_PBL_BLK_SIZE; + } + } + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX]); + + barrier(); + + // Ring doorbell (1 WQEs) + 
nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("%s: Register STag 0x%08X completed, wait_event_timeout ret = %u.\n", __FUNCTION__, stag, ret); + /* TODO: Catch error code... */ + + return 0; +} + + +/** + * nes_reg_phys_mr + * + * @param ib_pd + * @param buffer_list + * @param num_phys_buf + * @param acc + * @param iova_start + * + * @return struct ib_mr* + */ +static struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd, + struct ib_phys_buf *buffer_list, + int num_phys_buf, int acc, u64 * iova_start) { + u64 region_length; + struct nes_pd *nespd = to_nespd(ib_pd); + struct nes_dev *nesdev = to_nesdev(ib_pd->device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_mr *nesmr; + struct ib_mr *ibmr; + struct nes_vpbl vpbl; + struct nes_root_vpbl root_vpbl; + u32 stag; + u32 i; + u32 stag_index = 0; + u32 next_stag_index = 0; + u32 driver_key = 0; + u32 root_pbl_index = 0; + u32 cur_pbl_index = 0; + int err = 0, pbl_depth = 0; + int ret = 0; + u16 pbl_count = 0; + u8 single_page = 1; + u8 stag_key = 0; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + pbl_depth = 0; + region_length = 0; + vpbl.pbl_vbase = NULL; + root_vpbl.pbl_vbase = NULL; + root_vpbl.pbl_pbase = 0; + + get_random_bytes(&next_stag_index, sizeof(next_stag_index)); + stag_key = (u8)next_stag_index; + + driver_key = 0; + + next_stag_index >>= 8; + next_stag_index %= nesadapter->max_mr; + if (num_phys_buf > (1024*512)){ + return ERR_PTR(-E2BIG); + } + + err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, nesadapter->max_mr, &stag_index, &next_stag_index); + if (err) { + return ERR_PTR(err); + } + + nesmr = kmalloc(sizeof(*nesmr), GFP_KERNEL); + if (!nesmr) { + nes_free_resource(nesadapter, nesadapter->allocated_mrs, 
stag_index); + return ERR_PTR(-ENOMEM); + } + + for (i = 0; i < num_phys_buf; i++) { + + if ((i & 0x01FF) == 0) { + if (1 == root_pbl_index) { + /* Allocate the root PBL */ + root_vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 8192, + &root_vpbl.pbl_pbase); + dprintk("%s: Allocating root PBL, va = %p, pa = 0x%08X\n", + __FUNCTION__, root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pbase); + if (!root_vpbl.pbl_vbase) { + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + kfree(nesmr); + return ERR_PTR(-ENOMEM); + } + root_vpbl.leaf_vpbl = kmalloc(sizeof(*root_vpbl.leaf_vpbl)*1024, GFP_KERNEL); + if (!root_vpbl.leaf_vpbl) { + pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, + root_vpbl.pbl_pbase); + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + kfree(nesmr); + return ERR_PTR(-ENOMEM); + } + root_vpbl.pbl_vbase[0].pa_low = cpu_to_le32((u32)vpbl.pbl_pbase); + root_vpbl.pbl_vbase[0].pa_high = cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); + root_vpbl.leaf_vpbl[0] = vpbl; + } + /* Allocate a 4K buffer for the PBL */ + vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 4096, + &vpbl.pbl_pbase); + dprintk("%s: Allocating leaf PBL, va = %p, pa = 0x%016lX\n", + __FUNCTION__, vpbl.pbl_vbase, (unsigned long)vpbl.pbl_pbase); + if (!vpbl.pbl_vbase) { + /* TODO: Unwind allocated buffers */ + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + ibmr = ERR_PTR(-ENOMEM); + kfree(nesmr); + goto reg_phys_err; + } + /* Fill in the root table */ + if (1 <= root_pbl_index) { + root_vpbl.pbl_vbase[root_pbl_index].pa_low = cpu_to_le32((u32)vpbl.pbl_pbase); + root_vpbl.pbl_vbase[root_pbl_index].pa_high = cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); + root_vpbl.leaf_vpbl[root_pbl_index] = vpbl; + } + root_pbl_index++; + cur_pbl_index = 0; + } 
+ if (buffer_list[i].addr & ~PAGE_MASK) { + /* TODO: Unwind allocated buffers */ + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + dprintk("Unaligned Memory Buffer: 0x%x\n", + (unsigned int) buffer_list[i].addr); + ibmr = ERR_PTR(-EINVAL); + kfree(nesmr); + goto reg_phys_err; + } + + if (!buffer_list[i].size) { + /* TODO: Unwind allocated buffers */ + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + dprintk("Invalid Buffer Size\n"); + ibmr = ERR_PTR(-EINVAL); + kfree(nesmr); + goto reg_phys_err; + } + + region_length += buffer_list[i].size; + if ((i != 0) && (single_page)) { + if ((buffer_list[i-1].addr+PAGE_SIZE) != buffer_list[i].addr) + single_page = 0; + } + vpbl.pbl_vbase[cur_pbl_index].pa_low = cpu_to_le32((u32)buffer_list[i].addr); + vpbl.pbl_vbase[cur_pbl_index++].pa_high = cpu_to_le32((u32)((((u64)buffer_list[i].addr)>>32))); + } + + stag = stag_index<<8; + stag |= driver_key; + /* TODO: key should come from consumer */ + stag += (u32)stag_key; + + dprintk("%s: Registering STag 0x%08X, VA = 0x%016lX, length = 0x%016lX, index = 0x%08X\n", + __FUNCTION__, stag, (unsigned long)*iova_start, (unsigned long)region_length, stag_index); + + /* TODO: Should the region length be reduced by iova_start &PAGE_MASK, think so */ + region_length -= (*iova_start)&PAGE_MASK; + + /* Make the leaf PBL the root if only one PBL */ + if (root_pbl_index == 1) { + root_vpbl.pbl_pbase = vpbl.pbl_pbase; + } + + if (single_page) { + pbl_count = 0; + } else { + pbl_count = root_pbl_index; + } + ret = nes_reg_mr( nesdev, nespd, stag, region_length, &root_vpbl, + buffer_list[0].addr, pbl_count, (u16)cur_pbl_index, + acc, iova_start); + + if (ret == 0) { + nesmr->ibmr.rkey = stag; + nesmr->ibmr.lkey = stag; + nesmr->mode = IWNES_MEMREG_TYPE_MEM; + ibmr = &nesmr->ibmr; + nesmr->pbl_4k = ((pbl_count>1)||(cur_pbl_index>32)) ? 
1 : 0; + nesmr->pbls_used = pbl_count; + if (pbl_count > 1) { + nesmr->pbls_used++; + } + } else { + kfree(nesmr); + ibmr = ERR_PTR(-ENOMEM); + } + +reg_phys_err: + /* free the resources */ + if (root_pbl_index == 1) { + /* single PBL case */ + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + } else { + for (i=0; ipcidev, 4096, root_vpbl.leaf_vpbl[i].pbl_vbase, + root_vpbl.leaf_vpbl[i].pbl_pbase); + } + kfree(root_vpbl.leaf_vpbl); + pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, + root_vpbl.pbl_pbase); + } + + return ibmr; +} + + +/** + * nes_get_dma_mr + * + * @param pd + * @param acc + * + * @return struct ib_mr* + */ +static struct ib_mr *nes_get_dma_mr(struct ib_pd *pd, int acc) { + struct ib_phys_buf bl; + u64 kva = 0; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + bl.size = 0xffffffffff; + bl.addr = 0; + return nes_reg_phys_mr(pd, &bl, 1, acc, &kva); +} + + +/** + * nes_reg_user_mr + * + * @param pd + * @param region + * @param acc + * @param udata + * + * @return struct ib_mr* + */ +static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, struct ib_umem *region, + int acc, struct ib_udata *udata) +{ + u64 iova_start; + u64 *pbl; + u64 region_length; + dma_addr_t last_dma_addr = 0; + dma_addr_t first_dma_addr = 0; + struct nes_pd *nespd = to_nespd(pd); + struct nes_dev *nesdev = to_nesdev(pd->device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct ib_mr *ibmr; + struct ib_umem_chunk *chunk; + struct nes_ucontext *nes_ucontext; + struct nes_pbl *nespbl; + struct nes_mr *nesmr; + struct nes_mem_reg_req req; + struct nes_vpbl vpbl; + struct nes_root_vpbl root_vpbl; + int j; + int page_count = 0; + int err, pbl_depth = 0; + int ret; + u32 stag; + u32 stag_index = 0; + u32 next_stag_index; + u32 driver_key; + u32 root_pbl_index = 0; + u32 cur_pbl_index = 0; + u16 pbl_count; + u8 single_page = 1; + u8 stag_key; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + dprintk("%s: 
User base = 0x%lX, Virt base = 0x%lX, length = %u, offset = %u, page size = %u.\n", + __FUNCTION__, region->user_base, region->virt_base, (u32)region->length, region->offset, region->page_size); + + if (ib_copy_from_udata(&req, udata, sizeof(req))) + return ERR_PTR(-EFAULT); + dprintk("%s: Memory Registration type = %08X.\n", __FUNCTION__, req.reg_type); + + switch (req.reg_type) { + case IWNES_MEMREG_TYPE_MEM: + pbl_depth = 0; + region_length = 0; + vpbl.pbl_vbase = NULL; + root_vpbl.pbl_vbase = NULL; + root_vpbl.pbl_pbase = 0; + + get_random_bytes(&next_stag_index, sizeof(next_stag_index)); + stag_key = (u8)next_stag_index; + + driver_key = 0; + + next_stag_index >>= 8; + next_stag_index %= nesadapter->max_mr; + + err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, nesadapter->max_mr, &stag_index, &next_stag_index); + if (err) { + return ERR_PTR(err); + } + + nesmr = kmalloc(sizeof(*nesmr), GFP_KERNEL); + if (!nesmr) { + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + return ERR_PTR(-ENOMEM); + } + + /* todo: make this code and reg_phy_mr loop more common!!! 
*/ + list_for_each_entry(chunk, &region->chunk_list, list) { + dprintk("%s: Chunk: nents = %u, nmap = %u .\n", __FUNCTION__, chunk->nents, chunk->nmap ); + for (j = 0; j < chunk->nmap; ++j) { + dprintk("%s: \tsg_dma_addr = 0x%08lx, length = %u.\n", + __FUNCTION__, (unsigned long)sg_dma_address(&chunk->page_list[j]), sg_dma_len(&chunk->page_list[j]) ); + + if ((page_count&0x01FF) == 0) { + if (page_count>(1024*512)) { + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + kfree(nesmr); + return ERR_PTR(-E2BIG); + } + if (1 == root_pbl_index) { + root_vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 8192, + &root_vpbl.pbl_pbase); + dprintk("%s: Allocating root PBL, va = %p, pa = 0x%08X\n", + __FUNCTION__, root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pbase); + if (!root_vpbl.pbl_vbase) { + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + kfree(nesmr); + return ERR_PTR(-ENOMEM); + } + root_vpbl.leaf_vpbl = kmalloc(sizeof(*root_vpbl.leaf_vpbl)*1024, GFP_KERNEL); + if (!root_vpbl.leaf_vpbl) { + pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, + root_vpbl.pbl_pbase); + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + kfree(nesmr); + return ERR_PTR(-ENOMEM); + } + root_vpbl.pbl_vbase[0].pa_low = cpu_to_le32((u32)vpbl.pbl_pbase); + root_vpbl.pbl_vbase[0].pa_high = cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); + root_vpbl.leaf_vpbl[0] = vpbl; + } + vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 4096, + &vpbl.pbl_pbase); + dprintk("%s: Allocating leaf PBL, va = %p, pa = 0x%08X\n", + __FUNCTION__, vpbl.pbl_vbase, (unsigned int)vpbl.pbl_pbase); + if (!vpbl.pbl_vbase) { + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + ibmr = 
ERR_PTR(-ENOMEM); + kfree(nesmr); + goto reg_user_mr_err; + } + if (1 <= root_pbl_index) { + root_vpbl.pbl_vbase[root_pbl_index].pa_low = cpu_to_le32((u32)vpbl.pbl_pbase); + root_vpbl.pbl_vbase[root_pbl_index].pa_high = cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); + root_vpbl.leaf_vpbl[root_pbl_index] = vpbl; + } + root_pbl_index++; + cur_pbl_index = 0; + } + if (sg_dma_address(&chunk->page_list[j]) & ~PAGE_MASK) { + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + dprintk("%s: Unaligned Memory Buffer: 0x%x\n", __FUNCTION__, + (unsigned int) sg_dma_address(&chunk->page_list[j])); + ibmr = ERR_PTR(-EINVAL); + kfree(nesmr); + goto reg_user_mr_err; + } + + if (!sg_dma_len(&chunk->page_list[j])) { + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + dprintk("%s: Invalid Buffer Size\n", __FUNCTION__); + ibmr = ERR_PTR(-EINVAL); + kfree(nesmr); + goto reg_user_mr_err; + } + + region_length += sg_dma_len(&chunk->page_list[j]); + if (single_page) { + if (page_count != 0) { + if ((last_dma_addr+PAGE_SIZE) != sg_dma_address(&chunk->page_list[j])) + single_page = 0; + last_dma_addr = sg_dma_address(&chunk->page_list[j]); + } else { + first_dma_addr = sg_dma_address(&chunk->page_list[j]); + last_dma_addr = first_dma_addr; + } + } + + vpbl.pbl_vbase[cur_pbl_index].pa_low = cpu_to_le32((u32)sg_dma_address(&chunk->page_list[j])); + vpbl.pbl_vbase[cur_pbl_index].pa_high = cpu_to_le32((u32)((((u64)sg_dma_address(&chunk->page_list[j]))>>32))); + dprintk("%s: PBL %u (@%p) = 0x%08X:%08X\n", __FUNCTION__, cur_pbl_index, + &vpbl.pbl_vbase[cur_pbl_index], vpbl.pbl_vbase[cur_pbl_index].pa_high, + vpbl.pbl_vbase[cur_pbl_index].pa_low); + cur_pbl_index++; + page_count++; + } + } + stag = stag_index<<8; + stag |= driver_key; + /* TODO: key should come from consumer */ + stag += (u32)stag_key; + + iova_start = (u64)region->virt_base; + dprintk("%s: Registering STag 0x%08X, VA = 0x%08X, length = 0x%08X, index = 0x%08X, region->length=0x%08x\n", + 
__FUNCTION__, stag, (unsigned int)iova_start, (unsigned int)region_length, stag_index, region->length); + + + /* Make the leaf PBL the root if only one PBL */ + if (root_pbl_index == 1) { + root_vpbl.pbl_pbase = vpbl.pbl_pbase; + } + + if (single_page) { + pbl_count = 0; + } else { + pbl_count = root_pbl_index; + first_dma_addr = 0; + } + ret = nes_reg_mr( nesdev, nespd, stag, region->length, &root_vpbl, + first_dma_addr, pbl_count, (u16)cur_pbl_index, + acc, &iova_start); + + if (ret == 0) { + nesmr->ibmr.rkey = stag; + nesmr->ibmr.lkey = stag; + nesmr->mode = IWNES_MEMREG_TYPE_MEM; + ibmr = &nesmr->ibmr; + nesmr->pbl_4k = ((pbl_count>1)||(cur_pbl_index>32)) ? 1 : 0; + nesmr->pbls_used = pbl_count; + if (pbl_count > 1) { + nesmr->pbls_used++; + } + } else { + kfree(nesmr); + ibmr = ERR_PTR(-ENOMEM); + } + +reg_user_mr_err: + /* free the resources */ + if (root_pbl_index == 1) { + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + } else { + for (j=0; j<root_pbl_index; j++) { + pci_free_consistent(nesdev->pcidev, 4096, root_vpbl.leaf_vpbl[j].pbl_vbase, + root_vpbl.leaf_vpbl[j].pbl_pbase); + } + kfree(root_vpbl.leaf_vpbl); + pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, + root_vpbl.pbl_pbase); + } + + return ibmr; + break; + case IWNES_MEMREG_TYPE_QP: + return ERR_PTR(-ENOSYS); + break; + case IWNES_MEMREG_TYPE_CQ: + nespbl = kmalloc(sizeof(*nespbl), GFP_KERNEL); + if (!nespbl) { + dprintk("%s: Unable to allocate PBL\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + memset(nespbl, 0, sizeof(*nespbl)); + nesmr = kmalloc(sizeof(*nesmr), GFP_KERNEL); + if (!nesmr) { + kfree(nespbl); + dprintk("%s: Unable to allocate nesmr\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + memset(nesmr, 0, sizeof(*nesmr)); + nes_ucontext = to_nesucontext(pd->uobject->context); + pbl_depth = region->length >> PAGE_SHIFT; + pbl_depth += (region->length & ~PAGE_MASK) ? 
1 : 0; + nespbl->pbl_size = pbl_depth*sizeof(u64); + dprintk("%s: Attempting to allocate CQ PBL memory, %u bytes, %u entries.\n", __FUNCTION__, nespbl->pbl_size, pbl_depth ); + pbl = pci_alloc_consistent(nesdev->pcidev, nespbl->pbl_size, + &nespbl->pbl_pbase); + if (!pbl) { + kfree(nesmr); + kfree(nespbl); + dprintk("%s: Unable to allocate cq PBL memory\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + + nespbl->pbl_vbase = pbl; + nespbl->user_base = region->user_base; + + list_for_each_entry(chunk, &region->chunk_list, list) { + for (j = 0; j < chunk->nmap; ++j) { + *pbl++ = cpu_to_le64((u64)sg_dma_address(&chunk->page_list[j])); + } + } + list_add_tail(&nespbl->list, &nes_ucontext->cq_reg_mem_list); + nesmr->ibmr.rkey = -1; + nesmr->ibmr.lkey = -1; + nesmr->mode = IWNES_MEMREG_TYPE_CQ; + return &nesmr->ibmr; + break; + } + + return ERR_PTR(-ENOSYS); +} + + +/** + * nes_dereg_mr + * + * @param ib_mr + * + * @return int + */ +static int nes_dereg_mr(struct ib_mr *ib_mr) +{ + struct nes_mr *nesmr = to_nesmr(ib_mr); + struct nes_dev *nesdev = to_nesdev(ib_mr->device); + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_hw_cqp_wqe *cqp_wqe; + u32 cqp_head; + int err; + unsigned long flags; + int ret; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + if (nesmr->mode != IWNES_MEMREG_TYPE_MEM) { + /* TODO: Any cross checking with CQ/QP that owned? 
*/ + kfree(nesmr); + return 0; + } + + /* Deallocate the region with the adapter */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + + if (0 != nesmr->pbls_used) { + if (nesmr->pbl_4k) { + nesadapter->free_4kpbl += nesmr->pbls_used; + if (nesadapter->free_4kpbl > nesadapter->max_4kpbl) { + printk(KERN_ERR PFX "free 4KB PBLs(%u) has exceeded the max(%u)\n", + nesadapter->free_4kpbl, nesadapter->max_4kpbl); + } + } else { + nesadapter->free_256pbl += nesmr->pbls_used; + if (nesadapter->free_256pbl > nesadapter->max_256pbl) { + printk(KERN_ERR PFX "free 256B PBLs(%u) has exceeded the max(%u)\n", + nesadapter->free_256pbl, nesadapter->max_256pbl); + } + } + } + + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_DEALLOCATE_STAG | NES_CQP_STAG_VA_TO | + NES_CQP_STAG_DEALLOC_PBLS | NES_CQP_STAG_MR; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_LEN_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_STAG_IDX] = ib_mr->rkey; + + barrier(); + + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("Waiting for deallocate STag 0x%08X to complete.\n", ib_mr->rkey); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Deallocate STag completed, wait_event_timeout ret = %u.\n", ret); + /* TODO: Catch error code... 
*/ + + nes_free_resource(nesadapter, nesadapter->allocated_mrs, (ib_mr->rkey&0x0fffff00)>>8); + + err = 0; + if (err) + dprintk("nes_stag_dealloc failed: %d\n", err); + else + kfree(nesmr); + + return err; +} + + +/** + * show_rev + * + * @param cdev + * @param buf + * + * @return ssize_t + */ +static ssize_t show_rev(struct class_device *cdev, char *buf) +{ + struct nes_dev *nesdev = container_of(cdev, struct nes_dev, ibdev.class_dev); + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return sprintf(buf, "%x\n", nesdev->nesadapter->hw_rev); +} + + +/** + * show_fw_ver + * + * @param cdev + * @param buf + * + * @return ssize_t + */ +static ssize_t show_fw_ver(struct class_device *cdev, char *buf) +{ + struct nes_dev *nesdev = container_of(cdev, struct nes_dev, ibdev.class_dev); + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return sprintf(buf, "%x.%x.%x\n", + (int) (nesdev->nesadapter->fw_ver >> 32), + (int) (nesdev->nesadapter->fw_ver >> 16) & 0xffff, + (int) (nesdev->nesadapter->fw_ver & 0xffff)); +} + + +/** + * show_hca + * + * @param cdev + * @param buf + * + * @return ssize_t + */ +static ssize_t show_hca(struct class_device *cdev, char *buf) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return sprintf(buf, "NES010\n"); +} + + +/** + * show_board + * + * @param cdev + * @param buf + * + * @return ssize_t + */ +static ssize_t show_board(struct class_device *cdev, char *buf) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return sprintf(buf, "%.*s\n", 32, "NES010 Board ID"); +} + +static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL); +static CLASS_DEVICE_ATTR(fw_ver, S_IRUGO, show_fw_ver, NULL); +static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL); +static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL); + +static struct class_device_attribute *nes_class_attributes[] = { + &class_device_attr_hw_rev, + &class_device_attr_fw_ver, + &class_device_attr_hca_type, + 
&class_device_attr_board_id +}; + + +/** + * nes_query_qp + * + * @param qp + * @param qp_attr + * @param qp_attr_mask + * @param qp_init_attr + * + * @return int + */ +static int nes_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, + struct ib_qp_init_attr *qp_init_attr) +{ + int err; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + // TODO: Do work here + err = 0; + + return err; +} + + +/** + * nes_modify_qp + * + * @param ibqp + * @param attr + * @param attr_mask + * + * @return int + */ +int nes_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask) +{ + u64 u64temp; + struct nes_qp *nesqp = to_nesqp(ibqp); + struct nes_dev *nesdev = to_nesdev(ibqp->device); + struct nes_hw_cqp_wqe *cqp_wqe; + struct iw_cm_id *cm_id = nesqp->cm_id; + struct iw_cm_event cm_event; + u8 abrupt_disconnect = 0; + u32 cqp_head; +// u32 counter; + u32 next_iwarp_state = 0; + int err; + /* TODO: don't need both of these!!! */ + unsigned long flags; + unsigned long qplockflags; + int ret; + u8 issue_modify_qp = 0; + u8 issue_disconnect = 0; + + spin_lock_irqsave(&nesqp->lock, qplockflags); +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + dprintk("%s:QP%u: QP State = %u, cur QP State = %u, iwarp_state = 0x%X. \n", + __FUNCTION__, nesqp->hwqp.qp_id, attr->qp_state, nesqp->ibqp_state, nesqp->iwarp_state); + dprintk("%s:QP%u: QP Access Flags = 0x%X, attr_mask = 0x%0x. \n", + __FUNCTION__, nesqp->hwqp.qp_id, attr->qp_access_flags, attr_mask ); + + + if (attr_mask & IB_QP_STATE) { + switch (attr->qp_state) { + case IB_QPS_INIT: + dprintk("%s:QP%u: new state = init. 
\n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_IDLE) { + /* TODO: Need to add code to handle back from error or closing */ + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + } + next_iwarp_state = NES_CQP_QP_IWARP_STATE_IDLE; + issue_modify_qp = 1; + break; + case IB_QPS_RTR: + dprintk("%s:QP%u: new state = rtr. \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_IDLE) { + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + } + next_iwarp_state = NES_CQP_QP_IWARP_STATE_IDLE; + issue_modify_qp = 1; + break; + case IB_QPS_RTS: + dprintk("%s:QP%u: new state = rts. \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_RTS) { + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + } + next_iwarp_state = NES_CQP_QP_IWARP_STATE_RTS; + if (nesqp->iwarp_state != NES_CQP_QP_IWARP_STATE_RTS) + next_iwarp_state |= NES_CQP_QP_CONTEXT_VALID | NES_CQP_QP_ARP_VALID | NES_CQP_QP_ORD_VALID; + issue_modify_qp = 1; + break; + case IB_QPS_SQD: + dprintk("%s:QP%u: new state = closing. SQ head = %u, SQ tail = %u. \n", + __FUNCTION__, nesqp->hwqp.qp_id, nesqp->hwqp.sq_head, nesqp->hwqp.sq_tail ); + if (nesqp->iwarp_state==(u32)NES_CQP_QP_IWARP_STATE_CLOSING) { + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return 0; + } else if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_CLOSING) { + dprintk("%s:QP%u: State change to closing ignored due to current iWARP state. 
\n", + __FUNCTION__, nesqp->hwqp.qp_id ); + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + } + next_iwarp_state = NES_CQP_QP_IWARP_STATE_CLOSING; + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_RTS){ + issue_disconnect = 1; + } else + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_IDLE) { + /* Free up the connect_worker thread if needed */ + if (nesqp->ksock) { + nes_sock_release( nesqp, &qplockflags ); + } + } + break; + case IB_QPS_SQE: + dprintk("%s:QP%u: new state = terminate. \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->iwarp_state>=(u32)NES_CQP_QP_IWARP_STATE_TERMINATE) { + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + } + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_RTS){ + issue_disconnect = 1; + abrupt_disconnect = 1; + } + next_iwarp_state = NES_CQP_QP_IWARP_STATE_TERMINATE; + issue_modify_qp = 1; + break; + case IB_QPS_ERR: + case IB_QPS_RESET: + if (nesqp->iwarp_state==(u32)NES_CQP_QP_IWARP_STATE_ERROR) { + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + } + dprintk("%s:QP%u: new state = error. 
\n", + __FUNCTION__, nesqp->hwqp.qp_id ); + next_iwarp_state = NES_CQP_QP_IWARP_STATE_ERROR; + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_RTS){ + issue_disconnect = 1; + } + issue_modify_qp = 1; + break; + default: + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + return -EINVAL; + break; + } + + /* TODO: Do state checks */ + + nesqp->ibqp_state = attr->qp_state; + if ( ((nesqp->iwarp_state & NES_CQP_QP_IWARP_STATE_MASK) == (u32)NES_CQP_QP_IWARP_STATE_RTS) && + ((next_iwarp_state & NES_CQP_QP_IWARP_STATE_MASK) > (u32)NES_CQP_QP_IWARP_STATE_RTS)) { + nesqp->iwarp_state = next_iwarp_state & NES_CQP_QP_IWARP_STATE_MASK; + issue_disconnect = 1; + } else + nesqp->iwarp_state = next_iwarp_state & NES_CQP_QP_IWARP_STATE_MASK; + /* TODO: nesqp->iwarp_state vs.next_iwarp_state */ + } + + if (attr_mask & IB_QP_ACCESS_FLAGS) { + if (attr->qp_access_flags & IB_ACCESS_LOCAL_WRITE) { + /* TODO: had to add rdma read here for user mode access, doesn't seem quite correct */ + /* actually, might need to remove rdma write here too */ + nesqp->nesqp_context->misc |= NES_QPCONTEXT_MISC_RDMA_WRITE_EN | NES_QPCONTEXT_MISC_RDMA_READ_EN; + issue_modify_qp = 1; + } + if (attr->qp_access_flags & IB_ACCESS_REMOTE_WRITE) { + nesqp->nesqp_context->misc |= NES_QPCONTEXT_MISC_RDMA_WRITE_EN; + issue_modify_qp = 1; + } + if (attr->qp_access_flags & IB_ACCESS_REMOTE_READ) { + nesqp->nesqp_context->misc |= NES_QPCONTEXT_MISC_RDMA_READ_EN; + issue_modify_qp = 1; + } + if (attr->qp_access_flags & IB_ACCESS_MW_BIND) { + nesqp->nesqp_context->misc |= NES_QPCONTEXT_MISC_WBIND_EN; + issue_modify_qp = 1; + } + } + + if (issue_disconnect) + { + dprintk("%s:QP%u: Issuing Disconnect.\n", __FUNCTION__, nesqp->hwqp.qp_id ); + } + spin_unlock_irqrestore(&nesqp->lock, qplockflags); + if (issue_disconnect) + { + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + 
cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_UPLOAD_CONTEXT | NES_CQP_QP_TYPE_IWARP); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(nesqp->hwqp.qp_id); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + u64temp = (u64)nesqp->nesqp_context_pbase; + cqp_wqe->wqe_words[NES_CQP_UPLOAD_WQE_CTXT_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_UPLOAD_WQE_CTXT_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + /* TODO: this value should already be swapped? */ + cqp_wqe->wqe_words[NES_CQP_UPLOAD_WQE_HTE_IDX] = nesqp->nesqp_context->hte_index; + + barrier(); + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); +// dprintk("Waiting for modify iWARP QP%u to complete.\n", nesqp->hwqp.qp_id); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + + /* TODO: Catch error code... 
*/ + nes_disconnect(nesqp->cm_id, abrupt_disconnect); + + dprintk("%s:Generating a Close Complete Event (reset) for QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + /* Send up the close complete event */ + cm_event.event = IW_CM_EVENT_CLOSE; + cm_event.status = IW_CM_EVENT_STATUS_OK; + cm_event.provider_data = cm_id->provider_data; + cm_event.local_addr = cm_id->local_addr; + cm_event.remote_addr = cm_id->remote_addr; + cm_event.private_data = NULL; + cm_event.private_data_len = 0; + + cm_id->event_handler(cm_id, &cm_event); + + } + + if (issue_modify_qp) { + spin_lock_irqsave(&nesdev->cqp.lock, flags); + + cqp_head = nesdev->cqp.sq_head++; + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_MODIFY_QP | NES_CQP_QP_TYPE_IWARP | next_iwarp_state; + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nesqp->hwqp.qp_id; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + u64temp = (u64)nesqp->nesqp_context_pbase; + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = (u32)u64temp; + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = (u32)(u64temp>>32); + + barrier(); + // Ring doorbell (1 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | nesdev->cqp.qp_id ); + + /* Wait for CQP */ + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); +// dprintk("Waiting for modify iWARP QP%u to complete.\n", nesqp->hwqp.qp_id); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Modify iwarp QP%u completed, wait_event_timeout ret = %u, nesdev->cqp.sq_head = %u nesdev->cqp.sq_tail = %u.\n", + nesqp->hwqp.qp_id, ret, nesdev->cqp.sq_head, nesdev->cqp.sq_tail); + /* TODO: Catch 
error code... */ + } + + err = 0; + + return err; +} + + +/** + * nes_multicast_attach + * + * @param ibqp + * @param gid + * @param lid + * + * @return int + */ +static int nes_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return -ENOSYS; +} + + +/** + * nes_multicast_detach + * + * @param ibqp + * @param gid + * @param lid + * + * @return int + */ +static int nes_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return -ENOSYS; +} + + +/** + * nes_process_mad + * + * @param ibdev + * @param mad_flags + * @param port_num + * @param in_wc + * @param in_grh + * @param in_mad + * @param out_mad + * + * @return int + */ +static int nes_process_mad(struct ib_device *ibdev, + int mad_flags, + u8 port_num, + struct ib_wc *in_wc, + struct ib_grh *in_grh, + struct ib_mad *in_mad, struct ib_mad *out_mad) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return -ENOSYS; +} + + +/** + * nes_post_send + * + * @param ibqp + * @param ib_wr + * @param bad_wr + * + * @return int + */ +static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, + struct ib_send_wr **bad_wr) +{ + struct nes_dev *nesdev = to_nesdev(ibqp->device); + struct nes_qp *nesqp = to_nesqp(ibqp); + u32 qsize = nesqp->hwqp.sq_size; + struct nes_hw_qp_wqe *wqe; + unsigned long flags = 0; + u32 head; + int err = 0; + u32 wqe_count = 0; + u32 counter; + int sge_index; + u32 total_payload_length; + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + if (nesqp->ibqp_state > IB_QPS_RTS) + return -EINVAL; + + spin_lock_irqsave(&nesqp->lock, flags); + + head = nesqp->hwqp.sq_head; + + while (ib_wr) { + /* Check for SQ overflow */ + if (((head + (2 * qsize) - nesqp->hwqp.sq_tail) % qsize) == (qsize - 1)) { + err = -EINVAL; + break; + } + + wqe = &nesqp->hwqp.sq_vbase[head]; +// dprintk("%s:processing sq wqe at %p, head = 
%u.\n", __FUNCTION__, wqe, head); + *((u64 *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX]) = ib_wr->wr_id; + *((struct nes_qp **)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; + wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX] |= head; + + switch (ib_wr->opcode) { + case IB_WR_SEND: + if (ib_wr->send_flags & IB_SEND_SOLICITED) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_SENDSE; + } else { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_SEND; + } + if (ib_wr->num_sge > nesdev->nesadapter->max_sge) { + err = -EINVAL; + break; + } + if (ib_wr->send_flags & IB_SEND_FENCE) { + /* TODO: is IB Send Fence local or RDMA read? */ + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_LOCAL_FENCE; + } + total_payload_length = 0; + for (sge_index=0; sge_index < ib_wr->num_sge; sge_index++) { + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = cpu_to_le32((u32)ib_wr->sg_list[sge_index].addr); + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = cpu_to_le32((u32)(ib_wr->sg_list[sge_index].addr>>32)); + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX+(sge_index*4)] = cpu_to_le32(ib_wr->sg_list[sge_index].length); + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX+(sge_index*4)] = cpu_to_le32(ib_wr->sg_list[sge_index].lkey); + total_payload_length += ib_wr->sg_list[sge_index].length; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = cpu_to_le32(total_payload_length); + nesqp->bytes_sent += total_payload_length; + if (nesqp->bytes_sent > NES_MAX_SQ_PAYLOAD_SIZE) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_READ_FENCE; + nesqp->bytes_sent = 0; + } + break; + case IB_WR_RDMA_WRITE: + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_RDMAW; + if (ib_wr->num_sge > nesdev->nesadapter->max_sge) { + err = -EINVAL; + break; + } + if (ib_wr->send_flags & IB_SEND_FENCE) { + /* TODO: is IB Send Fence local or RDMA read? 
*/ + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_LOCAL_FENCE; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_STAG_IDX] = cpu_to_le32(ib_wr->wr.rdma.rkey); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX] = cpu_to_le32(ib_wr->wr.rdma.remote_addr); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX] = cpu_to_le32((u32)(ib_wr->wr.rdma.remote_addr>>32)); + total_payload_length = 0; + for (sge_index=0; sge_index < ib_wr->num_sge; sge_index++) { + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = cpu_to_le32((u32)ib_wr->sg_list[sge_index].addr); + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = cpu_to_le32((u32)(ib_wr->sg_list[sge_index].addr>>32)); + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX+(sge_index*4)] = cpu_to_le32(ib_wr->sg_list[sge_index].length); + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX+(sge_index*4)] = cpu_to_le32(ib_wr->sg_list[sge_index].lkey); + total_payload_length += ib_wr->sg_list[sge_index].length; + } + /* TODO: handle multiple fragments, switch to loop on structure */ + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = cpu_to_le32(total_payload_length); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX] = wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX]; + nesqp->bytes_sent += total_payload_length; + if (nesqp->bytes_sent > NES_MAX_SQ_PAYLOAD_SIZE) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_READ_FENCE; + nesqp->bytes_sent = 0; + } + break; + case IB_WR_RDMA_READ: + /* IWarp only supports 1 sge for RDMA reads */ + if (ib_wr->num_sge > 1) { + err = -EINVAL; + break; + } + /* TODO: what about fences... 
*/ + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_RDMAR; + + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX] = cpu_to_le32(ib_wr->wr.rdma.remote_addr); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX] = cpu_to_le32((u32)(ib_wr->wr.rdma.remote_addr>>32)); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_STAG_IDX] = cpu_to_le32(ib_wr->wr.rdma.rkey); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX] = cpu_to_le32(ib_wr->sg_list->length); + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32(ib_wr->sg_list->addr); + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX] = cpu_to_le32((u32)(ib_wr->sg_list->addr>>32)); + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX] = cpu_to_le32(ib_wr->sg_list->lkey); + break; + default: + /* error */ + err = -EINVAL; + break; + } + + if (ib_wr->send_flags & IB_SEND_SIGNALED) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_SIGNALED_COMPL; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = cpu_to_le32(wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX]); + + ib_wr = ib_wr->next; + head++; + wqe_count++; + if (head >= qsize) + head = 0; + + } + + nesqp->hwqp.sq_head = head; + barrier(); + while (wqe_count) { + counter = min(wqe_count, ((u32)255)); + wqe_count -= counter; + /* TODO: switch to using doorbell region */ + nes_write32(nesdev->regs + NES_WQE_ALLOC, (counter << 24) | 0x00800000 | nesqp->hwqp.qp_id); + } + + spin_unlock_irqrestore(&nesqp->lock, flags); + + if (err) + *bad_wr = ib_wr; + return (err); +} + + +/** + * nes_post_recv + * + * @param ibqp + * @param ib_wr + * @param bad_wr + * + * @return int + */ +static int nes_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr, + struct ib_recv_wr **bad_wr) +{ + struct nes_dev *nesdev = to_nesdev(ibqp->device); + struct nes_qp *nesqp = to_nesqp(ibqp); + u32 qsize = nesqp->hwqp.rq_size; + struct nes_hw_qp_wqe *wqe; + unsigned long flags = 0; + u32 head; + int err = 0; + u32 wqe_count = 0; + u32 counter; + int sge_index; + u32 total_payload_length; + + // 
dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + if (nesqp->ibqp_state > IB_QPS_RTS) + return -EINVAL; + + spin_lock_irqsave(&nesqp->lock, flags); + + head = nesqp->hwqp.rq_head; + + while (ib_wr) { + if (ib_wr->num_sge > nesdev->nesadapter->max_sge) { + err = -EINVAL; + break; + } + /* Check for RQ overflow */ + if (((head + (2 * qsize) - nesqp->hwqp.rq_tail) % qsize) == (qsize - 1)) { + err = -EINVAL; + break; + } + +// dprintk("%s: ibwr sge count = %u.\n", __FUNCTION__, ib_wr->num_sge); + wqe = &nesqp->hwqp.rq_vbase[head]; +// dprintk("%s:QP%u:processing rq wqe at %p, head = %u.\n", __FUNCTION__, nesqp->hwqp.qp_id, wqe, head); + *((u64 *)&wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX]) = ib_wr->wr_id; + *((struct nes_qp **)&wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; + wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX] |= head; + + total_payload_length = 0; + for (sge_index=0; sge_index < ib_wr->num_sge; sge_index++) { + wqe->wqe_words[NES_IWARP_RQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = cpu_to_le32((u32)ib_wr->sg_list[sge_index].addr); + wqe->wqe_words[NES_IWARP_RQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = cpu_to_le32((u32)(ib_wr->sg_list[sge_index].addr>>32)); + wqe->wqe_words[NES_IWARP_RQ_WQE_LENGTH0_IDX+(sge_index*4)] = cpu_to_le32(ib_wr->sg_list[sge_index].length); + wqe->wqe_words[NES_IWARP_RQ_WQE_STAG0_IDX+(sge_index*4)] = cpu_to_le32(ib_wr->sg_list[sge_index].lkey); + total_payload_length += ib_wr->sg_list[sge_index].length; + } + wqe->wqe_words[NES_IWARP_RQ_WQE_TOTAL_PAYLOAD_IDX] = cpu_to_le32(total_payload_length); + + ib_wr = ib_wr->next; + head++; + wqe_count++; + if (head >= qsize) + head = 0; + } + + nesqp->hwqp.rq_head = head; + barrier(); + while (wqe_count) { + counter = min(wqe_count, ((u32)255)); + wqe_count -= counter; + /* TODO: switch to using doorbell region */ + nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter<<24) | nesqp->hwqp.qp_id ); + } + + spin_unlock_irqrestore(&nesqp->lock, flags); + + if (err) + *bad_wr = ib_wr; + 
return err; +} + + +/** + * nes_poll_cq + * + * @param ibcq + * @param num_entries + * @param entry + * + * @return int + */ +static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry) +{ + u64 wrid; +// u64 u64temp; + struct nes_dev *nesdev = to_nesdev(ibcq->device); + struct nes_cq *nescq = to_nescq(ibcq); + struct nes_qp *nesqp; + struct nes_hw_cqe cqe; + unsigned long flags = 0; + u32 head; + u32 wq_tail; + u32 cq_size; + u32 cqe_count=0; + u32 wqe_index; +// u32 counter; + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + spin_lock_irqsave(&nescq->lock, flags); + + head = nescq->hw_cq.cq_head; + cq_size = nescq->hw_cq.cq_size; +// dprintk("%s: Polling CQ%u (head = %u, size = %u).\n", __FUNCTION__, +// nescq->hw_cq.cq_number, head, cq_size); + + while (cqe_count<num_entries) { + if (nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] & NES_CQE_VALID) { + /* TODO: determine if this copy of the cqe actually helps since cq is volatile */ + cqe = nescq->hw_cq.cq_vbase[head]; + nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0; + /* TODO: need to add code to check for magic bit (0x200) and ignore */ + wqe_index = cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]&(nesdev->nesadapter->max_qp_wr - 1); + cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX] &= ~(NES_SW_CONTEXT_ALIGN-1); + barrier(); + /* parse CQE, get completion context from WQE (either rq or sq */ + nesqp = *((struct nes_qp **)&cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]); + memset(entry, 0, sizeof *entry); + entry->status = IB_WC_SUCCESS; + entry->qp_num = nesqp->hwqp.qp_id; + entry->src_qp = nesqp->hwqp.qp_id; + + if (cqe.cqe_words[NES_CQE_OPCODE_IDX] & NES_CQE_SQ) { + if (nesqp->skip_lsmm) + { + nesqp->skip_lsmm = 0; + wq_tail = nesqp->hwqp.sq_tail++; + } + + /* Working on a SQ Completion*/ + /* TODO: get the wr head from the completion after proper alignment of nesqp */ + wq_tail = wqe_index; + nesqp->hwqp.sq_tail = (wqe_index+1)&(nesqp->hwqp.sq_size - 1); + wrid = *((u64 
*)&nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX]); + entry->byte_len = nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX]; + + switch (nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_MISC_IDX]&0x3f) { + case NES_IWARP_SQ_OP_RDMAW: +// dprintk("%s: Operation = RDMA WRITE.\n", __FUNCTION__ ); + entry->opcode = IB_WC_RDMA_WRITE; + break; + case NES_IWARP_SQ_OP_RDMAR: +// dprintk("%s: Operation = RDMA READ.\n", __FUNCTION__ ); + entry->opcode = IB_WC_RDMA_READ; + entry->byte_len = nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX]; + break; + case NES_IWARP_SQ_OP_SENDINV: + case NES_IWARP_SQ_OP_SENDSEINV: + case NES_IWARP_SQ_OP_SEND: + case NES_IWARP_SQ_OP_SENDSE: +// dprintk("%s: Operation = Send.\n", __FUNCTION__ ); + entry->opcode = IB_WC_SEND; + break; + } + } else { + /* Working on a RQ Completion*/ + wq_tail = wqe_index; + nesqp->hwqp.rq_tail = (wqe_index+1)&(nesqp->hwqp.rq_size - 1); + entry->byte_len = le32_to_cpu(cqe.cqe_words[NES_CQE_PAYLOAD_LENGTH_IDX]); + entry->byte_len = le32_to_cpu(cqe.cqe_words[NES_CQE_PAYLOAD_LENGTH_IDX]); + wrid = *((u64 *)&nesqp->hwqp.rq_vbase[wq_tail].wqe_words[NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX]); + entry->opcode = IB_WC_RECV; + } + /* TODO: report errors */ + entry->wr_id = wrid; + + if (++head >= cq_size) + head = 0; + cqe_count++; + nescq->polled_completions++; + /* TODO: find a better number...if there is one */ + if ((nescq->polled_completions>(cq_size/2)) || (nescq->polled_completions==255)) { + dprintk("%s: CQ%u Issuing CQE Allocate since more than half of cqes are pending %u of %u.\n", + __FUNCTION__, nescq->hw_cq.cq_number ,nescq->polled_completions, cq_size); + nes_write32(nesdev->regs+NES_CQE_ALLOC, nescq->hw_cq.cq_number | (nescq->polled_completions << 16) ); + nescq->polled_completions = 0; + } + entry++; + } else + break; + } + + if (nescq->polled_completions) { +// dprintk("%s: CQ%u Issuing CQE Allocate for %u cqes.\n", 
+// __FUNCTION__, nescq->hw_cq.cq_number ,nescq->polled_completions); + nes_write32(nesdev->regs+NES_CQE_ALLOC, nescq->hw_cq.cq_number | (nescq->polled_completions << 16) ); + nescq->polled_completions = 0; + } + + /* TODO: Add code to check if overflow checking is on, if so write CQE_ALLOC with remaining CQEs here or overflow + could occur */ + + nescq->hw_cq.cq_head = head; +// dprintk("%s: Reporting %u completions for CQ%u.\n", __FUNCTION__, cqe_count, nescq->hw_cq.cq_number); + + spin_unlock_irqrestore(&nescq->lock, flags); + + return cqe_count; +} + + +/** + * nes_req_notify_cq + * + * @param ibcq + * @param notify + * + * @return int + */ +static int nes_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +{ + struct nes_dev *nesdev = to_nesdev(ibcq->device); + struct nes_cq *nescq = to_nescq(ibcq); + u32 cq_arm; + +// dprintk("%s: Requesting notification for CQ%u.\n", __FUNCTION__, nescq->hw_cq.cq_number); + cq_arm = nescq->hw_cq.cq_number; + if (notify == IB_CQ_NEXT_COMP) + cq_arm |= NES_CQE_ALLOC_NOTIFY_NEXT; + else if (notify == IB_CQ_SOLICITED) + cq_arm |= NES_CQE_ALLOC_NOTIFY_SE; + else + return -EINVAL; + +// dprintk("%s: Arming CQ%u, command = 0x%08X.\n", __FUNCTION__, nescq->hw_cq.cq_number, cq_arm); + nes_write32(nesdev->regs+NES_CQE_ALLOC, cq_arm ); + + return 0; +} + + +/** + * nes_register_device + * + * @param nesdev + * + * @return int + */ +int nes_register_device(struct nes_dev *nesdev) +{ + int ret; + int i; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + strlcpy(nesdev->ibdev.name, "nes%d", IB_DEVICE_NAME_MAX); + nesdev->ibdev.owner = THIS_MODULE; + + nesdev->ibdev.node_type = RDMA_NODE_RNIC; + memset(&nesdev->ibdev.node_guid, 0, sizeof(nesdev->ibdev.node_guid)); + memcpy(&nesdev->ibdev.node_guid, nesdev->netdev->dev_addr, 6); + nesdev->nesadapter->device_cap_flags = + (IB_DEVICE_ZERO_STAG | IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + + nesdev->ibdev.uverbs_cmd_mask = + (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | + 
(1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | + (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | + (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_REG_MR) | + (1ull << IB_USER_VERBS_CMD_DEREG_MR) | + (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | + (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | + (1ull << IB_USER_VERBS_CMD_CREATE_AH) | + (1ull << IB_USER_VERBS_CMD_DESTROY_AH) | + (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) | + (1ull << IB_USER_VERBS_CMD_CREATE_QP) | + (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | + (1ull << IB_USER_VERBS_CMD_POLL_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | + (1ull << IB_USER_VERBS_CMD_POST_SEND) | + (1ull << IB_USER_VERBS_CMD_POST_RECV); + + nesdev->ibdev.phys_port_cnt = 1; + nesdev->ibdev.dma_device = &nesdev->pcidev->dev; + nesdev->ibdev.class_dev.dev = &nesdev->pcidev->dev; + nesdev->ibdev.query_device = nes_query_device; + nesdev->ibdev.query_port = nes_query_port; + nesdev->ibdev.modify_port = nes_modify_port; + nesdev->ibdev.query_pkey = nes_query_pkey; + nesdev->ibdev.query_gid = nes_query_gid; + nesdev->ibdev.alloc_ucontext = nes_alloc_ucontext; + nesdev->ibdev.dealloc_ucontext = nes_dealloc_ucontext; + nesdev->ibdev.mmap = nes_mmap; + nesdev->ibdev.alloc_pd = nes_alloc_pd; + nesdev->ibdev.dealloc_pd = nes_dealloc_pd; + nesdev->ibdev.create_ah = nes_create_ah; + nesdev->ibdev.destroy_ah = nes_destroy_ah; + nesdev->ibdev.create_qp = nes_create_qp; + nesdev->ibdev.modify_qp = nes_modify_qp; + nesdev->ibdev.query_qp = nes_query_qp; + nesdev->ibdev.destroy_qp = nes_destroy_qp; + nesdev->ibdev.create_cq = nes_create_cq; + nesdev->ibdev.destroy_cq = nes_destroy_cq; + nesdev->ibdev.poll_cq = nes_poll_cq; + nesdev->ibdev.get_dma_mr = nes_get_dma_mr; + nesdev->ibdev.reg_phys_mr = nes_reg_phys_mr; + nesdev->ibdev.reg_user_mr = nes_reg_user_mr; + nesdev->ibdev.dereg_mr = nes_dereg_mr; + + nesdev->ibdev.alloc_fmr = 0; + nesdev->ibdev.unmap_fmr = 0; 
+ nesdev->ibdev.dealloc_fmr = 0; + nesdev->ibdev.map_phys_fmr = 0; + + nesdev->ibdev.attach_mcast = nes_multicast_attach; + nesdev->ibdev.detach_mcast = nes_multicast_detach; + nesdev->ibdev.process_mad = nes_process_mad; + + nesdev->ibdev.req_notify_cq = nes_req_notify_cq; + nesdev->ibdev.post_send = nes_post_send; + nesdev->ibdev.post_recv = nes_post_recv; + + nesdev->ibdev.iwcm = kmalloc(sizeof(*nesdev->ibdev.iwcm), GFP_KERNEL); + if (nesdev->ibdev.iwcm == NULL) { + return (-ENOMEM); + } + nesdev->ibdev.iwcm->add_ref = nes_add_ref; + nesdev->ibdev.iwcm->rem_ref = nes_rem_ref; + nesdev->ibdev.iwcm->get_qp = nes_get_qp; + nesdev->ibdev.iwcm->connect = nes_connect; + nesdev->ibdev.iwcm->accept = nes_accept; + nesdev->ibdev.iwcm->reject = nes_reject; + nesdev->ibdev.iwcm->create_listen = nes_create_listen; + nesdev->ibdev.iwcm->destroy_listen = nes_destroy_listen; + + dprintk("&nes_dev=0x%p : &nes->ibdev = 0x%p: %s : %u\n", nesdev, &nesdev->ibdev, + __FUNCTION__, __LINE__); + + ret = ib_register_device(&nesdev->ibdev); + if (ret) { + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + return ret; + } + + for (i = 0; i < ARRAY_SIZE(nes_class_attributes); ++i) { + ret = class_device_create_file(&nesdev->ibdev.class_dev, nes_class_attributes[i]); + if (ret) { + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + ib_unregister_device(&nesdev->ibdev); + return ret; + } + } + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + return 0; +} + + +/** + * nes_unregister_device + * + * @param nesdev + */ +void nes_unregister_device(struct nes_dev *nesdev) +{ + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + ib_unregister_device(&nesdev->ibdev); +} From ggrundstrom at NetEffect.com Thu Oct 26 17:33:13 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:33:13 -0500 Subject: [openib-general] [PATCH 9/9] NetEffect 10Gb RNIC Driver: openfabrics verbs header file Message-ID: 
<5E701717F2B2ED4EA60F87C8AA57B7CC064C9EC7@venom2> Kernel driver patch 9 of 9. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/drivers/infiniband/hw/nes/nes_verbs.h new/drivers/infiniband/hw/nes/nes_verbs.h --- old/drivers/infiniband/hw/nes/nes_verbs.h 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_verbs.h 2006-10-25 10:15:52.000000000 -0500 @@ -0,0 +1,144 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef NES_VERBS_H +#define NES_VERBS_H + +struct nes_dev; + +#define NES_MAX_USER_DB_REGIONS 4096 +#define NES_MAX_USER_WQ_REGIONS 4096 + +struct nes_ucontext { + struct ib_ucontext ibucontext; + struct nes_dev *nesdev; + /* need to track mmapped areas, start with bit vector? */ + unsigned long mmap_wq_offset; + unsigned long mmap_cq_offset; /* to be removed */ + int index; /* rnic index (minor) */ + unsigned long allocated_doorbells[BITS_TO_LONGS(NES_MAX_USER_DB_REGIONS)]; + u16 mmap_db_index[NES_MAX_USER_DB_REGIONS]; + u16 first_free_db; + unsigned long allocated_wqs[BITS_TO_LONGS(NES_MAX_USER_WQ_REGIONS)]; + struct nes_qp * mmap_nesqp[NES_MAX_USER_WQ_REGIONS]; + u16 first_free_wq; + struct list_head cq_reg_mem_list; +}; + +struct nes_pd { + struct ib_pd ibpd; + u16 pd_id; + atomic_t sqp_count; + u16 mmap_db_index; +}; + +struct nes_mr { + struct ib_mr ibmr; + u16 pbls_used; + u8 mode; + u8 pbl_4k; +}; + +struct nes_hw_pb { + u32 pa_low; + u32 pa_high; +}; + +struct nes_vpbl { + dma_addr_t pbl_pbase; + struct nes_hw_pb *pbl_vbase; +}; + +struct nes_root_vpbl { + dma_addr_t pbl_pbase; + struct nes_hw_pb *pbl_vbase; + struct nes_vpbl *leaf_vpbl; +}; + +struct nes_av; + +struct nes_cq { + struct ib_cq ibcq; + struct nes_hw_cq hw_cq; + u32 polled_completions; + u32 cq_mem_size; + spinlock_t lock; + u8 virtual_cq; + u8 pad[3]; +}; + +struct nes_wq { + spinlock_t lock; +}; + +struct iw_cm_id; + +struct nes_qp { + struct ib_qp ibqp; + enum ib_qp_state ibqp_state; + u32 iwarp_state; + void * allocated_buffer; + struct iw_cm_id *cm_id; + struct workqueue_struct *wq; + struct workqueue_struct *aewq; + struct socket *ksock; + struct nes_cq *nesscq; + struct nes_cq *nesrcq; + struct nes_pd *nespd; + struct ietf_mpa_req_resp_frame *ietf_frame; + dma_addr_t ietf_frame_pbase; + wait_queue_head_t state_waitq; + unsigned long socket; + struct nes_hw_qp hwqp; + struct work_struct work; + struct work_struct ae_work; + u32 hte_index; + u32 last_aeq; + u32 qp_mem_size; 
+ atomic_t refcount; + u32 mmap_sq_db_index; + u32 mmap_rq_db_index; + spinlock_t lock; + /* TODO: should move these two to the hw qp? */ + struct nes_qp_context *nesqp_context; + dma_addr_t nesqp_context_pbase; + u32 bytes_sent; + u16 private_data_len; + u8 active_conn; + u8 skip_lsmm; + u8 user_mode; + u8 hte_added; +}; + +#endif /* NES_VERBS_H */ From rdreier at cisco.com Thu Oct 26 17:33:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 26 Oct 2006 17:33:44 -0700 Subject: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files In-Reply-To: <20061026.171657.43852974.davem@davemloft.net> (David Miller's message of "Thu, 26 Oct 2006 17:16:57 -0700 (PDT)") References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBD@venom2> <20061026.171657.43852974.davem@davemloft.net> Message-ID: > I fear this is exactly the kind of stuff that we didn't want > to see start going into the kernel, and we've resisted the > TCP/IP stack offload stuff in the infiniband layer exactly > for this reason. We're definitely not going to merge a second TCP stack in any form. - R. From ggrundstrom at NetEffect.com Thu Oct 26 17:09:39 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:09:39 -0500 Subject: [openib-general] [PATCH 5/9] NetEffect 10Gb RNIC Driver: hardware interface c file Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EBC@venom2> Kernel driver patch 5 of 9. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/drivers/infiniband/hw/nes/nes_hw.c new/drivers/infiniband/hw/nes/nes_hw.c --- old/drivers/infiniband/hw/nes/nes_hw.c 1969-12-31 18:00:00.000000000 -0600 +++ new/drivers/infiniband/hw/nes/nes_hw.c 2006-10-25 10:15:50.000000000 -0500 @@ -0,0 +1,1470 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include +#include +#include +#include +#include + +#include "nes.h" + + +#if defined(SA1) +struct nes_init_values init_values[] = +{ + {0x00000600,0x55555555}, + {0x00000604,0x55555555}, + {0x00002000,0x00000001}, + {0x00002004,0x00000001}, + {0x00002008,0x0000FFFF}, + {0x0000200C,0x00000001}, + {0x00002010,0x00000241}, + {0x0000201C,0x75345678}, + {0x00005100,0x00000008}, + {0x00006000,0x000000e0}, + {0x00006008,0x000000e0}, +// {0x00006018,0x00000001}, +// {0x00006028,0x00000001}, + {0x00006038,0x00000003}, + {0x000060B8,0x00000002}, + {0x00006090,0xFFFFFFFF}, + {0x00000900,0x20000001}, +// {0x000001E8,0x000208c2}, + {0x000001E8,0x000208c4}, + {0x000001EC,0x5f1e8480}, + {0x000001FC,0x00050005}, + {0x00000B00,0x00001000}, + {0x000010C8,0x00000003}, + {0x00005008,0x1F1F1F1F}, + {0x00005010,0x1F1F1F1F}, + {0x00005018,0x1F1F1F1F}, + {0x00005020,0x1F1F1F1F}, +// {0x000060B8,0x00000001}, + {0x000060C0,0x00000194}, + {0x000060C8,0x00000020}, + {0x00000000,0x00000000} +}; +#endif + + +/** + * nes_adapter_init - initialize adapter + * + * @param nesdev + * @param num_pds + * + * @return struct nes_adapter* + */ +struct nes_adapter *nes_adapter_init(struct nes_dev *nesdev, unsigned long num_pds) { + struct nes_adapter *nesadapter = NULL; + int i=0; + int found = 0; + u32 u32temp; + u16 max_rq_wrs; + u16 max_sq_wrs; + u32 max_mr; + u32 max_256pbl; + u32 max_4kpbl; + u32 max_qp; + u32 max_irrq; + u32 max_cq; + u32 hte_index_mask; + u32 adapter_size; + u32 arp_table_size; + + /* search the list of existing adapters */ + list_for_each_entry(nesadapter, &nes_adapter_list, list) { + dprintk("Searching Adapter list for PCI devfn = 0x%X.\n", nesdev->pcidev->devfn); + if ((PCI_SLOT(nesadapter->devfn) == PCI_SLOT(nesdev->pcidev->devfn)) && + (nesadapter->bus_number == nesdev->pcidev->bus->number)) { + found = 1; + break; + } + } + + if (!found) { + if (nes_read_indexed(nesdev->index_reg, + NES_IDX_QP_CONTROL+PCI_FUNC(nesdev->pcidev->devfn)*8)) { + 
nes_write32(nesdev->regs+NES_SOFTWARE_RESET, 0xd); + } + /* enable the ports */ + nes_write32(nesdev->regs+NES_SOFTWARE_RESET, 0); + + u32temp = 0; + while ( nes_read_indexed(nesdev->index_reg, + NES_IDX_INT_CPU_STATUS) != 0x80 ) { + if (u32temp++ > 10000) break; + mdelay(1); + } + + if (nes_read_indexed(nesdev->index_reg, NES_IDX_INT_CPU_STATUS) != 0x80) { + printk(KERN_ERR PFX "Internal CPU not ready, status = %02X\n", + nes_read_indexed(nesdev->index_reg, NES_IDX_INT_CPU_STATUS) ); + return NULL; + } + + while ( init_values[i].index != 0 ) { + nes_write_indexed(nesdev->index_reg, + init_values[i].index, init_values[i].data); + i++; + } + + nes_write_indexed(nesdev->index_reg, NES_IDX_GPIO_CONTROL, 0x00000070); + nes_write_indexed(nesdev->index_reg, NES_IDX_GPIO_DATA, 0); + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_TCP_TIMER_SIZE0); + u32temp &= 0xf0f0f0f0; + u32temp |= 0x05050105; + u32temp &= 0xffffff0f; + u32temp |= 0x00000090; + nes_write_indexed(nesdev->index_reg, NES_IDX_TCP_TIMER_SIZE0, u32temp); + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_TCP_TIMER_SIZE1); + u32temp &= 0x0000f0f0; + u32temp |= 0x00000505; + u32temp &= 0xffff0f0f; + u32temp |= 0x00009090; + nes_write_indexed(nesdev->index_reg, NES_IDX_TCP_TIMER_SIZE1, u32temp); + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_TCP_CONFIG0); + u32temp |= 0x00000001; + nes_write_indexed(nesdev->index_reg, NES_IDX_TCP_CONFIG0, u32temp); + + nes_write_indexed(nesdev->index_reg, NES_IDX_DENALI_CTL_22, 0x00FF0000); + + nesadapter->tick_delta = 2000; + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_TCP_TIMER_CONFIG); + nes_write_indexed(nesdev->index_reg, NES_IDX_TCP_TIMER_CONFIG, + (u32temp&0xff000000)|((nesadapter->tick_delta*1000)&0x00ffffff)); // set to 10ms + + max_qp = nes_read_indexed(nesdev->index_reg, NES_IDX_QP_CTX_SIZE); + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_QUAD_HASH_TABLE_SIZE); + if (max_qp > ((u32)1 << (u32temp & 0x001f))) { + 
dprintk("Reducing Max QPs to %u due to hash table size. ht size reg = 0x%08X\n", max_qp, u32temp ); + max_qp = (u32)1 << (u32temp & 0x001f); + } + + hte_index_mask = ((u32)1 << ((u32temp & 0x001f)+1))-1; + dprintk("Max QP = %u, hte_index_mask = 0x%08X.\n", max_qp, hte_index_mask); + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_IRRQ_COUNT); + + max_irrq = 1<<(u32temp&0x001f); + + if (max_qp > max_irrq) { + max_qp = max_irrq; + dprintk("Reducing Max QPs to %u due to Available Q1s.\n", max_qp); + } + + /* there should be no reason to allocate more pds than qps */ + if (num_pds > max_qp) + num_pds = max_qp; + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_MRT_SIZE); + max_mr = (u32)8192 << (u32temp&0x3); + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_PBL_REGION_SIZE); + max_256pbl = (u32)1 << (u32temp & 0x0000001f); + max_4kpbl = (u32)1 << ((u32temp>>16) & 0x0000001f); + max_cq = nes_read_indexed(nesdev->index_reg, NES_IDX_CQ_CTX_SIZE); + + arp_table_size = NES_ARP_TABLE_SIZE; + max_qp -= 5; + arp_table_size -= 5; + + adapter_size = (sizeof(struct nes_adapter)+(sizeof(unsigned long)-1))&(~(sizeof(unsigned long)-1)); + adapter_size += sizeof(unsigned long)*BITS_TO_LONGS(max_qp); + adapter_size += sizeof(unsigned long)*BITS_TO_LONGS(max_mr); + adapter_size += sizeof(unsigned long)*BITS_TO_LONGS(max_cq); + adapter_size += sizeof(unsigned long)*BITS_TO_LONGS(num_pds); + adapter_size += sizeof(unsigned long) * BITS_TO_LONGS(NES_ARP_TABLE_SIZE); + adapter_size += sizeof(struct nes_qp **)*max_qp; + + nesadapter = kzalloc(adapter_size, GFP_KERNEL); + dprintk("Adapter not found, allocating new one @ %p, size = %u (actual size = %u).\n", nesadapter, (u32)sizeof(struct nes_adapter), adapter_size); + if (nesadapter){ + nesadapter->devfn = nesdev->pcidev->devfn; + nesadapter->bus_number = nesdev->pcidev->bus->number; + nesadapter->ref_count = 1; + + nesadapter->max_qp = max_qp; + nesadapter->hte_index_mask = hte_index_mask; + nesadapter->max_irrq = 
max_irrq; + nesadapter->max_mr = max_mr; + nesadapter->max_256pbl = max_256pbl - 1; + nesadapter->max_4kpbl = max_4kpbl - 1; + nesadapter->max_cq = max_cq; + nesadapter->free_256pbl = max_256pbl-1; + nesadapter->free_4kpbl = max_4kpbl-1; + nesadapter->max_pd = num_pds; + nesadapter->arp_table_size = arp_table_size; + nesadapter->base_pd = 1; + + nesadapter->allocated_qps = (unsigned long *)&(((unsigned char *)nesadapter)[(sizeof(struct nes_adapter)+(sizeof(unsigned long)-1))&(~(sizeof(unsigned long)-1))]); + nesadapter->allocated_cqs = &nesadapter->allocated_qps[BITS_TO_LONGS(max_qp)]; + nesadapter->allocated_mrs = &nesadapter->allocated_cqs[BITS_TO_LONGS(max_cq)]; + nesadapter->allocated_pds = &nesadapter->allocated_mrs[BITS_TO_LONGS(max_mr)]; + nesadapter->allocated_arps = &nesadapter->allocated_pds[BITS_TO_LONGS(num_pds)]; + nesadapter->qp_table = (struct nes_qp **)(&nesadapter->allocated_arps[BITS_TO_LONGS(NES_ARP_TABLE_SIZE)]); + + + /* mark the usual suspect QPs and CQs as in use */ + for (u32temp=0; u32tempallocated_qps); + set_bit(u32temp, nesadapter->allocated_cqs); + } + + u32temp = nes_read_indexed(nesdev->index_reg, NES_IDX_QP_MAX_CFG_SIZES); + + max_rq_wrs = ((u32temp>>8) & 3); + switch (max_rq_wrs) { + case 0: + max_rq_wrs = 4; + break; + case 1: + max_rq_wrs = 16; + break; + case 2: + max_rq_wrs = 32; + break; + case 3: + max_rq_wrs = 512; + break; + } + + max_sq_wrs = (u32temp & 3); + switch (max_sq_wrs) { + case 0: + max_sq_wrs = 4; + break; + case 1: + max_sq_wrs = 16; + break; + case 2: + max_sq_wrs = 32; + break; + case 3: + max_sq_wrs = 512; + break; + } + nesadapter->max_qp_wr = min(max_rq_wrs, max_sq_wrs); + dprintk("Max wqes = %u.\n", nesadapter->max_qp_wr ); + + /* Encoded */ + nesadapter->max_irrq_wr = (u32temp >> 16) & 3; +// dprintk("%s: Max IRRQ wqes = %u.\n", __FUNCTION__, nesadapter->max_irrq_wr ); + + nesadapter->max_sge = 4; + nesadapter->max_cqe = 32767; + + dprintk("%s:Initializing adapter resource lock (%p).\n", + __FUNCTION__, 
&nesadapter->resource_lock ); + spin_lock_init(&nesadapter->resource_lock); + + list_add_tail(&nesadapter->list, &nes_adapter_list); + i = 0; + + } + }else + nesadapter->ref_count++; + + return nesadapter; +} + + +/** + * nes_cqp_init + * + * @param nesdev + * + * @return int + */ +int nes_cqp_init(struct nes_dev *nesdev) +{ + struct nes_port *nes_port = NULL; + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct net_device *netdev = alloc_etherdev(sizeof(*nes_port)); + struct nes_hw_cqp_qp_context *cqp_qp_context; + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_hw_ceq *ceq; + struct nes_hw_aeq *aeq; + void *mem; + char *lmi_buf; + dma_addr_t lmi_dma_handle; + u32 count=0; + u32 cqp_head; + u64 u64temp; + + // Allocate Memory + nesdev->cqp_mem_size = (sizeof(struct nes_hw_cqp_wqe)*NES_CQP_SQ_SIZE) + /* needs 512 byte alignment */ + (sizeof(struct nes_hw_cqe)*NES_CCQ_SIZE) + /* needs 256 byte alignment */ + (sizeof(struct nes_hw_ceqe)*nesadapter->max_cq) + /* needs 256 byte alignment */ + (sizeof(struct nes_hw_aeqe)*nesadapter->max_qp) + /* needs 256 byte alignment */ + sizeof(struct nes_hw_cqp_qp_context) + /* needs 8 byte alignment */ + 8192; /* this is the masq table */ + + mem = pci_alloc_consistent(nesdev->pcidev, nesdev->cqp_mem_size, + &nesdev->cqp.sq_pbase); + if (!mem) { + dprintk(KERN_ERR PFX "Unable to allocate memory for " + "host descriptor rings\n"); + free_netdev(netdev); + return ERR_PTR(-ENOMEM); + } + dprintk("Allocated CQP structures at %p (phys = %016lX), size = %u.\n", mem, + (unsigned long)nesdev->cqp.sq_pbase, nesdev->cqp_mem_size); + if (((u64)((unsigned)mem))&(512-1)) { + dprintk("CQP SQ base (%p) not 512 byte aligned.\n", mem); + } + memset(mem, 0, nesdev->cqp_mem_size); + + nesdev->mac_index = PCI_FUNC(nesdev->pcidev->devfn); + spin_lock_init(&nesdev->cqp.lock); + init_waitqueue_head( &nesdev->cqp.waitq ); + + // Setup Various Structures + nesdev->cqp.sq_vbase = mem; + nesdev->cqp.sq_size = NES_CQP_SQ_SIZE; + nesdev->cqp.sq_head = 
0; + nesdev->cqp.sq_tail = 0; + nesdev->cqp.qp_id = PCI_FUNC(nesdev->pcidev->devfn); + mem += sizeof(struct nes_hw_cqp_wqe)*nesdev->cqp.sq_size; + + nesdev->ccq.cq_vbase = mem; + nesdev->ccq.cq_pbase = nesdev->cqp.sq_pbase + (sizeof(struct nes_hw_cqp_wqe)*nesdev->cqp.sq_size); + nesdev->ccq.cq_size = NES_CCQ_SIZE; + nesdev->ccq.cq_head = 0; + nesdev->ccq.ce_handler = cqp_ce_handler; + nesdev->ccq.cq_number = PCI_FUNC(nesdev->pcidev->devfn); + mem += sizeof(struct nes_hw_cqe)*nesdev->ccq.cq_size; + dprintk("CCQ at %p (phys = %016lX).\n", nesdev->ccq.cq_vbase, (unsigned long)nesdev->ccq.cq_pbase); + + ceq = &nesadapter->ceq[PCI_FUNC(nesdev->pcidev->devfn)]; + ceq->ceq_vbase = mem; + ceq->ceq_pbase = nesdev->ccq.cq_pbase + (sizeof(struct nes_hw_cqe)*nesdev->ccq.cq_size); + ceq->ceq_size = nesadapter->max_cq; + ceq->ceq_head = 0; + mem += sizeof(struct nes_hw_ceqe)*nesadapter->max_cq; + dprintk("CEQ at %p (phys = %016lX).\n", ceq->ceq_vbase, (unsigned long)ceq->ceq_pbase); + + aeq = &nesadapter->aeq[PCI_FUNC(nesdev->pcidev->devfn)]; + aeq->aeq_vbase = mem; + aeq->aeq_pbase = ceq->ceq_pbase + (sizeof(struct nes_hw_ceqe)*nesadapter->max_cq); + aeq->aeq_size = nesadapter->max_qp; + aeq->aeq_head = 0; + mem += sizeof(struct nes_hw_aeqe)*nesadapter->max_qp; + dprintk("AEQ at %p (phys = %016lX).\n", aeq->aeq_vbase, (unsigned long)aeq->aeq_pbase); + + // Setup QP Context + cqp_qp_context = mem; + cqp_qp_context->context_words[0] = (PCI_FUNC(nesdev->pcidev->devfn)<<12) + (1<<10); + cqp_qp_context->context_words[1] = 0; + cqp_qp_context->context_words[2] = (u32)nesdev->cqp.sq_pbase; + cqp_qp_context->context_words[3] = ((u64)nesdev->cqp.sq_pbase)>>32; + mem += sizeof(struct nes_hw_cqp_qp_context); + + nesdev->apbv_table = mem; + memset(nesdev->apbv_table, 0, sizeof(*nesdev->apbv_table)); + + dprintk("Address of CQP Context = %p.\n", cqp_qp_context); + for (count=0;count<4 ; count++ ) { + dprintk("CQP Context, Line %u = %08X.\n", count, cqp_qp_context->context_words[count]); + } 
+ + // Write the address to Create CQP + if ((sizeof(dma_addr_t) > 4)) { + nes_write_indexed(nesdev->index_reg, + NES_IDX_CREATE_CQP_HIGH+(PCI_FUNC(nesdev->pcidev->devfn)*8), + ((u64)aeq->aeq_pbase+(sizeof(struct nes_hw_aeqe)*aeq->aeq_size))>>32); + } else { + nes_write_indexed(nesdev->index_reg, + NES_IDX_CREATE_CQP_HIGH+(PCI_FUNC(nesdev->pcidev->devfn)*8), 0); + } + nes_write_indexed(nesdev->index_reg, + NES_IDX_CREATE_CQP_LOW+(PCI_FUNC(nesdev->pcidev->devfn)*8), + (u32)(aeq->aeq_pbase+(sizeof(struct nes_hw_aeqe)*aeq->aeq_size))); + + dprintk("Address of CQP SQ = %p.\n", nesdev->cqp.sq_vbase); + + lmi_buf = pci_alloc_consistent(nesdev->pcidev, 1024, &lmi_dma_handle); + if (lmi_buf == NULL) { + free_netdev(netdev); + /* this function returns int, so return the errno directly */ + return -ENOMEM; + } + + cqp_head = nesdev->cqp.sq_head++; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_LMI_ACCESS + 0x20000000); + cqp_wqe->wqe_words[NES_CQP_LMI_WQE_LMI_OFFSET_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0x01010101; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0x02020202; + u64temp = (u64)lmi_dma_handle; + cqp_wqe->wqe_words[NES_CQP_LMI_WQE_FRAG_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_LMI_WQE_FRAG_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32)); + cqp_wqe->wqe_words[NES_CQP_LMI_WQE_FRAG_LEN_IDX] = cpu_to_le32(1024); + + // Write Create CCQ WQE + cqp_head = nesdev->cqp.sq_head++; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_CREATE_CQ | NES_CQP_CQ_CEQ_VALID | + NES_CQP_CQ_CHK_OVERFLOW | (nesdev->ccq.cq_size<<16)); + /* bitwise OR: low half and high half both carry the PCI function number */ + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(PCI_FUNC(nesdev->pcidev->devfn) | + (PCI_FUNC(nesdev->pcidev->devfn)<<16)); + 
cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0x03030303; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0x04040404; + u64temp = (u64)nesdev->ccq.cq_pbase; + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX] = 0; + /* TODO: the following 2 lines likely have endian issues */ + *((struct nes_hw_cq **)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) = &nesdev->ccq; + *((u64 *)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) >>= 1; + dprintk("%s: CQ%u context = 0x%08X:0x%08X.\n", __FUNCTION__, nesdev->ccq.cq_number, + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX], + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]); + + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = 0; + + // Write Create CEQ WQE + cqp_head = nesdev->cqp.sq_head++; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_CREATE_CEQ + (PCI_FUNC(nesdev->pcidev->devfn)<<8)); + cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_ELEMENT_COUNT_IDX] = cpu_to_le32(ceq->ceq_size); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0x05050505; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0x06060606; + u64temp = (u64)ceq->ceq_pbase; + cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_PBL_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_PBL_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + + // Write Create AEQ WQE + cqp_head = nesdev->cqp.sq_head++; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + 
cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_CREATE_AEQ + (PCI_FUNC(nesdev->pcidev->devfn)<<8)); + cqp_wqe->wqe_words[NES_CQP_AEQ_WQE_ELEMENT_COUNT_IDX] = cpu_to_le32(aeq->aeq_size); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0x07070707; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0x08080808; + u64temp = (u64)aeq->aeq_pbase; + cqp_wqe->wqe_words[NES_CQP_AEQ_WQE_PBL_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_AEQ_WQE_PBL_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + + // Poll until CCQP done + // TODO: Cleanup if CQP does not behave + count = 0; + do { + if (count++ > 1000) break; + udelay(10); + } while ( !(nes_read_indexed(nesdev->index_reg, + NES_IDX_QP_CONTROL+ + (PCI_FUNC(nesdev->pcidev->devfn)*8))&(1<<8)) ); + + dprintk("QP Status = 0x%08X bytes\n", + nes_read_indexed(nesdev->index_reg, + NES_IDX_QP_CONTROL+(PCI_FUNC(nesdev->pcidev->devfn)*8))); + + // Ring doorbell (4 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x04800000 | nesdev->cqp.qp_id ); + + /* wait for the CCQ, CEQ, and AEQ to get created */ + count = 0; + do { + if (count++ > 1000) break; + udelay(10); + } while ( ((nes_read_indexed(nesdev->index_reg, + NES_IDX_QP_CONTROL+ + (PCI_FUNC(nesdev->pcidev->devfn)*8)) & (15<<8)) != (15<<8)) ); + + /* dump the QP status value */ + dprintk("QP Status = 0x%08X bytes\n", + nes_read_indexed(nesdev->index_reg, + NES_IDX_QP_CONTROL+(PCI_FUNC(nesdev->pcidev->devfn)*8))); + + pci_free_consistent(nesdev->pcidev, 1024, lmi_buf, lmi_dma_handle); + /* bump head since create CCQ does not generate a completion */ + nesdev->cqp.sq_tail++; + nesdev->cqp.sq_tail++; + + return 0; +} + + +/** + * nes_phy_init + */ +int nes_phy_init(struct nes_dev *nesdev) +{ + struct nes_adapter *nesadapter = nesdev->nesadapter; + u32 u32temp; + u32 counter; + u32 
mac_index = nesdev->mac_index; + u16 phy_data; + u16 link_up = 0; + + + dprintk("10G PHY\n"); + + nes_write_indexed(nesdev->index_reg, 0x2004, 0x00000019); + udelay(30); /* do we really need this? */ + + nes_read_10G_phy_reg(nesdev->index_reg, 0, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL ); + dprintk("Phy data from register 0 = 0x%X.\n", phy_data); + + nes_read_10G_phy_reg(nesdev->index_reg, 1, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL ); + dprintk("Phy data from register 1 = 0x%X.\n", phy_data); + + // reset the phy + nes_write_10G_phy_reg(nesdev->index_reg, 0, nesadapter->phy_index[mac_index], 0x8000); + for (counter = 0; counter < 100; counter++) + { + mdelay(1); + nes_read_10G_phy_reg(nesdev->index_reg, 0, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL ); + dprintk("Phy data from register 0 after reset = 0x%X.\n", phy_data); + + if (!(phy_data & 0x8000)) + { + break; + } + } + + // device identifier 1 + nes_read_10G_phy_reg(nesdev->index_reg, 2, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL ); + dprintk("Phy data from register 2 = 0x%X.\n", phy_data); + + // device identifier 2 + nes_read_10G_phy_reg(nesdev->index_reg, 3, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL ); + dprintk("Phy data from register 3 = 0x%X.\n", phy_data); + + // primary channel lanes (0-3) analog transmit configuration + nes_write_10G_phy_reg(nesdev->index_reg, 0xd021, nesadapter->phy_index[mac_index], 0x160a); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd029, nesadapter->phy_index[mac_index], 0x160a); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd031, nesadapter->phy_index[mac_index], 0x160a); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd039, 
nesadapter->phy_index[mac_index], 0x160a); + + nes_write_10G_phy_reg(nesdev->index_reg, 0xd041, nesadapter->phy_index[mac_index], 0x160a); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd049, nesadapter->phy_index[mac_index], 0x160a); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd051, nesadapter->phy_index[mac_index], 0x160a); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd059, nesadapter->phy_index[mac_index], 0x160a); + + // primary channel lanes (0-3) analog receive configuration + nes_write_10G_phy_reg(nesdev->index_reg, 0xd025, nesadapter->phy_index[mac_index], 0x8201); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd02d, nesadapter->phy_index[mac_index], 0x8201); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd035, nesadapter->phy_index[mac_index], 0x8201); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd03d, nesadapter->phy_index[mac_index], 0x8201); + + nes_write_10G_phy_reg(nesdev->index_reg, 0xd045, nesadapter->phy_index[mac_index], 0x8201); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd04d, nesadapter->phy_index[mac_index], 0x8201); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd055, nesadapter->phy_index[mac_index], 0x8201); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd05d, nesadapter->phy_index[mac_index], 0x8201); + + nes_write_10G_phy_reg(nesdev->index_reg, 0xd023, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd02b, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd033, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd03b, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd043, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd04b, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd053, nesadapter->phy_index[mac_index], 0x0500); + nes_write_10G_phy_reg(nesdev->index_reg, 0xd05b, nesadapter->phy_index[mac_index], 0x0500); + + 
nes_write_10G_phy_reg(nesdev->index_reg, 0xd000, nesadapter->phy_index[mac_index], 0x0800); + + // master register port status + nes_write_10G_phy_reg(nesdev->index_reg, 0xd00c, nesadapter->phy_index[mac_index], 0x8070); + + // try to let the link come up + nes_read_10G_phy_reg(nesdev->index_reg, 0x18, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL); + dprintk("10G phy data from register 0x18 = 0x%X\n", phy_data); + // clear any faults + nes_read_10G_phy_reg(nesdev->index_reg, 8, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL); + dprintk("10G phy data from register 8 = 0x%X\n", phy_data); + + counter = 0; + do { + msleep(1); + nes_read_10G_phy_reg(nesdev->index_reg, 1, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL ); + } while ( (!(phy_data & 0x0004)) && (counter++<100) ); + + dprintk("10G phy data from register 1 = 0x%X, counter=%d\n", phy_data, counter); + + return 0; +} + + +/** + * nes_nic_qp_init + */ +int nes_nic_qp_init(struct nes_dev *nesdev, struct net_device *netdev) +{ + struct nes_hw_cqp_wqe *cqp_wqe; + struct nes_hw_nic_sq_wqe *nic_sqe; + struct nes_hw_nic_qp_context *nic_context; + struct sk_buff *skb; + struct nes_hw_nic_rq_wqe *nic_rqe; + u8 *virt_address; + unsigned long flags; + dma_addr_t bus_address; + u64 u64temp; + int ret; + u32 cqp_head; + u32 counter; + u32 wqe_count; + + /* Allocate SQ, RQ, CQ, Reuse CEQ based on the PCI function */ + nesdev->nic_mem_size = (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_sq_wqe)) + + (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_rq_wqe)) + + (NES_NIC_WQ_SIZE * 2 * sizeof(struct nes_hw_nic_cqe)) + + (NES_NIC_WQ_SIZE * sizeof(struct nes_first_frag)) + + sizeof(struct nes_hw_nic_qp_context); + dprintk("NIC PCI memory size = %u.\n", nesdev->nic_mem_size); + + /* TODO: check for NULL */ + virt_address = 
pci_alloc_consistent(nesdev->pcidev, nesdev->nic_mem_size, &bus_address); + + /* Setup the first Fragment buffers */ + nesdev->hnic.first_frag_vbase = (void *)virt_address; + virt_address += NES_NIC_WQ_SIZE * sizeof(struct nes_first_frag); + + for (counter=0; counter<NES_NIC_WQ_SIZE; counter++) { + nesdev->hnic.frag_paddr[counter] = bus_address; + bus_address += sizeof(struct nes_first_frag); + } + + /* setup the SQ */ + nesdev->hnic.sq_vbase = (void *)virt_address; + nesdev->hnic.sq_pbase = bus_address; + virt_address += NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_sq_wqe); + bus_address += NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_sq_wqe); + nesdev->hnic.sq_head = 0; + nesdev->hnic.sq_tail = 0; + nesdev->hnic.sq_size = NES_NIC_WQ_SIZE; + nesdev->hnic.qp_id = 16+(PCI_FUNC(nesdev->pcidev->devfn)*2); + for (counter=0; counter<(NES_NIC_WQ_SIZE); counter++) { + nic_sqe = &nesdev->hnic.sq_vbase[counter]; + nic_sqe->wqe_words[NES_NIC_SQ_WQE_MISC_IDX] = NES_NIC_SQ_WQE_DISABLE_CHKSUM | NES_NIC_SQ_WQE_COMPLETION; + nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_0_TAG_IDX] = (u32)NES_FIRST_FRAG_SIZE<<16; + nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX] = (u32)nesdev->hnic.frag_paddr[counter]; + nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX] = (u32)((u64)nesdev->hnic.frag_paddr[counter]>>32); + } + /* TODO: if first frag is usually un-aligned, setup the first frag SGEs here */ + spin_lock_init(&nesdev->hnic.sq_lock); + + /* setup the RQ */ + nesdev->hnic.rq_vbase = (void *)(&nesdev->hnic.sq_vbase[NES_NIC_WQ_SIZE]); + nesdev->hnic.rq_head = 0; + nesdev->hnic.rq_tail = 0; + nesdev->hnic.rq_size = NES_NIC_WQ_SIZE; + nesdev->hnic.rq_pbase = nesdev->hnic.sq_pbase + (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_sq_wqe)); + + /* setup the CQ */ + nesdev->hnic_cq.cq_vbase = (void *)(&nesdev->hnic.rq_vbase[NES_NIC_WQ_SIZE]); + nesdev->hnic_cq.cq_head = 0; + nesdev->hnic_cq.cq_size = NES_NIC_WQ_SIZE*2; + nesdev->hnic_cq.cq_number = nesdev->hnic.qp_id; + nesdev->hnic_cq.ce_handler = nes_hnic_ce_handler; + nesdev->hnic_cq.cq_pbase
= nesdev->hnic.rq_pbase + (NES_NIC_WQ_SIZE * sizeof(struct nes_hw_nic_rq_wqe)); + + /* Send CreateCQ request to CQP */ + spin_lock_irqsave(&nesdev->cqp.lock, flags); + cqp_head = nesdev->cqp.sq_head; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_CREATE_CQ | NES_CQP_CQ_CEQ_VALID | + (nesdev->hnic_cq.cq_size << 16); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nesdev->hnic_cq.cq_number | ((u32)PCI_FUNC(nesdev->pcidev->devfn)<<16); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + u64temp = (u64)nesdev->hnic_cq.cq_pbase; + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX] = 0; + /* the following two lines likely have endian issues */ + *((struct nes_hw_nic_cq **)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) = &nesdev->hnic_cq; + *((u64 *)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) >>= 1; + dprintk("%s: CQ%u context = 0x%08X:0x%08X.\n", __FUNCTION__, nesdev->hnic_cq.cq_number, + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX], + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]); + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = 0; + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX]); + if (++cqp_head >= nesdev->cqp.sq_size) cqp_head = 0; + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; + + /* Send CreateQP request to CQP */ + nic_context = (void *)(&nesdev->hnic_cq.cq_vbase[nesdev->hnic_cq.cq_size]); + nic_context->context_words[NES_NIC_CTX_MISC_IDX] = cpu_to_le32((u32)NES_NIC_CTX_SIZE | ((u32)PCI_FUNC(nesdev->pcidev->devfn)<<12)); + 
dprintk("NES_NIC_CTX_SIZE = %0x, word0 = %u.\n", NES_NIC_CTX_SIZE, nic_context->context_words[NES_NIC_CTX_MISC_IDX]); + + u64temp = (u64)nesdev->hnic.sq_pbase; + nic_context->context_words[NES_NIC_CTX_SQ_LOW_IDX] = cpu_to_le32((u32)u64temp); + nic_context->context_words[NES_NIC_CTX_SQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + u64temp = (u64)nesdev->hnic.rq_pbase; + nic_context->context_words[NES_NIC_CTX_RQ_LOW_IDX] = cpu_to_le32((u32)u64temp); + nic_context->context_words[NES_NIC_CTX_RQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_CREATE_QP | NES_CQP_QP_TYPE_NIC); + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(nesdev->hnic.qp_id); + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; + *((struct nes_hw_cqp **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = cqp_head; + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; + u64temp = (u64)nesdev->hnic_cq.cq_pbase + (nesdev->hnic_cq.cq_size * sizeof(struct nes_hw_nic_cqe)); + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = cpu_to_le32((u32)u64temp); + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = cpu_to_le32((u32)(u64temp>>32)); + if (++cqp_head >= nesdev->cqp.sq_size) cqp_head = 0; + + nesdev->cqp.sq_head = cqp_head; + barrier(); + + // Ring doorbell (2 WQEs) + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x02800000 | nesdev->cqp.qp_id ); + + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); + dprintk("Waiting for create NIC QP to complete.\n"); + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); + ret = wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), 2); + dprintk("Create NIC QP completed, wait_event_timeout ret = %u.\n", ret); + + /* Populate the RQ */ + for (counter=0; counter<(NES_NIC_WQ_SIZE-1); counter++) { + skb = dev_alloc_skb(max_frame_len); + if (!skb) { + dprintk(KERN_ERR PFX "%s: out of memory for receive\n", 
netdev->name); + // TODO: Unwind the NIC + return -ENOMEM; + } + + skb->dev = netdev; + + bus_address = pci_map_single(nesdev->pcidev, skb->data, max_frame_len, PCI_DMA_FROMDEVICE); + + nic_rqe = &nesdev->hnic.rq_vbase[counter]; + nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_1_0_IDX] = cpu_to_le32(max_frame_len); + nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_3_2_IDX] = 0; + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)bus_address); + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX] = cpu_to_le32((u32)((u64)bus_address>>32)); + nesdev->hnic.rx_skb[counter] = skb; + } + + wqe_count = NES_NIC_WQ_SIZE-1; + nesdev->hnic.rq_head = wqe_count-1; + barrier(); + do { + counter = min(wqe_count, ((u32)255)); + wqe_count -= counter; + nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter<<24) | nesdev->hnic.qp_id ); + } while (wqe_count); + + return 0; +} + + +#define MAX_DPC_ITERATIONS 128 +/** + * nes_dpc + * + * @param param + */ +void nes_dpc(unsigned long param) +{ + struct nes_dev *nesdev = (struct nes_dev *) param; + struct nes_adapter *nesadapter = nesdev->nesadapter; + u32 counter; + u32 loop_counter = 0; + u32 int_status_bit; + u32 int_stat; + u32 temp_int_stat; + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + do { + int_stat = nes_read32(nesdev->regs + NES_INT_STAT); + /* Mask off bits for interrupts we are not processing */ + int_stat &= nesdev->int_req; + // dprintk("Interrupt Status (postfilter) = 0x%08X\n", int_stat ); + + if (int_stat) { + /* Ack the interrupts */ + nes_write32(nesdev->regs+NES_INT_STAT, + (int_stat&~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0|NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3))); + + temp_int_stat = int_stat; + /* Process the CEQs */ + for (counter=0, int_status_bit=1; counter<16 ; counter++) { + if (int_stat & int_status_bit) { + nes_process_ceq(nesdev, &nesadapter->ceq[counter]); + temp_int_stat &= ~int_status_bit; + } + if (!(temp_int_stat & 0x0000ffff)) + break; + int_status_bit <<= 1; + } + + /* Process the
AEQ for this pci function */ + int_status_bit = 1<<(16+PCI_FUNC(nesdev->pcidev->devfn)); + if (int_stat & int_status_bit) { + nes_process_aeq(nesdev, &nesadapter->aeq[PCI_FUNC(nesdev->pcidev->devfn)]); + } + + /* Process the MAC interrupt for this pci function */ + int_status_bit = 1<<(24+nesdev->mac_index); + if (int_stat & int_status_bit) { + nes_process_mac_intr(nesdev, nesdev->mac_index); + } + + if (int_stat & NES_INT_TIMER) { + + } + + + if (int_stat & NES_INT_TSW) { + } + } + /* Don't use the interface interrupt bit stay in loop */ + int_stat &= ~NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0|NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3; + } while ((int_stat != 0) && (loop_counter++ < MAX_DPC_ITERATIONS)); + + // Enable interrupts + nes_write32(nesdev->regs+NES_INT_MASK, ~nesdev->int_req); +} + + +/** + * nes_process_ceq + * + * @param nesdev + * @param ceq + */ +void nes_process_ceq(struct nes_dev *nesdev, struct nes_hw_ceq *ceq) +{ + u64 u64temp; + struct nes_hw_cq *cq; + u32 head; + u32 ceq_size; + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + head = ceq->ceq_head; + ceq_size = ceq->ceq_size; + + do { + if (le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX]) & NES_CEQE_VALID) { + u64temp = *((u64 *)&ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_LOW_IDX]); + u64temp <<= 1; + cq = *((struct nes_hw_cq **)&u64temp); + barrier(); + /* make the CEQE not valid */ + ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX] = 0; + + /* call the event handler */ + cq->ce_handler(nesdev, cq); + + if (++head >= ceq_size) + head = 0; + } else { + break; + } + } while ( 1 ); + ceq->ceq_head = head; +} + + +/** + * nes_process_aeq + * + * @param nesdev + * @param aeq + */ +void nes_process_aeq(struct nes_dev *nesdev, struct nes_hw_aeq *aeq) +{ + u64 u64temp; + u32 head; + u32 aeq_size; + struct nes_hw_aeqe volatile *aeqe; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + + head = aeq->aeq_head; + aeq_size = aeq->aeq_size; + + do {
+ aeqe = &aeq->aeq_vbase[head]; + if ((le32_to_cpu(aeqe->aeqe_words[NES_AEQE_MISC_IDX]) & NES_AEQE_VALID) == 0) + break; + aeqe->aeqe_words[NES_AEQE_MISC_IDX] = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_MISC_IDX]); + aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX] = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]); + if (aeqe->aeqe_words[NES_AEQE_MISC_IDX] & (NES_AEQE_QP|NES_AEQE_CQ)) { + if (aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX] >= NES_FIRST_QPN) { + /* dealing with an accelerated QP related AE */ + u64temp = *((u64 *)&aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); + nes_process_iwarp_aeqe(nesdev, (struct nes_hw_aeqe *)aeqe); + } else { + } + } else if (aeqe->aeqe_words[NES_AEQE_MISC_IDX] & NES_AEQE_CQ) { + /* dealing with a CQ related AE */ + dprintk("%s: Processing CQ realated AE, misc = 0x%04X\n", __FUNCTION__, + (u16)(aeqe->aeqe_words[NES_AEQE_MISC_IDX]>>16)); + } + + aeqe->aeqe_words[NES_AEQE_MISC_IDX] = 0; + + head++; + if (head >= aeq_size) + head = 0; + } + while ( 1 ); + aeq->aeq_head = head; +} + + +/** + * nes_process_mac_intr + * + * @param nesdev + * @param mac_number + */ +void nes_process_mac_intr(struct nes_dev *nesdev, u32 mac_number) +{ + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct nes_port *nes_port = netdev_priv(nesdev->netdev); + u32 mac_status; + u32 mac_index = nesdev->mac_index; + u16 phy_data; + + // ack the MAC interrupt + mac_status = nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_INT_STATUS ); + /* Clear the interrupt */ + nes_write_indexed(nesdev->index_reg, NES_IDX_MAC_INT_STATUS, mac_status ); + + dprintk("MAC interrupt status = 0x%X.\n", mac_status); + + if (mac_status & (NES_MAC_INT_LINK_STAT_CHG | NES_MAC_INT_XGMII_EXT)) { + /* read the PHY interrupt status register */ + // read status + nes_read_10G_phy_reg(nesdev->index_reg, 1, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, + NES_IDX_MAC_MDIO_CONTROL ); + dprintk("10G phy data from register 1 = 0x%X\n", 
phy_data); + + if (!(phy_data & 0x0004)) + { + dprintk("link reports down, check faults\n"); + +// nes_read_10G_phy_reg(nesdev->index_reg, 0x18, nesadapter->phy_index[mac_index]); +// phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL); +// dprintk("10G phy data from register 0x18 = 0x%X\n", phy_data); + // clear any faults + nes_read_10G_phy_reg(nesdev->index_reg, 8, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, NES_IDX_MAC_MDIO_CONTROL); + dprintk("10G phy data from register 8 = 0x%X\n", phy_data); + + // read status, again + nes_read_10G_phy_reg(nesdev->index_reg, 1, nesadapter->phy_index[mac_index]); + phy_data = (u16)nes_read_indexed(nesdev->index_reg, + NES_IDX_MAC_MDIO_CONTROL ); + dprintk("10G phy data from register 1 = 0x%X\n", phy_data); + } + + if (phy_data & 0x0004) + { + dprintk("The Link is UP!!. linkup was %d\n", nes_port->linkup); + if (nes_port->linkup == 0) { + printk(PFX "The Link is now up for port %u.\n", nesdev->mac_index); + if (netif_queue_stopped(nesdev->netdev)) + netif_start_queue(nesdev->netdev); + nes_port->linkup = 1; + netif_carrier_on(nesdev->netdev); + } + } else { + dprintk("The Link is Down!!. 
linkup was %d\n", nes_port->linkup); + if (nes_port->linkup == 1) { + printk(PFX "The Link is now down for port %u.\n", nesdev->mac_index); + if (!(netif_queue_stopped(nesdev->netdev))) + netif_stop_queue(nesdev->netdev); + nes_port->linkup = 0; + netif_carrier_off(nesdev->netdev); + } + } + } + if (mac_status & NES_MAC_INT_TX_UNDERFLOW) { + dprintk("The MAC reported a TX underflow!!.\n"); + } + if (mac_status & NES_MAC_INT_TX_ERROR) { + dprintk("The MAC reported a TX Error!!.\n"); + } +} + + +/** + * nes_hnic_ce_handler + * + * @param nesdev + * @param cq + */ +void nes_hnic_ce_handler(struct nes_dev *nesdev, struct nes_hw_nic_cq *cq) +{ + struct nes_hw_nic *nesnic; + struct nes_port *nes_port = netdev_priv(nesdev->netdev); + struct nes_hw_nic_rq_wqe *nic_rqe; + struct nes_hw_nic_sq_wqe *nic_sqe; + struct sk_buff *skb; + struct sk_buff *rx_skb; + struct sk_buff *dup_rx_skb; + struct tcphdr *pTCPHeader; + struct iphdr *pIPHeader; + unsigned long flags; + u64 u64temp; + u8 u8temp; + dma_addr_t bus_address; + u32 head; + u32 cq_size; + u32 rx_pkt_size; + u32 cqe_count=0; + +// dprintk("%s:%s:%u:\n", __FILE__, __FUNCTION__, __LINE__); + + head = cq->cq_head; + cq_size = cq->cq_size; + do { + if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]) & NES_NIC_CQE_VALID) { + nesnic = &nesdev->hnic; + cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX] = le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]); + + if (cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]&NES_NIC_CQE_SQ) { +// dprintk("%s: Processing SQ completion for QP%u. 
SQ Tail = %u.\n", __FUNCTION__, +// nesdev->hnic.qp_id, nesnic->sq_tail); + nic_sqe = &nesnic->sq_vbase[nesnic->sq_tail]; + skb = nesnic->tx_skb[nesnic->sq_tail]; +// dprintk("SQ skb = %p.\n", skb); + if (nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_2_1_IDX] != 0) { + u64temp = le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG1_LOW_IDX]); + u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG1_HIGH_IDX]))<<32; + bus_address = (dma_addr_t)u64temp; + pci_unmap_single(nesdev->pcidev, bus_address, + le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_LENGTH_2_1_IDX]), PCI_DMA_TODEVICE); + dev_kfree_skb_any(skb); + } + spin_lock_irqsave(&nesnic->sq_lock, flags); + nesnic->sq_tail++; + nesnic->sq_tail &= nesnic->sq_size-1; + /* restart the queue if it had been stopped */ + if (netif_queue_stopped(nesdev->netdev)) + netif_wake_queue(nesdev->netdev); + spin_unlock_irqrestore(&nesnic->sq_lock, flags); + } else { + rx_pkt_size = cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]&0x0000ffff; +// dprintk("%s: Processing RQ completion for QP%u.
RQ Tail = %u, size = %u.\n", +// __FUNCTION__, nesdev->hnic.qp_id, nesnic->rq_tail, rx_pkt_size); + nic_rqe = &nesnic->rq_vbase[nesnic->rq_tail]; + /* Get the skb */ + rx_skb = nesnic->rx_skb[nesnic->rq_tail]; +// dprintk("Dequeued RQ skb = %p.\n", rx_skb); + /* unmap the buffer */ + nic_rqe = &nesnic->rq_vbase[nesdev->hnic.rq_tail]; + bus_address = le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); + bus_address += ((u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX]))<<32; +// dprintk("skb %p getting removed from the RQ at index %u (bus address = %X).\n", +// rx_skb, nesnic->rq_tail, (u32)bus_address); + pci_unmap_single(nesdev->pcidev, bus_address, + max_frame_len, PCI_DMA_FROMDEVICE); + /* setup the old skb */ + rx_skb->tail = rx_skb->data + rx_pkt_size; + rx_skb->len = rx_pkt_size; + rx_skb->protocol = eth_type_trans(rx_skb, nesdev->netdev); + nesnic->rq_tail++; + nesnic->rq_tail &= nesnic->rq_size - 1; + /* get a new skb */ + skb = dev_alloc_skb(max_frame_len); + if (skb) { +// dprintk("skb %p added to the RQ at index %u.\n", skb, nesnic->rq_head); + skb->dev = nesdev->netdev; + + /* map put down to the chip */ + bus_address = pci_map_single(nesdev->pcidev, + skb->data, max_frame_len, PCI_DMA_FROMDEVICE); +// dprintk("skb %p added to the RQ at index %u (bus address = %X).\n", skb, nesnic->rq_head, bus_address); + + nic_rqe = &nesnic->rq_vbase[nesdev->hnic.rq_head]; + nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_1_0_IDX] = cpu_to_le32(max_frame_len); + nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_3_2_IDX] = 0; + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)bus_address); + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX] = cpu_to_le32((u32)((u64)bus_address>>32)); + nesnic->rx_skb[nesnic->rq_head] = skb; + nesnic->rq_head++; + nesnic->rq_head &= nesnic->rq_size - 1; + nes_write32(nesdev->regs+NES_WQE_ALLOC, (1<<24) | nesnic->qp_id ); + } else { + // TODO: Set a timer and/or add code to rx to allocate more buffers + }
+ + /* indicate the old skb up to the stack */ + /* Need to dup arps, and filter packets based on ports */ + + // if the packet is TCP/IPv4, look it up + if ((le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_NIC_CQE_TAG_PKT_TYPE_IDX] ) & 0xF3E)== 0x112) { + /* TODO: Assuming DIX for now, allow for SNAP and VLAN */ + pIPHeader = (struct iphdr *)rx_skb->data; + pTCPHeader = (struct tcphdr *)(rx_skb->data+(4*pIPHeader->ihl)); + u8temp = 1 << (ntohs(pTCPHeader->dest)&7); + if ((nesdev->local_ipaddr == pIPHeader->daddr) && + (nesdev->apbv_table[ntohs(pTCPHeader->dest)>>3] & u8temp)) { + stack_ops_p->nesif_rx(rx_skb); + } else { + netif_rx(rx_skb); + } + } else { + if (ntohs(rx_skb->protocol) == ETH_P_ARP) { + if (nesdev->nes_stack_start) { + dup_rx_skb = skb_clone(rx_skb, GFP_ATOMIC); + if (dup_rx_skb) { + stack_ops_p->nesif_rx(dup_rx_skb); + } + } + } + netif_rx(rx_skb); + } + + nesdev->netdev->last_rx = jiffies; + nes_port->netstats.rx_packets++; + nes_port->netstats.rx_bytes += (cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]&0x0000ffff); + } + cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX] = 0; + // Accounting... 
+ cqe_count++; + if (++head >= cq_size) head = 0; + if (cqe_count == 255) { + // Arm the CCQ + nes_write32(nesdev->regs+NES_CQE_ALLOC, + cq->cq_number | (cqe_count << 16) ); + cqe_count = 0; + } + } else { + break; + } + } while ( 1 ); + cq->cq_head = head; +// dprintk("CQ%u Processed = %u cqes, new head = %u.\n", cq->cq_number, cqe_count, cq->cq_head); + // Arm the CCQ + nes_write32(nesdev->regs+NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT | + cq->cq_number | (cqe_count << 16) ); +} + + +/** + * cqp_ce_handler + * + * @param nesdev + * @param cq + */ +void cqp_ce_handler(struct nes_dev *nesdev, struct nes_hw_cq *cq) +{ + struct nes_hw_cqp *cqp; + u32 head; + u32 cq_size; + u32 cqe_count=0; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + head = cq->cq_head; + cq_size = cq->cq_size; + + do { + /* process the CQE */ + if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX]) & NES_CQE_VALID) { + cqp = *((struct nes_hw_cqp **)&cq->cq_vbase[head].cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]); + + if (cq->cq_vbase[head].cqe_words[NES_CQE_ERROR_CODE_IDX]) { + cq->cq_vbase[head].cqe_words[NES_CQE_ERROR_CODE_IDX] = + le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_CQE_ERROR_CODE_IDX]); + dprintk("Bad Completion code from CQP, Major/Minor codes = 0x%04X:%04X.\n", + (u16)(cq->cq_vbase[head].cqe_words[NES_CQE_ERROR_CODE_IDX]>>16), + (u16)cq->cq_vbase[head].cqe_words[NES_CQE_ERROR_CODE_IDX]); + } + + if (++cqp->sq_tail >= cqp->sq_size) + cqp->sq_tail = 0; + wake_up(&nesdev->cqp.waitq); + + cq->cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0; + // Accounting... 
+ cqe_count++; + if (++head >= cq_size) + head = 0; + } else { + break; + } + } while ( 1 ); + cq->cq_head = head; + + /* Arm the CCQ */ + nes_write32(nesdev->regs+NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT | + cq->cq_number | (cqe_count << 16) ); +} + + +/** + * nes_process_iwarp_aeqe + * + * @param nesdev + * @param aeqe + */ +void nes_process_iwarp_aeqe(struct nes_dev *nesdev, struct nes_hw_aeqe *aeqe) +{ + u64 context; + struct nes_qp *nesqp; + struct iw_cm_id *cm_id; + struct nes_adapter *nesadapter = nesdev->nesadapter; + struct ib_event ibevent; + struct iw_cm_event cmevent; + u32 aeq_info; + u16 async_event_id; + + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + context = aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]; + context += ((u64)aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])<<32; + aeq_info = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_MISC_IDX]); + async_event_id = (u16)aeq_info; + + switch (async_event_id){ + case NES_AEQE_AEID_LLP_FIN_RECEIVED: + nesqp = *((struct nes_qp **)&context); + if (nesqp->cm_id){ + cm_id = nesqp->cm_id; + if (cm_id->event_handler) { + cmevent.event = IW_CM_EVENT_DISCONNECT; + cmevent.status = IW_CM_EVENT_STATUS_OK; + cmevent.local_addr = cm_id->local_addr; + cmevent.remote_addr = cm_id->remote_addr; + cmevent.private_data = NULL; + cmevent.private_data_len = 0; + dprintk("%s:Generating a Disconnect Event (normal) for QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + cm_id->event_handler(nesqp->cm_id, &cmevent); + /* TODO: this does not seem correct, Seems like app should do close */ + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + } + } + break; + case NES_AEQE_AEID_LLP_CLOSE_COMPLETE: + nesqp = *((struct nes_qp **)&context); + nesqp->last_aeq = NES_AEQE_AEID_LLP_CLOSE_COMPLETE; + dprintk("%s: Processing an NES_AEQE_AEID_LLP_CLOSE_COMPLETE event on QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + 
queue_work(nesqp->aewq, &nesqp->ae_work); + break; + case NES_AEQE_AEID_LLP_CONNECTION_RESET: + nesqp = *((struct nes_qp **)&context); + if (nesqp->cm_id){ + cm_id = nesqp->cm_id; + if (cm_id->event_handler) { + cmevent.event = IW_CM_EVENT_DISCONNECT; + cmevent.status = IW_CM_EVENT_STATUS_OK; + cmevent.local_addr = cm_id->local_addr; + cmevent.remote_addr = cm_id->remote_addr; + cmevent.private_data = NULL; + cmevent.private_data_len = 0; + dprintk("%s:Generating a Disconnect Event (reset) for QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + cm_id->event_handler(cm_id, &cmevent); + /* TODO: this does not seem correct, Seems like app should do close */ + dprintk("nesqp->aewq = %p.\n", nesqp->aewq ); + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + } + } + break; + case NES_AEQE_AEID_AMP_BAD_STAG_INDEX: + if (NES_AEQE_INBOUND_RDMA&aeq_info) { + nesqp = nesadapter->qp_table[le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX])-NES_FIRST_QPN]; + } else { + /* TODO: get the actual WQE and mask off wqe index */ + context &= ~((u64)511); + nesqp = *((struct nes_qp **)&context); + } + printk("%s: Processing an NES_AEQE_AEID_AMP_BAD_STAG_INDEX event on QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_ACCESS_ERR; + nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context ); + } + break; + case NES_AEQE_AEID_AMP_UNALLOCATED_STAG: + nesqp = *((struct nes_qp **)&context); + printk("%s: Processing an NES_AEQE_AEID_AMP_UNALLOCATED_STAG event on QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_ACCESS_ERR; + nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context); + } + break; + case NES_AEQE_AEID_PRIV_OPERATION_DENIED: + nesqp =
nesadapter->qp_table[le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX])-NES_FIRST_QPN]; + printk("%s: Processing an NES_AEQE_AEID_PRIV_OPERATION_DENIED event on QP%u, nesqp = %p, AE reported %p \n", + __FUNCTION__, nesqp->hwqp.qp_id, nesqp, *((struct nes_qp **)&context)); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_ACCESS_ERR; + nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context); + } + case NES_AEQE_AEID_CQ_OPERATION_ERROR: + printk("%s: Processing an NES_AEQE_AEID_CQ_OPERATION_ERROR event on CQ%u \n", + __FUNCTION__, le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX])); + /* TODO: Need to add code to lookup the CQ context based on the CQID from the AE + and generate an event */ + break; + case NES_AEQE_AEID_DDP_UBE_DDP_MESSAGE_TOO_LONG_FOR_AVAILABLE_BUFFER: + /* SA1 Errata: completion context not filled in */ + nesqp = nesadapter->qp_table[le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX])-NES_FIRST_QPN]; + printk("%s: Processing an NES_AEQE_AEID_DDP_UBE_DDP_MESSAGE_TOO_LONG_FOR_AVAILABLE_BUFFER event on QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_ACCESS_ERR; + nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context ); + } + /* TODO: this does not seem correct, Seems like app should do close */ + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + break; + case NES_AEQE_AEID_DDP_UBE_INVALID_MSN_NO_BUFFER_AVAILABLE: + nesqp = *((struct nes_qp **)&context); + printk("%s: Processing an NES_AEQE_AEID_DDP_UBE_INVALID_MSN_NO_BUFFER_AVAILABLE event on QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_FATAL; +
nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context ); + } + /* TODO: this does not seem correct, Seems like app should do close */ + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + break; + case NES_AEQE_AEID_LLP_RECEIVED_MPA_CRC_ERROR: + nesqp = *((struct nes_qp **)&context); + printk("%s: Processing an NES_AEQE_AEID_LLP_RECEIVED_MPA_CRC_ERROR event on QP%u \n Q2 Data: \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_FATAL; + nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context ); + } + /* TODO: this does not seem correct, Seems like app should do close */ + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + break; + case NES_AEQE_AEID_LLP_TERMINATE_RECEIVED: + nesqp = *((struct nes_qp **)&context); + printk("%s: Processing an NES_AEQE_AEID_LLP_TERMINATE_RECEIVED event on QP%u \n Q2 Data: \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_FATAL; + nesqp->ibqp.event_handler(&ibevent, nesqp->ibqp.qp_context ); + } + nesqp->ibqp_state = IB_QPS_SQE; + nesqp->iwarp_state = NES_CQP_QP_IWARP_STATE_TERMINATE; + /* TODO: this does not seem correct, Seems like app should do close */ + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + break; + case NES_AEQE_AEID_RDMAP_ROE_BAD_LLP_CLOSE: + nesqp = *((struct nes_qp **)&context); + printk("%s: Processing an NES_AEQE_AEID_RDMAP_ROE_BAD_LLP_CLOSE event on QP%u \n", + __FUNCTION__, nesqp->hwqp.qp_id ); + if (nesqp->ibqp.event_handler) { + ibevent.device = nesqp->ibqp.device; + ibevent.element.qp = &nesqp->ibqp; + ibevent.event = IB_EVENT_QP_FATAL; + nesqp->ibqp.event_handler(&ibevent, 
nesqp->ibqp.qp_context ); + } + nesqp->ibqp_state = IB_QPS_SQE; + nesqp->iwarp_state = NES_CQP_QP_IWARP_STATE_TERMINATE; + /* TODO: this does not seem correct, Seems like app should do close */ + INIT_WORK(&nesqp->ae_work, nes_disconnect_worker, nesqp); + queue_work(nesqp->aewq, &nesqp->ae_work); + break; + /* TODO: additional AEs need to be here */ + default: + printk("%s: Processing an iWARP related AE for QP, misc = 0x%04X\n", __FUNCTION__, + async_event_id); + break; + } +} + + +/** + * iwarp_ce_handler + * + * @param nesdev + * @param hw_cq + */ +void iwarp_ce_handler(struct nes_dev *nesdev, struct nes_hw_cq *hw_cq) +{ + struct nes_cq *nescq = container_of(hw_cq, struct nes_cq, hw_cq); + +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + +// dprintk("%s: Processing completion event for iWARP CQ%u.\n", __FUNCTION__, nescq->hw_cq.cq_number); + nes_write32(nesdev->regs+NES_CQ_ACK, nescq->hw_cq.cq_number ); + + if (nescq->ibcq.comp_handler) + nescq->ibcq.comp_handler(&nescq->ibcq, nescq->ibcq.cq_context); + + return; + +} + From ggrundstrom at NetEffect.com Thu Oct 26 17:41:01 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:41:01 -0500 Subject: [openib-general] [PATCH 1/5] NetEffect 10Gb RNIC Userspace Library: userspace config generation Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECA@venom2> Userspace patch 1 of 5. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/src/userspace/libnes/aclocal.m4 new/src/userspace/libnes/aclocal.m4 --- old/src/userspace/libnes/aclocal.m4 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/aclocal.m4 2006-10-25 11:11:08.000000000 -0500 @@ -0,0 +1,7256 @@ +# generated automatically by aclocal 1.9.6 -*- Autoconf -*- + +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, +# 2005 Free Software Foundation, Inc. 
+# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY, to the extent permitted by law; without +# even the implied warranty of MERCHANTABILITY or FITNESS FOR A +# PARTICULAR PURPOSE. + +# libtool.m4 - Configure libtool for the host system. -*-Autoconf-*- + +# serial 48 AC_PROG_LIBTOOL + + +# AC_PROVIDE_IFELSE(MACRO-NAME, IF-PROVIDED, IF-NOT-PROVIDED) +# ----------------------------------------------------------- +# If this macro is not defined by Autoconf, define it here. +m4_ifdef([AC_PROVIDE_IFELSE], + [], + [m4_define([AC_PROVIDE_IFELSE], + [m4_ifdef([AC_PROVIDE_$1], + [$2], [$3])])]) + + +# AC_PROG_LIBTOOL +# --------------- +AC_DEFUN([AC_PROG_LIBTOOL], +[AC_REQUIRE([_AC_PROG_LIBTOOL])dnl +dnl If AC_PROG_CXX has already been expanded, run AC_LIBTOOL_CXX +dnl immediately, otherwise, hook it in at the end of AC_PROG_CXX. + AC_PROVIDE_IFELSE([AC_PROG_CXX], + [AC_LIBTOOL_CXX], + [define([AC_PROG_CXX], defn([AC_PROG_CXX])[AC_LIBTOOL_CXX + ])]) +dnl And a similar setup for Fortran 77 support + AC_PROVIDE_IFELSE([AC_PROG_F77], + [AC_LIBTOOL_F77], + [define([AC_PROG_F77], defn([AC_PROG_F77])[AC_LIBTOOL_F77 +])]) + +dnl Quote A][M_PROG_GCJ so that aclocal doesn't bring it in needlessly. +dnl If either AC_PROG_GCJ or A][M_PROG_GCJ have already been expanded, run +dnl AC_LIBTOOL_GCJ immediately, otherwise, hook it in at the end of both. 
+ AC_PROVIDE_IFELSE([AC_PROG_GCJ], + [AC_LIBTOOL_GCJ], + [AC_PROVIDE_IFELSE([A][M_PROG_GCJ], + [AC_LIBTOOL_GCJ], + [AC_PROVIDE_IFELSE([LT_AC_PROG_GCJ], + [AC_LIBTOOL_GCJ], + [ifdef([AC_PROG_GCJ], + [define([AC_PROG_GCJ], defn([AC_PROG_GCJ])[AC_LIBTOOL_GCJ])]) + ifdef([A][M_PROG_GCJ], + [define([A][M_PROG_GCJ], defn([A][M_PROG_GCJ])[AC_LIBTOOL_GCJ])]) + ifdef([LT_AC_PROG_GCJ], + [define([LT_AC_PROG_GCJ], + defn([LT_AC_PROG_GCJ])[AC_LIBTOOL_GCJ])])])]) +])])# AC_PROG_LIBTOOL + + +# _AC_PROG_LIBTOOL +# ---------------- +AC_DEFUN([_AC_PROG_LIBTOOL], +[AC_REQUIRE([AC_LIBTOOL_SETUP])dnl +AC_BEFORE([$0],[AC_LIBTOOL_CXX])dnl +AC_BEFORE([$0],[AC_LIBTOOL_F77])dnl +AC_BEFORE([$0],[AC_LIBTOOL_GCJ])dnl + +# This can be used to rebuild libtool when needed +LIBTOOL_DEPS="$ac_aux_dir/ltmain.sh" + +# Always use our own libtool. +LIBTOOL='$(SHELL) $(top_builddir)/libtool' +AC_SUBST(LIBTOOL)dnl + +# Prevent multiple expansion +define([AC_PROG_LIBTOOL], []) +])# _AC_PROG_LIBTOOL + + +# AC_LIBTOOL_SETUP +# ---------------- +AC_DEFUN([AC_LIBTOOL_SETUP], +[AC_PREREQ(2.50)dnl +AC_REQUIRE([AC_ENABLE_SHARED])dnl +AC_REQUIRE([AC_ENABLE_STATIC])dnl +AC_REQUIRE([AC_ENABLE_FAST_INSTALL])dnl +AC_REQUIRE([AC_CANONICAL_HOST])dnl +AC_REQUIRE([AC_CANONICAL_BUILD])dnl +AC_REQUIRE([AC_PROG_CC])dnl +AC_REQUIRE([AC_PROG_LD])dnl +AC_REQUIRE([AC_PROG_LD_RELOAD_FLAG])dnl +AC_REQUIRE([AC_PROG_NM])dnl + +AC_REQUIRE([AC_PROG_LN_S])dnl +AC_REQUIRE([AC_DEPLIBS_CHECK_METHOD])dnl +# Autoconf 2.13's AC_OBJEXT and AC_EXEEXT macros only works for C compilers! +AC_REQUIRE([AC_OBJEXT])dnl +AC_REQUIRE([AC_EXEEXT])dnl +dnl + +AC_LIBTOOL_SYS_MAX_CMD_LEN +AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE +AC_LIBTOOL_OBJDIR + +AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl +_LT_AC_PROG_ECHO_BACKSLASH + +case $host_os in +aix3*) + # AIX sometimes has problems with the GCC collect2 program. For some + # reason, if we set the COLLECT_NAMES environment variable, the problems + # vanish in a puff of smoke. 
+ if test "X${COLLECT_NAMES+set}" != Xset; then + COLLECT_NAMES= + export COLLECT_NAMES + fi + ;; +esac + +# Sed substitution that helps us do robust quoting. It backslashifies +# metacharacters that are still active within double-quoted strings. +Xsed='sed -e 1s/^X//' +[sed_quote_subst='s/\([\\"\\`$\\\\]\)/\\\1/g'] + +# Same as above, but do not quote variable references. +[double_quote_subst='s/\([\\"\\`\\\\]\)/\\\1/g'] + +# Sed substitution to delay expansion of an escaped shell variable in a +# double_quote_subst'ed string. +delay_variable_subst='s/\\\\\\\\\\\$/\\\\\\$/g' + +# Sed substitution to avoid accidental globbing in evaled expressions +no_glob_subst='s/\*/\\\*/g' + +# Constants: +rm="rm -f" + +# Global variables: +default_ofile=libtool +can_build_shared=yes + +# All known linkers require a `.a' archive for static linking (except MSVC, +# which needs '.lib'). +libext=a +ltmain="$ac_aux_dir/ltmain.sh" +ofile="$default_ofile" +with_gnu_ld="$lt_cv_prog_gnu_ld" + +AC_CHECK_TOOL(AR, ar, false) +AC_CHECK_TOOL(RANLIB, ranlib, :) +AC_CHECK_TOOL(STRIP, strip, :) + +old_CC="$CC" +old_CFLAGS="$CFLAGS" + +# Set sane defaults for various variables +test -z "$AR" && AR=ar +test -z "$AR_FLAGS" && AR_FLAGS=cru +test -z "$AS" && AS=as +test -z "$CC" && CC=cc +test -z "$LTCC" && LTCC=$CC +test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS +test -z "$DLLTOOL" && DLLTOOL=dlltool +test -z "$LD" && LD=ld +test -z "$LN_S" && LN_S="ln -s" +test -z "$MAGIC_CMD" && MAGIC_CMD=file +test -z "$NM" && NM=nm +test -z "$SED" && SED=sed +test -z "$OBJDUMP" && OBJDUMP=objdump +test -z "$RANLIB" && RANLIB=: +test -z "$STRIP" && STRIP=: +test -z "$ac_objext" && ac_objext=o + +# Determine commands to create old-style static archives. 
+old_archive_cmds='$AR $AR_FLAGS $oldlib$oldobjs$old_deplibs' +old_postinstall_cmds='chmod 644 $oldlib' +old_postuninstall_cmds= + +if test -n "$RANLIB"; then + case $host_os in + openbsd*) + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$oldlib" + ;; + *) + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$oldlib" + ;; + esac + old_archive_cmds="$old_archive_cmds~\$RANLIB \$oldlib" +fi + +_LT_CC_BASENAME([$compiler]) + +# Only perform the check for file, if the check method requires it +case $deplibs_check_method in +file_magic*) + if test "$file_magic_cmd" = '$MAGIC_CMD'; then + AC_PATH_MAGIC + fi + ;; +esac + +AC_PROVIDE_IFELSE([AC_LIBTOOL_DLOPEN], enable_dlopen=yes, enable_dlopen=no) +AC_PROVIDE_IFELSE([AC_LIBTOOL_WIN32_DLL], +enable_win32_dll=yes, enable_win32_dll=no) + +AC_ARG_ENABLE([libtool-lock], + [AC_HELP_STRING([--disable-libtool-lock], + [avoid locking (might break parallel builds)])]) +test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes + +AC_ARG_WITH([pic], + [AC_HELP_STRING([--with-pic], + [try to use only PIC/non-PIC objects @<:@default=use both@:>@])], + [pic_mode="$withval"], + [pic_mode=default]) +test -z "$pic_mode" && pic_mode=default + +# Use C for the default configuration in the libtool script +tagname= +AC_LIBTOOL_LANG_C_CONFIG +_LT_AC_TAGCONFIG +])# AC_LIBTOOL_SETUP + + +# _LT_AC_SYS_COMPILER +# ------------------- +AC_DEFUN([_LT_AC_SYS_COMPILER], +[AC_REQUIRE([AC_PROG_CC])dnl + +# If no C compiler was specified, use CC. +LTCC=${LTCC-"$CC"} + +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} + +# Allow CC to be a program name with arguments. +compiler=$CC +])# _LT_AC_SYS_COMPILER + + +# _LT_CC_BASENAME(CC) +# ------------------- +# Calculate cc_basename. Skip known compiler wrappers and cross-prefix. 
+AC_DEFUN([_LT_CC_BASENAME], +[for cc_temp in $1""; do + case $cc_temp in + compile | *[[\\/]]compile | ccache | *[[\\/]]ccache ) ;; + distcc | *[[\\/]]distcc | purify | *[[\\/]]purify ) ;; + \-*) ;; + *) break;; + esac +done +cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e "s%^$host_alias-%%"` +]) + + +# _LT_COMPILER_BOILERPLATE +# ------------------------ +# Check for compiler boilerplate output or warnings with +# the simple compiler test code. +AC_DEFUN([_LT_COMPILER_BOILERPLATE], +[ac_outfile=conftest.$ac_objext +printf "$lt_simple_compile_test_code" >conftest.$ac_ext +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err +_lt_compiler_boilerplate=`cat conftest.err` +$rm conftest* +])# _LT_COMPILER_BOILERPLATE + + +# _LT_LINKER_BOILERPLATE +# ---------------------- +# Check for linker boilerplate output or warnings with +# the simple link test code. +AC_DEFUN([_LT_LINKER_BOILERPLATE], +[ac_outfile=conftest.$ac_objext +printf "$lt_simple_link_test_code" >conftest.$ac_ext +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err +_lt_linker_boilerplate=`cat conftest.err` +$rm conftest* +])# _LT_LINKER_BOILERPLATE + + +# _LT_AC_SYS_LIBPATH_AIX +# ---------------------- +# Links a minimal program and checks the executable +# for the system default hardcoded library path. In most cases, +# this is /usr/lib:/lib, but when the MPI compilers are used +# the location of the communication and MPI libs are included too. +# If we don't find anything, use the default library path according +# to the aix ld manual. +AC_DEFUN([_LT_AC_SYS_LIBPATH_AIX], +[AC_LINK_IFELSE(AC_LANG_PROGRAM,[ +aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } +}'` +# Check for a 64-bit object if we didn't find anything. 
+if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } +}'`; fi],[]) +if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi +])# _LT_AC_SYS_LIBPATH_AIX + + +# _LT_AC_SHELL_INIT(ARG) +# ---------------------- +AC_DEFUN([_LT_AC_SHELL_INIT], +[ifdef([AC_DIVERSION_NOTICE], + [AC_DIVERT_PUSH(AC_DIVERSION_NOTICE)], + [AC_DIVERT_PUSH(NOTICE)]) +$1 +AC_DIVERT_POP +])# _LT_AC_SHELL_INIT + + +# _LT_AC_PROG_ECHO_BACKSLASH +# -------------------------- +# Add some code to the start of the generated configure script which +# will find an echo command which doesn't interpret backslashes. +AC_DEFUN([_LT_AC_PROG_ECHO_BACKSLASH], +[_LT_AC_SHELL_INIT([ +# Check that we are running under the correct shell. +SHELL=${CONFIG_SHELL-/bin/sh} + +case X$ECHO in +X*--fallback-echo) + # Remove one level of quotation (which was required for Make). + ECHO=`echo "$ECHO" | sed 's,\\\\\[$]\\[$]0,'[$]0','` + ;; +esac + +echo=${ECHO-echo} +if test "X[$]1" = X--no-reexec; then + # Discard the --no-reexec flag, and continue. + shift +elif test "X[$]1" = X--fallback-echo; then + # Avoid inline document here, it may be left over + : +elif test "X`($echo '\t') 2>/dev/null`" = 'X\t' ; then + # Yippee, $echo works! + : +else + # Restart under the correct shell. + exec $SHELL "[$]0" --no-reexec ${1+"[$]@"} +fi + +if test "X[$]1" = X--fallback-echo; then + # used as fallback echo + shift + cat <<EOF +[$]* +EOF + exit 0 +fi + +# The HP-UX ksh and POSIX shell print the target directory to stdout +# if CDPATH is set. +(unset CDPATH) >/dev/null 2>&1 && unset CDPATH + +if test -z "$ECHO"; then +if test "X${echo_test_string+set}" != Xset; then +# find a string as large as possible, as long as the shell can cope with it + for cmd in 'sed 50q "[$]0"' 'sed 20q "[$]0"' 'sed 10q "[$]0"' 'sed 2q "[$]0"' 'echo test'; do + # expected sizes: less than 2Kb, 1Kb, 512 bytes, 16 bytes, ... 
+ if (echo_test_string=`eval $cmd`) 2>/dev/null && + echo_test_string=`eval $cmd` && + (test "X$echo_test_string" = "X$echo_test_string") 2>/dev/null + then + break + fi + done +fi + +if test "X`($echo '\t') 2>/dev/null`" = 'X\t' && + echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` && + test "X$echo_testing_string" = "X$echo_test_string"; then + : +else + # The Solaris, AIX, and Digital Unix default echo programs unquote + # backslashes. This makes it impossible to quote backslashes using + # echo "$something" | sed 's/\\/\\\\/g' + # + # So, first we look for a working echo in the user's PATH. + + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR + for dir in $PATH /usr/ucb; do + IFS="$lt_save_ifs" + if (test -f $dir/echo || test -f $dir/echo$ac_exeext) && + test "X`($dir/echo '\t') 2>/dev/null`" = 'X\t' && + echo_testing_string=`($dir/echo "$echo_test_string") 2>/dev/null` && + test "X$echo_testing_string" = "X$echo_test_string"; then + echo="$dir/echo" + break + fi + done + IFS="$lt_save_ifs" + + if test "X$echo" = Xecho; then + # We didn't find a better echo, so look for alternatives. + if test "X`(print -r '\t') 2>/dev/null`" = 'X\t' && + echo_testing_string=`(print -r "$echo_test_string") 2>/dev/null` && + test "X$echo_testing_string" = "X$echo_test_string"; then + # This shell has a builtin print -r that does the trick. + echo='print -r' + elif (test -f /bin/ksh || test -f /bin/ksh$ac_exeext) && + test "X$CONFIG_SHELL" != X/bin/ksh; then + # If we have ksh, try running configure again with it. + ORIGINAL_CONFIG_SHELL=${CONFIG_SHELL-/bin/sh} + export ORIGINAL_CONFIG_SHELL + CONFIG_SHELL=/bin/ksh + export CONFIG_SHELL + exec $CONFIG_SHELL "[$]0" --no-reexec ${1+"[$]@"} + else + # Try using printf. 
+ echo='printf %s\n' + if test "X`($echo '\t') 2>/dev/null`" = 'X\t' && + echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` && + test "X$echo_testing_string" = "X$echo_test_string"; then + # Cool, printf works + : + elif echo_testing_string=`($ORIGINAL_CONFIG_SHELL "[$]0" --fallback-echo '\t') 2>/dev/null` && + test "X$echo_testing_string" = 'X\t' && + echo_testing_string=`($ORIGINAL_CONFIG_SHELL "[$]0" --fallback-echo "$echo_test_string") 2>/dev/null` && + test "X$echo_testing_string" = "X$echo_test_string"; then + CONFIG_SHELL=$ORIGINAL_CONFIG_SHELL + export CONFIG_SHELL + SHELL="$CONFIG_SHELL" + export SHELL + echo="$CONFIG_SHELL [$]0 --fallback-echo" + elif echo_testing_string=`($CONFIG_SHELL "[$]0" --fallback-echo '\t') 2>/dev/null` && + test "X$echo_testing_string" = 'X\t' && + echo_testing_string=`($CONFIG_SHELL "[$]0" --fallback-echo "$echo_test_string") 2>/dev/null` && + test "X$echo_testing_string" = "X$echo_test_string"; then + echo="$CONFIG_SHELL [$]0 --fallback-echo" + else + # maybe with a smaller string... + prev=: + + for cmd in 'echo test' 'sed 2q "[$]0"' 'sed 10q "[$]0"' 'sed 20q "[$]0"' 'sed 50q "[$]0"'; do + if (test "X$echo_test_string" = "X`eval $cmd`") 2>/dev/null + then + break + fi + prev="$cmd" + done + + if test "$prev" != 'sed 50q "[$]0"'; then + echo_test_string=`eval $prev` + export echo_test_string + exec ${ORIGINAL_CONFIG_SHELL-${CONFIG_SHELL-/bin/sh}} "[$]0" ${1+"[$]@"} + else + # Oops. We lost completely, so just stick with echo. + echo=echo + fi + fi + fi + fi +fi +fi + +# Copy echo and quote the copy suitably for passing to libtool from +# the Makefile, instead of quoting the original, which is used later. 
+ECHO=$echo +if test "X$ECHO" = "X$CONFIG_SHELL [$]0 --fallback-echo"; then + ECHO="$CONFIG_SHELL \\\$\[$]0 --fallback-echo" +fi + +AC_SUBST(ECHO) +])])# _LT_AC_PROG_ECHO_BACKSLASH + + +# _LT_AC_LOCK +# ----------- +AC_DEFUN([_LT_AC_LOCK], +[AC_ARG_ENABLE([libtool-lock], + [AC_HELP_STRING([--disable-libtool-lock], + [avoid locking (might break parallel builds)])]) +test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes + +# Some flags need to be propagated to the compiler or linker for good +# libtool support. +case $host in +ia64-*-hpux*) + # Find out which ABI we are using. + echo 'int i;' > conftest.$ac_ext + if AC_TRY_EVAL(ac_compile); then + case `/usr/bin/file conftest.$ac_objext` in + *ELF-32*) + HPUX_IA64_MODE="32" + ;; + *ELF-64*) + HPUX_IA64_MODE="64" + ;; + esac + fi + rm -rf conftest* + ;; +*-*-irix6*) + # Find out which ABI we are using. + echo '[#]line __oline__ "configure"' > conftest.$ac_ext + if AC_TRY_EVAL(ac_compile); then + if test "$lt_cv_prog_gnu_ld" = yes; then + case `/usr/bin/file conftest.$ac_objext` in + *32-bit*) + LD="${LD-ld} -melf32bsmip" + ;; + *N32*) + LD="${LD-ld} -melf32bmipn32" + ;; + *64-bit*) + LD="${LD-ld} -melf64bmip" + ;; + esac + else + case `/usr/bin/file conftest.$ac_objext` in + *32-bit*) + LD="${LD-ld} -32" + ;; + *N32*) + LD="${LD-ld} -n32" + ;; + *64-bit*) + LD="${LD-ld} -64" + ;; + esac + fi + fi + rm -rf conftest* + ;; + +x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*|s390*-*linux*|sparc*-*linux*) + # Find out which ABI we are using. 
+ echo 'int i;' > conftest.$ac_ext + if AC_TRY_EVAL(ac_compile); then + case `/usr/bin/file conftest.o` in + *32-bit*) + case $host in + x86_64-*linux*) + LD="${LD-ld} -m elf_i386" + ;; + ppc64-*linux*|powerpc64-*linux*) + LD="${LD-ld} -m elf32ppclinux" + ;; + s390x-*linux*) + LD="${LD-ld} -m elf_s390" + ;; + sparc64-*linux*) + LD="${LD-ld} -m elf32_sparc" + ;; + esac + ;; + *64-bit*) + case $host in + x86_64-*linux*) + LD="${LD-ld} -m elf_x86_64" + ;; + ppc*-*linux*|powerpc*-*linux*) + LD="${LD-ld} -m elf64ppc" + ;; + s390*-*linux*) + LD="${LD-ld} -m elf64_s390" + ;; + sparc*-*linux*) + LD="${LD-ld} -m elf64_sparc" + ;; + esac + ;; + esac + fi + rm -rf conftest* + ;; + +*-*-sco3.2v5*) + # On SCO OpenServer 5, we need -belf to get full-featured binaries. + SAVE_CFLAGS="$CFLAGS" + CFLAGS="$CFLAGS -belf" + AC_CACHE_CHECK([whether the C compiler needs -belf], lt_cv_cc_needs_belf, + [AC_LANG_PUSH(C) + AC_TRY_LINK([],[],[lt_cv_cc_needs_belf=yes],[lt_cv_cc_needs_belf=no]) + AC_LANG_POP]) + if test x"$lt_cv_cc_needs_belf" != x"yes"; then + # this is probably gcc 2.8.0, egcs 1.0 or newer; no need for -belf + CFLAGS="$SAVE_CFLAGS" + fi + ;; +sparc*-*solaris*) + # Find out which ABI we are using. 
+ echo 'int i;' > conftest.$ac_ext + if AC_TRY_EVAL(ac_compile); then + case `/usr/bin/file conftest.o` in + *64-bit*) + case $lt_cv_prog_gnu_ld in + yes*) LD="${LD-ld} -m elf64_sparc" ;; + *) LD="${LD-ld} -64" ;; + esac + ;; + esac + fi + rm -rf conftest* + ;; + +AC_PROVIDE_IFELSE([AC_LIBTOOL_WIN32_DLL], +[*-*-cygwin* | *-*-mingw* | *-*-pw32*) + AC_CHECK_TOOL(DLLTOOL, dlltool, false) + AC_CHECK_TOOL(AS, as, false) + AC_CHECK_TOOL(OBJDUMP, objdump, false) + ;; + ]) +esac + +need_locks="$enable_libtool_lock" + +])# _LT_AC_LOCK + + +# AC_LIBTOOL_COMPILER_OPTION(MESSAGE, VARIABLE-NAME, FLAGS, +# [OUTPUT-FILE], [ACTION-SUCCESS], [ACTION-FAILURE]) +# ---------------------------------------------------------------- +# Check whether the given compiler option works +AC_DEFUN([AC_LIBTOOL_COMPILER_OPTION], +[AC_REQUIRE([LT_AC_PROG_SED]) +AC_CACHE_CHECK([$1], [$2], + [$2=no + ifelse([$4], , [ac_outfile=conftest.$ac_objext], [ac_outfile=$4]) + printf "$lt_simple_compile_test_code" > conftest.$ac_ext + lt_compiler_flag="$3" + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. + # The option is referenced via a variable to avoid confusing sed. + lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [[^ ]]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:__oline__: $lt_compile\"" >&AS_MESSAGE_LOG_FD) + (eval "$lt_compile" 2>conftest.err) + ac_status=$? + cat conftest.err >&AS_MESSAGE_LOG_FD + echo "$as_me:__oline__: \$? = $ac_status" >&AS_MESSAGE_LOG_FD + if (exit $ac_status) && test -s "$ac_outfile"; then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings other than the usual output. 
+ $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then + $2=yes + fi + fi + $rm conftest* +]) + +if test x"[$]$2" = xyes; then + ifelse([$5], , :, [$5]) +else + ifelse([$6], , :, [$6]) +fi +])# AC_LIBTOOL_COMPILER_OPTION + + +# AC_LIBTOOL_LINKER_OPTION(MESSAGE, VARIABLE-NAME, FLAGS, +# [ACTION-SUCCESS], [ACTION-FAILURE]) +# ------------------------------------------------------------ +# Check whether the given compiler option works +AC_DEFUN([AC_LIBTOOL_LINKER_OPTION], +[AC_CACHE_CHECK([$1], [$2], + [$2=no + save_LDFLAGS="$LDFLAGS" + LDFLAGS="$LDFLAGS $3" + printf "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. + cat conftest.err 1>&AS_MESSAGE_LOG_FD + $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + $2=yes + fi + else + $2=yes + fi + fi + $rm conftest* + LDFLAGS="$save_LDFLAGS" +]) + +if test x"[$]$2" = xyes; then + ifelse([$4], , :, [$4]) +else + ifelse([$5], , :, [$5]) +fi +])# AC_LIBTOOL_LINKER_OPTION + + +# AC_LIBTOOL_SYS_MAX_CMD_LEN +# -------------------------- +AC_DEFUN([AC_LIBTOOL_SYS_MAX_CMD_LEN], +[# find the maximum length of command line arguments +AC_MSG_CHECKING([the maximum length of command line arguments]) +AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl + i=0 + teststring="ABCD" + + case $build_os in + msdosdjgpp*) + # On DJGPP, this test can blow up pretty badly due to problems in libc + # (any single argument exceeding 2000 bytes causes a buffer overrun + # during glob expansion). 
Even if it were fixed, the result of this + # check would be larger than it should be. + lt_cv_sys_max_cmd_len=12288; # 12K is about right + ;; + + gnu*) + # Under GNU Hurd, this test is not required because there is + # no limit to the length of command line arguments. + # Libtool will interpret -1 as no limit whatsoever + lt_cv_sys_max_cmd_len=-1; + ;; + + cygwin* | mingw*) + # On Win9x/ME, this test blows up -- it succeeds, but takes + # about 5 minutes as the teststring grows exponentially. + # Worse, since 9x/ME are not pre-emptively multitasking, + # you end up with a "frozen" computer, even though with patience + # the test eventually succeeds (with a max line length of 256k). + # Instead, let's just punt: use the minimum linelength reported by + # all of the supported platforms: 8192 (on NT/2K/XP). + lt_cv_sys_max_cmd_len=8192; + ;; + + amigaos*) + # On AmigaOS with pdksh, this test takes hours, literally. + # So we just punt and use a minimum line length of 8192. + lt_cv_sys_max_cmd_len=8192; + ;; + + netbsd* | freebsd* | openbsd* | darwin* | dragonfly*) + # This has been around since 386BSD, at least. Likely further. + if test -x /sbin/sysctl; then + lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax` + elif test -x /usr/sbin/sysctl; then + lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax` + else + lt_cv_sys_max_cmd_len=65536 # usable default for all BSDs + fi + # And add a safety zone + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` + ;; + + interix*) + # We know the value 262144 and hardcode it with a safety zone (like BSD) + lt_cv_sys_max_cmd_len=196608 + ;; + + osf*) + # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running configure + # due to this test when exec_disable_arg_limit is 1 on Tru64. It is not + # nice to cause kernel panics so lets avoid the loop below. + # First set a reasonable default. 
+ lt_cv_sys_max_cmd_len=16384 + # + if test -x /sbin/sysconfig; then + case `/sbin/sysconfig -q proc exec_disable_arg_limit` in + *1*) lt_cv_sys_max_cmd_len=-1 ;; + esac + fi + ;; + sco3.2v5*) + lt_cv_sys_max_cmd_len=102400 + ;; + sysv5* | sco5v6* | sysv4.2uw2*) + kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null` + if test -n "$kargmax"; then + lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[[ ]]//'` + else + lt_cv_sys_max_cmd_len=32768 + fi + ;; + *) + # If test is not a shell built-in, we'll probably end up computing a + # maximum length that is only half of the actual maximum length, but + # we can't tell. + SHELL=${SHELL-${CONFIG_SHELL-/bin/sh}} + while (test "X"`$SHELL [$]0 --fallback-echo "X$teststring" 2>/dev/null` \ + = "XX$teststring") >/dev/null 2>&1 && + new_result=`expr "X$teststring" : ".*" 2>&1` && + lt_cv_sys_max_cmd_len=$new_result && + test $i != 17 # 1/2 MB should be enough + do + i=`expr $i + 1` + teststring=$teststring$teststring + done + teststring= + # Add a significant safety factor because C++ compilers can tack on massive + # amounts of additional arguments before passing them to the linker. + # It appears as though 1/2 is a usable value. 
+ lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 2` + ;; + esac +]) +if test -n $lt_cv_sys_max_cmd_len ; then + AC_MSG_RESULT($lt_cv_sys_max_cmd_len) +else + AC_MSG_RESULT(none) +fi +])# AC_LIBTOOL_SYS_MAX_CMD_LEN + + +# _LT_AC_CHECK_DLFCN +# ------------------ +AC_DEFUN([_LT_AC_CHECK_DLFCN], +[AC_CHECK_HEADERS(dlfcn.h)dnl +])# _LT_AC_CHECK_DLFCN + + +# _LT_AC_TRY_DLOPEN_SELF (ACTION-IF-TRUE, ACTION-IF-TRUE-W-USCORE, +# ACTION-IF-FALSE, ACTION-IF-CROSS-COMPILING) +# --------------------------------------------------------------------- +AC_DEFUN([_LT_AC_TRY_DLOPEN_SELF], +[AC_REQUIRE([_LT_AC_CHECK_DLFCN])dnl +if test "$cross_compiling" = yes; then : + [$4] +else + lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 + lt_status=$lt_dlunknown + cat > conftest.$ac_ext <<EOF +[#line __oline__ "configure" +#include "confdefs.h" + +#if HAVE_DLFCN_H +#include <dlfcn.h> +#endif + +#include <stdio.h> + +#ifdef RTLD_GLOBAL +# define LT_DLGLOBAL RTLD_GLOBAL +#else +# ifdef DL_GLOBAL +# define LT_DLGLOBAL DL_GLOBAL +# else +# define LT_DLGLOBAL 0 +# endif +#endif + +/* We may have to define LT_DLLAZY_OR_NOW in the command line if we + find out it does not work in some platform. */ +#ifndef LT_DLLAZY_OR_NOW +# ifdef RTLD_LAZY +# define LT_DLLAZY_OR_NOW RTLD_LAZY +# else +# ifdef DL_LAZY +# define LT_DLLAZY_OR_NOW DL_LAZY +# else +# ifdef RTLD_NOW +# define LT_DLLAZY_OR_NOW RTLD_NOW +# else +# ifdef DL_NOW +# define LT_DLLAZY_OR_NOW DL_NOW +# else +# define LT_DLLAZY_OR_NOW 0 +# endif +# endif +# endif +# endif +#endif + +#ifdef __cplusplus +extern "C" void exit (int); +#endif + +void fnord() { int i=42;} +int main () +{ + void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); + int status = $lt_dlunknown; + + if (self) + { + if (dlsym (self,"fnord")) status = $lt_dlno_uscore; + else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; + /* dlclose (self); */ + } + else + puts (dlerror ()); + + exit (status); +}] +EOF + if AC_TRY_EVAL(ac_link) && test -s conftest${ac_exeext} 2>/dev/null; then + (./conftest; exit; ) >&AS_MESSAGE_LOG_FD 2>/dev/null + lt_status=$? 
+ case x$lt_status in + x$lt_dlno_uscore) $1 ;; + x$lt_dlneed_uscore) $2 ;; + x$lt_dlunknown|x*) $3 ;; + esac + else : + # compilation failed + $3 + fi +fi +rm -fr conftest* +])# _LT_AC_TRY_DLOPEN_SELF + + +# AC_LIBTOOL_DLOPEN_SELF +# ---------------------- +AC_DEFUN([AC_LIBTOOL_DLOPEN_SELF], +[AC_REQUIRE([_LT_AC_CHECK_DLFCN])dnl +if test "x$enable_dlopen" != xyes; then + enable_dlopen=unknown + enable_dlopen_self=unknown + enable_dlopen_self_static=unknown +else + lt_cv_dlopen=no + lt_cv_dlopen_libs= + + case $host_os in + beos*) + lt_cv_dlopen="load_add_on" + lt_cv_dlopen_libs= + lt_cv_dlopen_self=yes + ;; + + mingw* | pw32*) + lt_cv_dlopen="LoadLibrary" + lt_cv_dlopen_libs= + ;; + + cygwin*) + lt_cv_dlopen="dlopen" + lt_cv_dlopen_libs= + ;; + + darwin*) + # if libdl is installed we need to link against it + AC_CHECK_LIB([dl], [dlopen], + [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl"],[ + lt_cv_dlopen="dyld" + lt_cv_dlopen_libs= + lt_cv_dlopen_self=yes + ]) + ;; + + *) + AC_CHECK_FUNC([shl_load], + [lt_cv_dlopen="shl_load"], + [AC_CHECK_LIB([dld], [shl_load], + [lt_cv_dlopen="shl_load" lt_cv_dlopen_libs="-dld"], + [AC_CHECK_FUNC([dlopen], + [lt_cv_dlopen="dlopen"], + [AC_CHECK_LIB([dl], [dlopen], + [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl"], + [AC_CHECK_LIB([svld], [dlopen], + [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-lsvld"], + [AC_CHECK_LIB([dld], [dld_link], + [lt_cv_dlopen="dld_link" lt_cv_dlopen_libs="-dld"]) + ]) + ]) + ]) + ]) + ]) + ;; + esac + + if test "x$lt_cv_dlopen" != xno; then + enable_dlopen=yes + else + enable_dlopen=no + fi + + case $lt_cv_dlopen in + dlopen) + save_CPPFLAGS="$CPPFLAGS" + test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" + + save_LDFLAGS="$LDFLAGS" + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" + + save_LIBS="$LIBS" + LIBS="$lt_cv_dlopen_libs $LIBS" + + AC_CACHE_CHECK([whether a program can dlopen itself], + lt_cv_dlopen_self, [dnl + _LT_AC_TRY_DLOPEN_SELF( + 
lt_cv_dlopen_self=yes, lt_cv_dlopen_self=yes, + lt_cv_dlopen_self=no, lt_cv_dlopen_self=cross) + ]) + + if test "x$lt_cv_dlopen_self" = xyes; then + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $lt_prog_compiler_static\" + AC_CACHE_CHECK([whether a statically linked program can dlopen itself], + lt_cv_dlopen_self_static, [dnl + _LT_AC_TRY_DLOPEN_SELF( + lt_cv_dlopen_self_static=yes, lt_cv_dlopen_self_static=yes, + lt_cv_dlopen_self_static=no, lt_cv_dlopen_self_static=cross) + ]) + fi + + CPPFLAGS="$save_CPPFLAGS" + LDFLAGS="$save_LDFLAGS" + LIBS="$save_LIBS" + ;; + esac + + case $lt_cv_dlopen_self in + yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; + *) enable_dlopen_self=unknown ;; + esac + + case $lt_cv_dlopen_self_static in + yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; + *) enable_dlopen_self_static=unknown ;; + esac +fi +])# AC_LIBTOOL_DLOPEN_SELF + + +# AC_LIBTOOL_PROG_CC_C_O([TAGNAME]) +# --------------------------------- +# Check to see if options -c and -o are simultaneously supported by compiler +AC_DEFUN([AC_LIBTOOL_PROG_CC_C_O], +[AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl +AC_CACHE_CHECK([if $compiler supports -c -o file.$ac_objext], + [_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)], + [_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=no + $rm -r conftest 2>/dev/null + mkdir conftest + cd conftest + mkdir out + printf "$lt_simple_compile_test_code" > conftest.$ac_ext + + lt_compiler_flag="-o out/conftest2.$ac_objext" + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. 
+ lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [[^ ]]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:__oline__: $lt_compile\"" >&AS_MESSAGE_LOG_FD) + (eval "$lt_compile" 2>out/conftest.err) + ac_status=$? + cat out/conftest.err >&AS_MESSAGE_LOG_FD + echo "$as_me:__oline__: \$? = $ac_status" >&AS_MESSAGE_LOG_FD + if (exit $ac_status) && test -s out/conftest2.$ac_objext + then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then + _LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=yes + fi + fi + chmod u+w . 2>&AS_MESSAGE_LOG_FD + $rm conftest* + # SGI C++ compiler will create directory out/ii_files/ for + # template instantiation + test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files + $rm out/* && rmdir out + cd .. 
+ rmdir conftest + $rm conftest* +]) +])# AC_LIBTOOL_PROG_CC_C_O + + +# AC_LIBTOOL_SYS_HARD_LINK_LOCKS([TAGNAME]) +# ----------------------------------------- +# Check to see if we can do hard links to lock some files if needed +AC_DEFUN([AC_LIBTOOL_SYS_HARD_LINK_LOCKS], +[AC_REQUIRE([_LT_AC_LOCK])dnl + +hard_links="nottested" +if test "$_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)" = no && test "$need_locks" != no; then + # do not overwrite the value of need_locks provided by the user + AC_MSG_CHECKING([if we can lock with hard links]) + hard_links=yes + $rm conftest* + ln conftest.a conftest.b 2>/dev/null && hard_links=no + touch conftest.a + ln conftest.a conftest.b 2>&5 || hard_links=no + ln conftest.a conftest.b 2>/dev/null && hard_links=no + AC_MSG_RESULT([$hard_links]) + if test "$hard_links" = no; then + AC_MSG_WARN([`$CC' does not support `-c -o', so `make -j' may be unsafe]) + need_locks=warn + fi +else + need_locks=no +fi +])# AC_LIBTOOL_SYS_HARD_LINK_LOCKS + + +# AC_LIBTOOL_OBJDIR +# ----------------- +AC_DEFUN([AC_LIBTOOL_OBJDIR], +[AC_CACHE_CHECK([for objdir], [lt_cv_objdir], +[rm -f .libs 2>/dev/null +mkdir .libs 2>/dev/null +if test -d .libs; then + lt_cv_objdir=.libs +else + # MS-DOS does not allow filenames that begin with a dot. + lt_cv_objdir=_libs +fi +rmdir .libs 2>/dev/null]) +objdir=$lt_cv_objdir +])# AC_LIBTOOL_OBJDIR + + +# AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH([TAGNAME]) +# ---------------------------------------------- +# Check hardcoding attributes. +AC_DEFUN([AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH], +[AC_MSG_CHECKING([how to hardcode library paths into programs]) +_LT_AC_TAGVAR(hardcode_action, $1)= +if test -n "$_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)" || \ + test -n "$_LT_AC_TAGVAR(runpath_var, $1)" || \ + test "X$_LT_AC_TAGVAR(hardcode_automatic, $1)" = "Xyes" ; then + + # We can hardcode non-existant directories. 
+ if test "$_LT_AC_TAGVAR(hardcode_direct, $1)" != no && + # If the only mechanism to avoid hardcoding is shlibpath_var, we + # have to relink, otherwise we might link with an installed library + # when we should be linking with a yet-to-be-installed one + ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)" != no && + test "$_LT_AC_TAGVAR(hardcode_minus_L, $1)" != no; then + # Linking always hardcodes the temporary library directory. + _LT_AC_TAGVAR(hardcode_action, $1)=relink + else + # We can link without hardcoding, and we can hardcode nonexisting dirs. + _LT_AC_TAGVAR(hardcode_action, $1)=immediate + fi +else + # We cannot hardcode anything, or else we can only hardcode existing + # directories. + _LT_AC_TAGVAR(hardcode_action, $1)=unsupported +fi +AC_MSG_RESULT([$_LT_AC_TAGVAR(hardcode_action, $1)]) + +if test "$_LT_AC_TAGVAR(hardcode_action, $1)" = relink; then + # Fast installation is not supported + enable_fast_install=no +elif test "$shlibpath_overrides_runpath" = yes || + test "$enable_shared" = no; then + # Fast installation is not necessary + enable_fast_install=needless +fi +])# AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH + + +# AC_LIBTOOL_SYS_LIB_STRIP +# ------------------------ +AC_DEFUN([AC_LIBTOOL_SYS_LIB_STRIP], +[striplib= +old_striplib= +AC_MSG_CHECKING([whether stripping libraries is possible]) +if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; then + test -z "$old_striplib" && old_striplib="$STRIP --strip-debug" + test -z "$striplib" && striplib="$STRIP --strip-unneeded" + AC_MSG_RESULT([yes]) +else +# FIXME - insert some real tests, host_os isn't really good enough + case $host_os in + darwin*) + if test -n "$STRIP" ; then + striplib="$STRIP -x" + AC_MSG_RESULT([yes]) + else + AC_MSG_RESULT([no]) +fi + ;; + *) + AC_MSG_RESULT([no]) + ;; + esac +fi +])# AC_LIBTOOL_SYS_LIB_STRIP + + +# AC_LIBTOOL_SYS_DYNAMIC_LINKER +# ----------------------------- +# PORTME Fill in your ld.so characteristics 
+AC_DEFUN([AC_LIBTOOL_SYS_DYNAMIC_LINKER], +[AC_MSG_CHECKING([dynamic linker characteristics]) +library_names_spec= +libname_spec='lib$name' +soname_spec= +shrext_cmds=".so" +postinstall_cmds= +postuninstall_cmds= +finish_cmds= +finish_eval= +shlibpath_var= +shlibpath_overrides_runpath=unknown +version_type=none +dynamic_linker="$host_os ld.so" +sys_lib_dlsearch_path_spec="/lib /usr/lib" +if test "$GCC" = yes; then + sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` + if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then + # if the path contains ";" then we assume it to be the separator + # otherwise default to the standard path separator (i.e. ":") - it is + # assumed that no part of a normal pathname contains ";" but that should + # okay in the real world where ";" in dirpaths is itself problematic. + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` + else + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` + fi +else + sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" +fi +need_lib_prefix=unknown +hardcode_into_libs=no + +# when you set need_version to no, make sure it does not cause -set_version +# flags to be left without arguments +need_version=unknown + +case $host_os in +aix3*) + version_type=linux + library_names_spec='${libname}${release}${shared_ext}$versuffix $libname.a' + shlibpath_var=LIBPATH + + # AIX 3 has no versioning support, so we append a major version to the name. 
+ soname_spec='${libname}${release}${shared_ext}$major' + ;; + +aix4* | aix5*) + version_type=linux + need_lib_prefix=no + need_version=no + hardcode_into_libs=yes + if test "$host_cpu" = ia64; then + # AIX 5 supports IA64 + library_names_spec='${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' + shlibpath_var=LD_LIBRARY_PATH + else + # With GCC up to 2.95.x, collect2 would create an import file + # for dependence libraries. The import file would start with + # the line `#! .'. This would cause the generated library to + # depend on `.', always an invalid library. This was fixed in + # development snapshots of GCC prior to 3.0. + case $host_os in + aix4 | aix4.[[01]] | aix4.[[01]].*) + if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' + echo ' yes ' + echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then + : + else + can_build_shared=no + fi + ;; + esac + # AIX (on Power*) has no versioning support, so currently we can not hardcode correct + # soname into executable. Probably we can add versioning support to + # collect2, so additional links can be useful in future. + if test "$aix_use_runtimelinking" = yes; then + # If using run time linking (on AIX 4.2 or later) use lib.so + # instead of lib.a to let people know that these are not + # typical AIX shared libraries. + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + else + # We preserve .a as extension for shared libraries through AIX4.2 + # and later when we are not doing run time linking. + library_names_spec='${libname}${release}.a $libname.a' + soname_spec='${libname}${release}${shared_ext}$major' + fi + shlibpath_var=LIBPATH + fi + ;; + +amigaos*) + library_names_spec='$libname.ixlibrary $libname.a' + # Create ${libname}_ixlibrary.a entries in /sys/libs. 
+ finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`$echo "X$lib" | $Xsed -e '\''s%^.*/\([[^/]]*\)\.ixlibrary$%\1%'\''`; test $rm /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' + ;; + +beos*) + library_names_spec='${libname}${shared_ext}' + dynamic_linker="$host_os ld.so" + shlibpath_var=LIBRARY_PATH + ;; + +bsdi[[45]]*) + version_type=linux + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' + shlibpath_var=LD_LIBRARY_PATH + sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" + sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" + # the default ld.so.conf also contains /usr/contrib/lib and + # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow + # libtool to hard-code these into programs + ;; + +cygwin* | mingw* | pw32*) + version_type=windows + shrext_cmds=".dll" + need_version=no + need_lib_prefix=no + + case $GCC,$host_os in + yes,cygwin* | yes,mingw* | yes,pw32*) + library_names_spec='$libname.dll.a' + # DLL is installed to $(libdir)/../bin by postinstall_cmds + postinstall_cmds='base_file=`basename \${file}`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + test -d \$dldir || mkdir -p \$dldir~ + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname' + postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ + dlpath=$dir/\$dldll~ + $rm \$dlpath' + shlibpath_overrides_runpath=yes + + case $host_os in + cygwin*) + # Cygwin DLLs use 'cyg' prefix rather than 'lib' + soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}' + sys_lib_search_path_spec="/usr/lib /lib/w32api /lib /usr/local/lib" + ;; + mingw*) + # MinGW DLLs use traditional 'lib' prefix + soname_spec='${libname}`echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}' + sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` + if echo "$sys_lib_search_path_spec" | [grep ';[c-zC-Z]:/' >/dev/null]; then + # It is most probably a Windows format PATH printed by + # mingw gcc, but we are running on Cygwin. Gcc prints its search + # path with ; separators, and with drive letters. We can handle the + # drive letters (cygwin fileutils understands them), so leave them, + # especially as we might pass files found there to a mingw objdump, + # which wouldn't understand a cygwinified path. Ahh. + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` + else + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` + fi + ;; + pw32*) + # pw32 DLLs use 'pw' prefix rather than 'lib' + library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}' + ;; + esac + ;; + + *) + library_names_spec='${libname}`echo ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext} $libname.lib' + ;; + esac + dynamic_linker='Win32 ld.exe' + # FIXME: first we should search . 
and the directory the executable is in + shlibpath_var=PATH + ;; + +darwin* | rhapsody*) + dynamic_linker="$host_os dyld" + version_type=darwin + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${versuffix}$shared_ext ${libname}${release}${major}$shared_ext ${libname}$shared_ext' + soname_spec='${libname}${release}${major}$shared_ext' + shlibpath_overrides_runpath=yes + shlibpath_var=DYLD_LIBRARY_PATH + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' + # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the same. + if test "$GCC" = yes; then + sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` + else + sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib' + fi + sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' + ;; + +dgux*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname$shared_ext' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + ;; + +freebsd1*) + dynamic_linker=no + ;; + +kfreebsd*-gnu) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + dynamic_linker='GNU ld.so' + ;; + +freebsd* | dragonfly*) + # DragonFly does not have aout. When/if they implement a new + # versioning mechanism, adjust this. 
+ if test -x /usr/bin/objformat; then + objformat=`/usr/bin/objformat` + else + case $host_os in + freebsd[[123]]*) objformat=aout ;; + *) objformat=elf ;; + esac + fi + version_type=freebsd-$objformat + case $version_type in + freebsd-elf*) + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' + need_version=no + need_lib_prefix=no + ;; + freebsd-*) + library_names_spec='${libname}${release}${shared_ext}$versuffix $libname${shared_ext}$versuffix' + need_version=yes + ;; + esac + shlibpath_var=LD_LIBRARY_PATH + case $host_os in + freebsd2*) + shlibpath_overrides_runpath=yes + ;; + freebsd3.[[01]]* | freebsdelf3.[[01]]*) + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; + freebsd3.[[2-9]]* | freebsdelf3.[[2-9]]* | \ + freebsd4.[[0-5]] | freebsdelf4.[[0-5]] | freebsd4.1.1 | freebsdelf4.1.1) + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; + freebsd*) # from 4.6 on + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; + esac + ;; + +gnu*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + ;; + +hpux9* | hpux10* | hpux11*) + # Give a soname corresponding to the major version so that dld.sl refuses to + # link against other versions. + version_type=sunos + need_lib_prefix=no + need_version=no + case $host_cpu in + ia64*) + shrext_cmds='.so' + hardcode_into_libs=yes + dynamic_linker="$host_os dld.so" + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
+ library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + if test "X$HPUX_IA64_MODE" = X32; then + sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" + else + sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" + fi + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + ;; + hppa*64*) + shrext_cmds='.sl' + hardcode_into_libs=yes + dynamic_linker="$host_os dld.sl" + shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + ;; + *) + shrext_cmds='.sl' + dynamic_linker="$host_os dld.sl" + shlibpath_var=SHLIB_PATH + shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + ;; + esac + # HP-UX runs *really* slowly unless shared libraries are mode 555. 
+ postinstall_cmds='chmod 555 $lib' + ;; + +interix3*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; + +irix5* | irix6* | nonstopux*) + case $host_os in + nonstopux*) version_type=nonstopux ;; + *) + if test "$lt_cv_prog_gnu_ld" = yes; then + version_type=linux + else + version_type=irix + fi ;; + esac + need_lib_prefix=no + need_version=no + soname_spec='${libname}${release}${shared_ext}$major' + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${release}${shared_ext} $libname${shared_ext}' + case $host_os in + irix5* | nonstopux*) + libsuff= shlibsuff= + ;; + *) + case $LD in # libtool.m4 will add one of these switches to LD + *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") + libsuff= shlibsuff= libmagic=32-bit;; + *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") + libsuff=32 shlibsuff=N32 libmagic=N32;; + *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") + libsuff=64 shlibsuff=64 libmagic=64-bit;; + *) libsuff= shlibsuff= libmagic=never-match;; + esac + ;; + esac + shlibpath_var=LD_LIBRARY${shlibsuff}_PATH + shlibpath_overrides_runpath=no + sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} /usr/local/lib${libsuff}" + sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" + hardcode_into_libs=yes + ;; + +# No shared lib support for Linux oldld, aout, or coff. +linux*oldld* | linux*aout* | linux*coff*) + dynamic_linker=no + ;; + +# This must be Linux ELF. 
+linux*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + # This implies no fast_install, which is unacceptable. + # Some rework will be needed to allow for fast_install + # before this can be enabled. + hardcode_into_libs=yes + + # find out which ABI we are using + libsuff= + case "$host_cpu" in + x86_64*|s390x*|powerpc64*) + echo '[#]line __oline__ "configure"' > conftest.$ac_ext + if AC_TRY_EVAL(ac_compile); then + case `/usr/bin/file conftest.$ac_objext` in + *64-bit*) + libsuff=64 + sys_lib_search_path_spec="/lib${libsuff} /usr/lib${libsuff} /usr/local/lib${libsuff}" + ;; + esac + fi + rm -rf conftest* + ;; + esac + + # Append ld.so.conf contents to the search path + if test -f /etc/ld.so.conf; then + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", \[$]2)); skip = 1; } { if (!skip) print \[$]0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;/^$/d' | tr '\n' ' '` + sys_lib_dlsearch_path_spec="/lib${libsuff} /usr/lib${libsuff} $lt_ld_extra" + fi + + # We used to test for /lib/ld.so.1 and disable shared libraries on + # powerpc, because MkLinux only supported shared libraries with the + # GNU dynamic linker. Since this was broken with cross compilers, + # most powerpc-linux boxes support dynamic linking these days and + # people can always --disable-shared, the test was removed, and we + # assume the GNU/Linux dynamic linker is in use. 
+ dynamic_linker='GNU/Linux ld.so' + ;; + +knetbsd*-gnu) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + dynamic_linker='GNU ld.so' + ;; + +netbsd*) + version_type=sunos + need_lib_prefix=no + need_version=no + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' + dynamic_linker='NetBSD (a.out) ld.so' + else + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + dynamic_linker='NetBSD ld.elf_so' + fi + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; + +newsos6) + version_type=linux + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + ;; + +nto-qnx*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + ;; + +openbsd*) + version_type=sunos + sys_lib_dlsearch_path_spec="/usr/lib" + need_lib_prefix=no + # Some older versions of OpenBSD (3.3 at least) *do* need versioned libs. 
+ case $host_os in + openbsd3.3 | openbsd3.3.*) need_version=yes ;; + *) need_version=no ;; + esac + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' + shlibpath_var=LD_LIBRARY_PATH + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then + case $host_os in + openbsd2.[[89]] | openbsd2.[[89]].*) + shlibpath_overrides_runpath=no + ;; + *) + shlibpath_overrides_runpath=yes + ;; + esac + else + shlibpath_overrides_runpath=yes + fi + ;; + +os2*) + libname_spec='$name' + shrext_cmds=".dll" + need_lib_prefix=no + library_names_spec='$libname${shared_ext} $libname.a' + dynamic_linker='OS/2 ld.exe' + shlibpath_var=LIBPATH + ;; + +osf3* | osf4* | osf5*) + version_type=osf + need_lib_prefix=no + need_version=no + soname_spec='${libname}${release}${shared_ext}$major' + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + shlibpath_var=LD_LIBRARY_PATH + sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" + sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" + ;; + +solaris*) + version_type=linux + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + # ldd complains unless libraries are executable + postinstall_cmds='chmod +x $lib' + ;; + +sunos4*) + version_type=sunos + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${shared_ext}$versuffix' + finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + if test "$with_gnu_ld" = yes; then + 
need_lib_prefix=no + fi + need_version=yes + ;; + +sysv4 | sysv4.3*) + version_type=linux + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + case $host_vendor in + sni) + shlibpath_overrides_runpath=no + need_lib_prefix=no + export_dynamic_flag_spec='${wl}-Blargedynsym' + runpath_var=LD_RUN_PATH + ;; + siemens) + need_lib_prefix=no + ;; + motorola) + need_lib_prefix=no + need_version=no + shlibpath_overrides_runpath=no + sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' + ;; + esac + ;; + +sysv4*MP*) + if test -d /usr/nec ;then + version_type=linux + library_names_spec='$libname${shared_ext}.$versuffix $libname${shared_ext}.$major $libname${shared_ext}' + soname_spec='$libname${shared_ext}.$major' + shlibpath_var=LD_LIBRARY_PATH + fi + ;; + +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + version_type=freebsd-elf + need_lib_prefix=no + need_version=no + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext} $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + hardcode_into_libs=yes + if test "$with_gnu_ld" = yes; then + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' + shlibpath_overrides_runpath=no + else + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' + shlibpath_overrides_runpath=yes + case $host_os in + sco3.2v5*) + sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" + ;; + esac + fi + sys_lib_dlsearch_path_spec='/usr/lib' + ;; + +uts4*) + version_type=linux + library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}$major $libname${shared_ext}' + soname_spec='${libname}${release}${shared_ext}$major' + shlibpath_var=LD_LIBRARY_PATH + ;; + +*) + dynamic_linker=no + ;; +esac 
+AC_MSG_RESULT([$dynamic_linker]) +test "$dynamic_linker" = no && can_build_shared=no + +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" +if test "$GCC" = yes; then + variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" +fi +])# AC_LIBTOOL_SYS_DYNAMIC_LINKER + + +# _LT_AC_TAGCONFIG +# ---------------- +AC_DEFUN([_LT_AC_TAGCONFIG], +[AC_ARG_WITH([tags], + [AC_HELP_STRING([--with-tags@<:@=TAGS@:>@], + [include additional configurations @<:@automatic@:>@])], + [tagnames="$withval"]) + +if test -f "$ltmain" && test -n "$tagnames"; then + if test ! -f "${ofile}"; then + AC_MSG_WARN([output file `$ofile' does not exist]) + fi + + if test -z "$LTCC"; then + eval "`$SHELL ${ofile} --config | grep '^LTCC='`" + if test -z "$LTCC"; then + AC_MSG_WARN([output file `$ofile' does not look like a libtool script]) + else + AC_MSG_WARN([using `LTCC=$LTCC', extracted from `$ofile']) + fi + fi + if test -z "$LTCFLAGS"; then + eval "`$SHELL ${ofile} --config | grep '^LTCFLAGS='`" + fi + + # Extract list of available tagged configurations in $ofile. + # Note that this assumes the entire list is on one line. + available_tags=`grep "^available_tags=" "${ofile}" | $SED -e 's/available_tags=\(.*$\)/\1/' -e 's/\"//g'` + + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," + for tagname in $tagnames; do + IFS="$lt_save_ifs" + # Check whether tagname contains only valid characters + case `$echo "X$tagname" | $Xsed -e 's:[[-_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890,/] ]::g'` in + "") ;; + *) AC_MSG_ERROR([invalid tag name: $tagname]) + ;; + esac + + if grep "^# ### BEGIN LIBTOOL TAG CONFIG: $tagname$" < "${ofile}" > /dev/null + then + AC_MSG_ERROR([tag name \"$tagname\" already exists]) + fi + + # Update the list of available tags. 
+ if test -n "$tagname"; then + echo appending configuration tag \"$tagname\" to $ofile + + case $tagname in + CXX) + if test -n "$CXX" && ( test "X$CXX" != "Xno" && + ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) || + (test "X$CXX" != "Xg++"))) ; then + AC_LIBTOOL_LANG_CXX_CONFIG + else + tagname="" + fi + ;; + + F77) + if test -n "$F77" && test "X$F77" != "Xno"; then + AC_LIBTOOL_LANG_F77_CONFIG + else + tagname="" + fi + ;; + + GCJ) + if test -n "$GCJ" && test "X$GCJ" != "Xno"; then + AC_LIBTOOL_LANG_GCJ_CONFIG + else + tagname="" + fi + ;; + + RC) + AC_LIBTOOL_LANG_RC_CONFIG + ;; + + *) + AC_MSG_ERROR([Unsupported tag name: $tagname]) + ;; + esac + + # Append the new tag name to the list of available tags. + if test -n "$tagname" ; then + available_tags="$available_tags $tagname" + fi + fi + done + IFS="$lt_save_ifs" + + # Now substitute the updated list of available tags. + if eval "sed -e 's/^available_tags=.*\$/available_tags=\"$available_tags\"/' \"$ofile\" > \"${ofile}T\""; then + mv "${ofile}T" "$ofile" + chmod +x "$ofile" + else + rm -f "${ofile}T" + AC_MSG_ERROR([unable to update list of available tagged configurations.]) + fi +fi +])# _LT_AC_TAGCONFIG + + +# AC_LIBTOOL_DLOPEN +# ----------------- +# enable checks for dlopen support +AC_DEFUN([AC_LIBTOOL_DLOPEN], + [AC_BEFORE([$0],[AC_LIBTOOL_SETUP]) +])# AC_LIBTOOL_DLOPEN + + +# AC_LIBTOOL_WIN32_DLL +# -------------------- +# declare package support for building win32 DLLs +AC_DEFUN([AC_LIBTOOL_WIN32_DLL], +[AC_BEFORE([$0], [AC_LIBTOOL_SETUP]) +])# AC_LIBTOOL_WIN32_DLL + + +# AC_ENABLE_SHARED([DEFAULT]) +# --------------------------- +# implement the --enable-shared flag +# DEFAULT is either `yes' or `no'. If omitted, it defaults to `yes'. 
+AC_DEFUN([AC_ENABLE_SHARED], +[define([AC_ENABLE_SHARED_DEFAULT], ifelse($1, no, no, yes))dnl +AC_ARG_ENABLE([shared], + [AC_HELP_STRING([--enable-shared@<:@=PKGS@:>@], + [build shared libraries @<:@default=]AC_ENABLE_SHARED_DEFAULT[@:>@])], + [p=${PACKAGE-default} + case $enableval in + yes) enable_shared=yes ;; + no) enable_shared=no ;; + *) + enable_shared=no + # Look at the argument we got. We use all the common list separators. + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," + for pkg in $enableval; do + IFS="$lt_save_ifs" + if test "X$pkg" = "X$p"; then + enable_shared=yes + fi + done + IFS="$lt_save_ifs" + ;; + esac], + [enable_shared=]AC_ENABLE_SHARED_DEFAULT) +])# AC_ENABLE_SHARED + + +# AC_DISABLE_SHARED +# ----------------- +# set the default shared flag to --disable-shared +AC_DEFUN([AC_DISABLE_SHARED], +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl +AC_ENABLE_SHARED(no) +])# AC_DISABLE_SHARED + + +# AC_ENABLE_STATIC([DEFAULT]) +# --------------------------- +# implement the --enable-static flag +# DEFAULT is either `yes' or `no'. If omitted, it defaults to `yes'. +AC_DEFUN([AC_ENABLE_STATIC], +[define([AC_ENABLE_STATIC_DEFAULT], ifelse($1, no, no, yes))dnl +AC_ARG_ENABLE([static], + [AC_HELP_STRING([--enable-static@<:@=PKGS@:>@], + [build static libraries @<:@default=]AC_ENABLE_STATIC_DEFAULT[@:>@])], + [p=${PACKAGE-default} + case $enableval in + yes) enable_static=yes ;; + no) enable_static=no ;; + *) + enable_static=no + # Look at the argument we got. We use all the common list separators. 
+ lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," + for pkg in $enableval; do + IFS="$lt_save_ifs" + if test "X$pkg" = "X$p"; then + enable_static=yes + fi + done + IFS="$lt_save_ifs" + ;; + esac], + [enable_static=]AC_ENABLE_STATIC_DEFAULT) +])# AC_ENABLE_STATIC + + +# AC_DISABLE_STATIC +# ----------------- +# set the default static flag to --disable-static +AC_DEFUN([AC_DISABLE_STATIC], +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl +AC_ENABLE_STATIC(no) +])# AC_DISABLE_STATIC + + +# AC_ENABLE_FAST_INSTALL([DEFAULT]) +# --------------------------------- +# implement the --enable-fast-install flag +# DEFAULT is either `yes' or `no'. If omitted, it defaults to `yes'. +AC_DEFUN([AC_ENABLE_FAST_INSTALL], +[define([AC_ENABLE_FAST_INSTALL_DEFAULT], ifelse($1, no, no, yes))dnl +AC_ARG_ENABLE([fast-install], + [AC_HELP_STRING([--enable-fast-install@<:@=PKGS@:>@], + [optimize for fast installation @<:@default=]AC_ENABLE_FAST_INSTALL_DEFAULT[@:>@])], + [p=${PACKAGE-default} + case $enableval in + yes) enable_fast_install=yes ;; + no) enable_fast_install=no ;; + *) + enable_fast_install=no + # Look at the argument we got. We use all the common list separators. + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," + for pkg in $enableval; do + IFS="$lt_save_ifs" + if test "X$pkg" = "X$p"; then + enable_fast_install=yes + fi + done + IFS="$lt_save_ifs" + ;; + esac], + [enable_fast_install=]AC_ENABLE_FAST_INSTALL_DEFAULT) +])# AC_ENABLE_FAST_INSTALL + + +# AC_DISABLE_FAST_INSTALL +# ----------------------- +# set the default to --disable-fast-install +AC_DEFUN([AC_DISABLE_FAST_INSTALL], +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl +AC_ENABLE_FAST_INSTALL(no) +])# AC_DISABLE_FAST_INSTALL + + +# AC_LIBTOOL_PICMODE([MODE]) +# -------------------------- +# implement the --with-pic flag +# MODE is either `yes' or `no'. If omitted, it defaults to `both'. 
+AC_DEFUN([AC_LIBTOOL_PICMODE], +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl +pic_mode=ifelse($#,1,$1,default) +])# AC_LIBTOOL_PICMODE + + +# AC_PROG_EGREP +# ------------- +# This is predefined starting with Autoconf 2.54, so this conditional +# definition can be removed once we require Autoconf 2.54 or later. +m4_ifndef([AC_PROG_EGREP], [AC_DEFUN([AC_PROG_EGREP], +[AC_CACHE_CHECK([for egrep], [ac_cv_prog_egrep], + [if echo a | (grep -E '(a|b)') >/dev/null 2>&1 + then ac_cv_prog_egrep='grep -E' + else ac_cv_prog_egrep='egrep' + fi]) + EGREP=$ac_cv_prog_egrep + AC_SUBST([EGREP]) +])]) + + +# AC_PATH_TOOL_PREFIX +# ------------------- +# find a file program which can recognise shared library +AC_DEFUN([AC_PATH_TOOL_PREFIX], +[AC_REQUIRE([AC_PROG_EGREP])dnl +AC_MSG_CHECKING([for $1]) +AC_CACHE_VAL(lt_cv_path_MAGIC_CMD, +[case $MAGIC_CMD in +[[\\/*] | ?:[\\/]*]) + lt_cv_path_MAGIC_CMD="$MAGIC_CMD" # Let the user override the test with a path. + ;; +*) + lt_save_MAGIC_CMD="$MAGIC_CMD" + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR +dnl $ac_dummy forces splitting on constant user-supplied paths. +dnl POSIX.2 word splitting is done only on the output of word expansions, +dnl not every word. This closes a longstanding sh security hole. + ac_dummy="ifelse([$2], , $PATH, [$2])" + for ac_dir in $ac_dummy; do + IFS="$lt_save_ifs" + test -z "$ac_dir" && ac_dir=. + if test -f $ac_dir/$1; then + lt_cv_path_MAGIC_CMD="$ac_dir/$1" + if test -n "$file_magic_test_file"; then + case $deplibs_check_method in + "file_magic "*) + file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` + MAGIC_CMD="$lt_cv_path_MAGIC_CMD" + if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | + $EGREP "$file_magic_regex" > /dev/null; then + : + else + cat <<EOF 1>&2 + +*** Warning: the command libtool uses to detect shared libraries, +*** $file_magic_cmd, produces output that libtool cannot recognize. +*** The result is that libtool may fail to recognize shared libraries +*** as such.
This will affect the creation of libtool libraries that +*** depend on shared libraries, but programs linked with such libtool +*** libraries will work regardless of this problem. Nevertheless, you +*** may want to report the problem to your system manager and/or to +*** bug-libtool at gnu.org + +EOF + fi ;; + esac + fi + break + fi + done + IFS="$lt_save_ifs" + MAGIC_CMD="$lt_save_MAGIC_CMD" + ;; +esac]) +MAGIC_CMD="$lt_cv_path_MAGIC_CMD" +if test -n "$MAGIC_CMD"; then + AC_MSG_RESULT($MAGIC_CMD) +else + AC_MSG_RESULT(no) +fi +])# AC_PATH_TOOL_PREFIX + + +# AC_PATH_MAGIC +# ------------- +# find a file program which can recognise a shared library +AC_DEFUN([AC_PATH_MAGIC], +[AC_PATH_TOOL_PREFIX(${ac_tool_prefix}file, /usr/bin$PATH_SEPARATOR$PATH) +if test -z "$lt_cv_path_MAGIC_CMD"; then + if test -n "$ac_tool_prefix"; then + AC_PATH_TOOL_PREFIX(file, /usr/bin$PATH_SEPARATOR$PATH) + else + MAGIC_CMD=: + fi +fi +])# AC_PATH_MAGIC + + +# AC_PROG_LD +# ---------- +# find the pathname to the GNU or non-GNU linker +AC_DEFUN([AC_PROG_LD], +[AC_ARG_WITH([gnu-ld], + [AC_HELP_STRING([--with-gnu-ld], + [assume the C compiler uses GNU ld @<:@default=no@:>@])], + [test "$withval" = no || with_gnu_ld=yes], + [with_gnu_ld=no]) +AC_REQUIRE([LT_AC_PROG_SED])dnl +AC_REQUIRE([AC_PROG_CC])dnl +AC_REQUIRE([AC_CANONICAL_HOST])dnl +AC_REQUIRE([AC_CANONICAL_BUILD])dnl +ac_prog=ld +if test "$GCC" = yes; then + # Check if gcc -print-prog-name=ld gives a path. + AC_MSG_CHECKING([for ld used by $CC]) + case $host in + *-*-mingw*) + # gcc leaves a trailing carriage return which upsets mingw + ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; + *) + ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; + esac + case $ac_prog in + # Accept absolute paths. 
+ [[\\/]]* | ?:[[\\/]]*) + re_direlt='/[[^/]][[^/]]*/\.\./' + # Canonicalize the pathname of ld + ac_prog=`echo $ac_prog| $SED 's%\\\\%/%g'` + while echo $ac_prog | grep "$re_direlt" > /dev/null 2>&1; do + ac_prog=`echo $ac_prog| $SED "s%$re_direlt%/%"` + done + test -z "$LD" && LD="$ac_prog" + ;; + "") + # If it fails, then pretend we aren't using GCC. + ac_prog=ld + ;; + *) + # If it is relative, then search for the first ld in PATH. + with_gnu_ld=unknown + ;; + esac +elif test "$with_gnu_ld" = yes; then + AC_MSG_CHECKING([for GNU ld]) +else + AC_MSG_CHECKING([for non-GNU ld]) +fi +AC_CACHE_VAL(lt_cv_path_LD, +[if test -z "$LD"; then + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR + for ac_dir in $PATH; do + IFS="$lt_save_ifs" + test -z "$ac_dir" && ac_dir=. + if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then + lt_cv_path_LD="$ac_dir/$ac_prog" + # Check to see if the program is GNU ld. I'd rather use --version, + # but apparently some variants of GNU ld only accept -v. + # Break only if it was the GNU/non-GNU ld that we prefer. + case `"$lt_cv_path_LD" -v 2>&1 &1 /dev/null; then + case $host_cpu in + i*86 ) + # Not sure whether the presence of OpenBSD here was a mistake. + # Let's accept both of them until this is cleared up. 
+ lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[[3-9]]86 (compact )?demand paged shared library' + lt_cv_file_magic_cmd=/usr/bin/file + lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*` + ;; + esac + else + lt_cv_deplibs_check_method=pass_all + fi + ;; + +gnu*) + lt_cv_deplibs_check_method=pass_all + ;; + +hpux10.20* | hpux11*) + lt_cv_file_magic_cmd=/usr/bin/file + case $host_cpu in + ia64*) + lt_cv_deplibs_check_method='file_magic (s[[0-9]][[0-9]][[0-9]]|ELF-[[0-9]][[0-9]]) shared object file - IA64' + lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so + ;; + hppa*64*) + [lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - PA-RISC [0-9].[0-9]'] + lt_cv_file_magic_test_file=/usr/lib/pa20_64/libc.sl + ;; + *) + lt_cv_deplibs_check_method='file_magic (s[[0-9]][[0-9]][[0-9]]|PA-RISC[[0-9]].[[0-9]]) shared library' + lt_cv_file_magic_test_file=/usr/lib/libc.sl + ;; + esac + ;; + +interix3*) + # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a here + lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so|\.a)$' + ;; + +irix5* | irix6* | nonstopux*) + case $LD in + *-32|*"-32 ") libmagic=32-bit;; + *-n32|*"-n32 ") libmagic=N32;; + *-64|*"-64 ") libmagic=64-bit;; + *) libmagic=never-match;; + esac + lt_cv_deplibs_check_method=pass_all + ;; + +# This must be Linux ELF. 
+linux*) + lt_cv_deplibs_check_method=pass_all + ;; + +netbsd*) + if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then + lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$' + else + lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so|_pic\.a)$' + fi + ;; + +newos6*) + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (executable|dynamic lib)' + lt_cv_file_magic_cmd=/usr/bin/file + lt_cv_file_magic_test_file=/usr/lib/libnls.so + ;; + +nto-qnx*) + lt_cv_deplibs_check_method=unknown + ;; + +openbsd*) + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then + lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|\.so|_pic\.a)$' + else + lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$' + fi + ;; + +osf3* | osf4* | osf5*) + lt_cv_deplibs_check_method=pass_all + ;; + +solaris*) + lt_cv_deplibs_check_method=pass_all + ;; + +sysv4 | sysv4.3*) + case $host_vendor in + motorola) + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[ML]]SB (shared object|dynamic lib) M[[0-9]][[0-9]]* Version [[0-9]]' + lt_cv_file_magic_test_file=`echo /usr/lib/libc.so*` + ;; + ncr) + lt_cv_deplibs_check_method=pass_all + ;; + sequent) + lt_cv_file_magic_cmd='/bin/file' + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit [[LM]]SB (shared object|dynamic lib )' + ;; + sni) + lt_cv_file_magic_cmd='/bin/file' + lt_cv_deplibs_check_method="file_magic ELF [[0-9]][[0-9]]*-bit [[LM]]SB dynamic lib" + lt_cv_file_magic_test_file=/lib/libc.so + ;; + siemens) + lt_cv_deplibs_check_method=pass_all + ;; + pc) + lt_cv_deplibs_check_method=pass_all + ;; + esac + ;; + +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + lt_cv_deplibs_check_method=pass_all + ;; +esac +]) +file_magic_cmd=$lt_cv_file_magic_cmd +deplibs_check_method=$lt_cv_deplibs_check_method +test -z 
"$deplibs_check_method" && deplibs_check_method=unknown +])# AC_DEPLIBS_CHECK_METHOD + + +# AC_PROG_NM +# ---------- +# find the pathname to a BSD-compatible name lister +AC_DEFUN([AC_PROG_NM], +[AC_CACHE_CHECK([for BSD-compatible nm], lt_cv_path_NM, +[if test -n "$NM"; then + # Let the user override the test. + lt_cv_path_NM="$NM" +else + lt_nm_to_check="${ac_tool_prefix}nm" + if test -n "$ac_tool_prefix" && test "$build" = "$host"; then + lt_nm_to_check="$lt_nm_to_check nm" + fi + for lt_tmp_nm in $lt_nm_to_check; do + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR + for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do + IFS="$lt_save_ifs" + test -z "$ac_dir" && ac_dir=. + tmp_nm="$ac_dir/$lt_tmp_nm" + if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext" ; then + # Check to see if the nm accepts a BSD-compat flag. + # Adding the `sed 1q' prevents false positives on HP-UX, which says: + # nm: unknown option "B" ignored + # Tru64's nm complains that /dev/null is an invalid object file + case `"$tmp_nm" -B /dev/null 2>&1 | sed '1q'` in + */dev/null* | *'Invalid file or object type'*) + lt_cv_path_NM="$tmp_nm -B" + break + ;; + *) + case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in + */dev/null*) + lt_cv_path_NM="$tmp_nm -p" + break + ;; + *) + lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but + continue # so that we can try to find one that supports BSD flags + ;; + esac + ;; + esac + fi + done + IFS="$lt_save_ifs" + done + test -z "$lt_cv_path_NM" && lt_cv_path_NM=nm +fi]) +NM="$lt_cv_path_NM" +])# AC_PROG_NM + + +# AC_CHECK_LIBM +# ------------- +# check for math library +AC_DEFUN([AC_CHECK_LIBM], +[AC_REQUIRE([AC_CANONICAL_HOST])dnl +LIBM= +case $host in +*-*-beos* | *-*-cygwin* | *-*-pw32* | *-*-darwin*) + # These system don't have libm, or don't need it + ;; +*-ncr-sysv4.3*) + AC_CHECK_LIB(mw, _mwvalidcheckl, LIBM="-lmw") + AC_CHECK_LIB(m, cos, LIBM="$LIBM -lm") + ;; +*) + AC_CHECK_LIB(m, cos, LIBM="-lm") + ;; +esac +])# AC_CHECK_LIBM + 
+ +# AC_LIBLTDL_CONVENIENCE([DIRECTORY]) +# ----------------------------------- +# sets LIBLTDL to the link flags for the libltdl convenience library and +# LTDLINCL to the include flags for the libltdl header and adds +# --enable-ltdl-convenience to the configure arguments. Note that +# AC_CONFIG_SUBDIRS is not called here. If DIRECTORY is not provided, +# it is assumed to be `libltdl'. LIBLTDL will be prefixed with +# '${top_builddir}/' and LTDLINCL will be prefixed with '${top_srcdir}/' +# (note the single quotes!). If your package is not flat and you're not +# using automake, define top_builddir and top_srcdir appropriately in +# the Makefiles. +AC_DEFUN([AC_LIBLTDL_CONVENIENCE], +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl + case $enable_ltdl_convenience in + no) AC_MSG_ERROR([this package needs a convenience libltdl]) ;; + "") enable_ltdl_convenience=yes + ac_configure_args="$ac_configure_args --enable-ltdl-convenience" ;; + esac + LIBLTDL='${top_builddir}/'ifelse($#,1,[$1],['libltdl'])/libltdlc.la + LTDLINCL='-I${top_srcdir}/'ifelse($#,1,[$1],['libltdl']) + # For backwards non-gettext consistent compatibility... + INCLTDL="$LTDLINCL" +])# AC_LIBLTDL_CONVENIENCE + + +# AC_LIBLTDL_INSTALLABLE([DIRECTORY]) +# ----------------------------------- +# sets LIBLTDL to the link flags for the libltdl installable library and +# LTDLINCL to the include flags for the libltdl header and adds +# --enable-ltdl-install to the configure arguments. Note that +# AC_CONFIG_SUBDIRS is not called here. If DIRECTORY is not provided, +# and an installed libltdl is not found, it is assumed to be `libltdl'. +# LIBLTDL will be prefixed with '${top_builddir}/'# and LTDLINCL with +# '${top_srcdir}/' (note the single quotes!). If your package is not +# flat and you're not using automake, define top_builddir and top_srcdir +# appropriately in the Makefiles. +# In the future, this macro may have to be called after AC_PROG_LIBTOOL. 
+AC_DEFUN([AC_LIBLTDL_INSTALLABLE], +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl + AC_CHECK_LIB(ltdl, lt_dlinit, + [test x"$enable_ltdl_install" != xyes && enable_ltdl_install=no], + [if test x"$enable_ltdl_install" = xno; then + AC_MSG_WARN([libltdl not installed, but installation disabled]) + else + enable_ltdl_install=yes + fi + ]) + if test x"$enable_ltdl_install" = x"yes"; then + ac_configure_args="$ac_configure_args --enable-ltdl-install" + LIBLTDL='${top_builddir}/'ifelse($#,1,[$1],['libltdl'])/libltdl.la + LTDLINCL='-I${top_srcdir}/'ifelse($#,1,[$1],['libltdl']) + else + ac_configure_args="$ac_configure_args --enable-ltdl-install=no" + LIBLTDL="-lltdl" + LTDLINCL= + fi + # For backwards non-gettext consistent compatibility... + INCLTDL="$LTDLINCL" +])# AC_LIBLTDL_INSTALLABLE + + +# AC_LIBTOOL_CXX +# -------------- +# enable support for C++ libraries +AC_DEFUN([AC_LIBTOOL_CXX], +[AC_REQUIRE([_LT_AC_LANG_CXX]) +])# AC_LIBTOOL_CXX + + +# _LT_AC_LANG_CXX +# --------------- +AC_DEFUN([_LT_AC_LANG_CXX], +[AC_REQUIRE([AC_PROG_CXX]) +AC_REQUIRE([_LT_AC_PROG_CXXCPP]) +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}CXX]) +])# _LT_AC_LANG_CXX + +# _LT_AC_PROG_CXXCPP +# ------------------ +AC_DEFUN([_LT_AC_PROG_CXXCPP], +[ +AC_REQUIRE([AC_PROG_CXX]) +if test -n "$CXX" && ( test "X$CXX" != "Xno" && + ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) || + (test "X$CXX" != "Xg++"))) ; then + AC_PROG_CXXCPP +fi +])# _LT_AC_PROG_CXXCPP + +# AC_LIBTOOL_F77 +# -------------- +# enable support for Fortran 77 libraries +AC_DEFUN([AC_LIBTOOL_F77], +[AC_REQUIRE([_LT_AC_LANG_F77]) +])# AC_LIBTOOL_F77 + + +# _LT_AC_LANG_F77 +# --------------- +AC_DEFUN([_LT_AC_LANG_F77], +[AC_REQUIRE([AC_PROG_F77]) +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}F77]) +])# _LT_AC_LANG_F77 + + +# AC_LIBTOOL_GCJ +# -------------- +# enable support for GCJ libraries +AC_DEFUN([AC_LIBTOOL_GCJ], +[AC_REQUIRE([_LT_AC_LANG_GCJ]) +])# AC_LIBTOOL_GCJ + + +# _LT_AC_LANG_GCJ +# --------------- 
+AC_DEFUN([_LT_AC_LANG_GCJ], +[AC_PROVIDE_IFELSE([AC_PROG_GCJ],[], + [AC_PROVIDE_IFELSE([A][M_PROG_GCJ],[], + [AC_PROVIDE_IFELSE([LT_AC_PROG_GCJ],[], + [ifdef([AC_PROG_GCJ],[AC_REQUIRE([AC_PROG_GCJ])], + [ifdef([A][M_PROG_GCJ],[AC_REQUIRE([A][M_PROG_GCJ])], + [AC_REQUIRE([A][C_PROG_GCJ_OR_A][M_PROG_GCJ])])])])])]) +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}GCJ]) +])# _LT_AC_LANG_GCJ + + +# AC_LIBTOOL_RC +# ------------- +# enable support for Windows resource files +AC_DEFUN([AC_LIBTOOL_RC], +[AC_REQUIRE([LT_AC_PROG_RC]) +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}RC]) +])# AC_LIBTOOL_RC + + +# AC_LIBTOOL_LANG_C_CONFIG +# ------------------------ +# Ensure that the configuration vars for the C compiler are +# suitably defined. Those variables are subsequently used by +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. +AC_DEFUN([AC_LIBTOOL_LANG_C_CONFIG], [_LT_AC_LANG_C_CONFIG]) +AC_DEFUN([_LT_AC_LANG_C_CONFIG], +[lt_save_CC="$CC" +AC_LANG_PUSH(C) + +# Source file extension for C test sources. +ac_ext=c + +# Object file extension for compiled C test sources. 
+objext=o +_LT_AC_TAGVAR(objext, $1)=$objext + +# Code to be used in simple compile tests +lt_simple_compile_test_code="int some_variable = 0;\n" + +# Code to be used in simple link tests +lt_simple_link_test_code='int main(){return(0);}\n' + +_LT_AC_SYS_COMPILER + +# save warnings/boilerplate of simple test code +_LT_COMPILER_BOILERPLATE +_LT_LINKER_BOILERPLATE + +AC_LIBTOOL_PROG_COMPILER_NO_RTTI($1) +AC_LIBTOOL_PROG_COMPILER_PIC($1) +AC_LIBTOOL_PROG_CC_C_O($1) +AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1) +AC_LIBTOOL_PROG_LD_SHLIBS($1) +AC_LIBTOOL_SYS_DYNAMIC_LINKER($1) +AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1) +AC_LIBTOOL_SYS_LIB_STRIP +AC_LIBTOOL_DLOPEN_SELF + +# Report which library types will actually be built +AC_MSG_CHECKING([if libtool supports shared libraries]) +AC_MSG_RESULT([$can_build_shared]) + +AC_MSG_CHECKING([whether to build shared libraries]) +test "$can_build_shared" = "no" && enable_shared=no + +# On AIX, shared libraries and static libraries use the same namespace, and +# are all built from PIC. +case $host_os in +aix3*) + test "$enable_shared" = yes && enable_static=no + if test -n "$RANLIB"; then + archive_cmds="$archive_cmds~\$RANLIB \$lib" + postinstall_cmds='$RANLIB $lib' + fi + ;; + +aix4* | aix5*) + if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; then + test "$enable_shared" = yes && enable_static=no + fi + ;; +esac +AC_MSG_RESULT([$enable_shared]) + +AC_MSG_CHECKING([whether to build static libraries]) +# Make sure either enable_shared or enable_static is yes. +test "$enable_shared" = yes || enable_static=yes +AC_MSG_RESULT([$enable_static]) + +AC_LIBTOOL_CONFIG($1) + +AC_LANG_POP +CC="$lt_save_CC" +])# AC_LIBTOOL_LANG_C_CONFIG + + +# AC_LIBTOOL_LANG_CXX_CONFIG +# -------------------------- +# Ensure that the configuration vars for the C compiler are +# suitably defined. Those variables are subsequently used by +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. 
+AC_DEFUN([AC_LIBTOOL_LANG_CXX_CONFIG], [_LT_AC_LANG_CXX_CONFIG(CXX)]) +AC_DEFUN([_LT_AC_LANG_CXX_CONFIG], +[AC_LANG_PUSH(C++) +AC_REQUIRE([AC_PROG_CXX]) +AC_REQUIRE([_LT_AC_PROG_CXXCPP]) + +_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no +_LT_AC_TAGVAR(allow_undefined_flag, $1)= +_LT_AC_TAGVAR(always_export_symbols, $1)=no +_LT_AC_TAGVAR(archive_expsym_cmds, $1)= +_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= +_LT_AC_TAGVAR(hardcode_direct, $1)=no +_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= +_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)= +_LT_AC_TAGVAR(hardcode_libdir_separator, $1)= +_LT_AC_TAGVAR(hardcode_minus_L, $1)=no +_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported +_LT_AC_TAGVAR(hardcode_automatic, $1)=no +_LT_AC_TAGVAR(module_cmds, $1)= +_LT_AC_TAGVAR(module_expsym_cmds, $1)= +_LT_AC_TAGVAR(link_all_deplibs, $1)=unknown +_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds +_LT_AC_TAGVAR(no_undefined_flag, $1)= +_LT_AC_TAGVAR(whole_archive_flag_spec, $1)= +_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no + +# Dependencies to place before and after the object being linked: +_LT_AC_TAGVAR(predep_objects, $1)= +_LT_AC_TAGVAR(postdep_objects, $1)= +_LT_AC_TAGVAR(predeps, $1)= +_LT_AC_TAGVAR(postdeps, $1)= +_LT_AC_TAGVAR(compiler_lib_search_path, $1)= + +# Source file extension for C++ test sources. +ac_ext=cpp + +# Object file extension for compiled C++ test sources. +objext=o +_LT_AC_TAGVAR(objext, $1)=$objext + +# Code to be used in simple compile tests +lt_simple_compile_test_code="int some_variable = 0;\n" + +# Code to be used in simple link tests +lt_simple_link_test_code='int main(int, char *[[]]) { return(0); }\n' + +# ltmain only uses $CC for tagged configurations so make sure $CC is set. +_LT_AC_SYS_COMPILER + +# save warnings/boilerplate of simple test code +_LT_COMPILER_BOILERPLATE +_LT_LINKER_BOILERPLATE + +# Allow CC to be a program name with arguments. 
+lt_save_CC=$CC +lt_save_LD=$LD +lt_save_GCC=$GCC +GCC=$GXX +lt_save_with_gnu_ld=$with_gnu_ld +lt_save_path_LD=$lt_cv_path_LD +if test -n "${lt_cv_prog_gnu_ldcxx+set}"; then + lt_cv_prog_gnu_ld=$lt_cv_prog_gnu_ldcxx +else + $as_unset lt_cv_prog_gnu_ld +fi +if test -n "${lt_cv_path_LDCXX+set}"; then + lt_cv_path_LD=$lt_cv_path_LDCXX +else + $as_unset lt_cv_path_LD +fi +test -z "${LDCXX+set}" || LD=$LDCXX +CC=${CXX-"c++"} +compiler=$CC +_LT_AC_TAGVAR(compiler, $1)=$CC +_LT_CC_BASENAME([$compiler]) + +# We don't want -fno-exception wen compiling C++ code, so set the +# no_builtin_flag separately +if test "$GXX" = yes; then + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=' -fno-builtin' +else + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)= +fi + +if test "$GXX" = yes; then + # Set up default GNU C++ configuration + + AC_PROG_LD + + # Check if GNU C++ uses GNU ld as the underlying linker, since the + # archiving commands below assume that GNU ld is being used. + if test "$with_gnu_ld" = yes; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath ${wl}$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' + + # If archive_cmds runs LD, not CC, wlarc should be empty + # XXX I think wlarc can be eliminated in ltcf-cxx, but I need to + # investigate it a little bit more. (MM) + wlarc='${wl}' + + # ancient GNU ld didn't support --whole-archive et. al. 
+ if eval "`$CC -print-prog-name=ld` --help 2>&1" | \ + grep 'no-whole-archive' > /dev/null; then + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' + else + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= + fi + else + with_gnu_ld=no + wlarc= + + # A generic and very simple default shared library creation + # command for GNU C++ for the case where it uses the native + # linker, instead of GNU ld. If possible, this setting should + # overridden to take advantage of the native linker features on + # the platform it is being used on. + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib' + fi + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"' + +else + GXX=no + with_gnu_ld=no + wlarc= +fi + +# PORTME: fill in a description of your system's C++ link characteristics +AC_MSG_CHECKING([whether the $compiler linker ($LD) supports shared libraries]) +_LT_AC_TAGVAR(ld_shlibs, $1)=yes +case $host_os in + aix3*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + aix4* | aix5*) + if test "$host_cpu" = ia64; then + # On IA64, the linker does run time linking by default, so we don't + # have to do anything special. + aix_use_runtimelinking=no + exp_sym_flag='-Bexport' + no_entry_flag="" + else + aix_use_runtimelinking=no + + # Test if we are trying to use run time linking or normal + # AIX style linking. If -brtl is somewhere in LDFLAGS, we + # need to do runtime linking. 
+ case $host_os in aix4.[[23]]|aix4.[[23]].*|aix5*) + for ld_flag in $LDFLAGS; do + case $ld_flag in + *-brtl*) + aix_use_runtimelinking=yes + break + ;; + esac + done + ;; + esac + + exp_sym_flag='-bexport' + no_entry_flag='-bnoentry' + fi + + # When large executables or shared objects are built, AIX ld can + # have problems creating the table of contents. If linking a library + # or program results in "error TOC overflow" add -mminimal-toc to + # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not + # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. + + _LT_AC_TAGVAR(archive_cmds, $1)='' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + + if test "$GXX" = yes; then + case $host_os in aix4.[[012]]|aix4.[[012]].*) + # We only want to do this on AIX 4.2 and lower, the check + # below for broken collect2 doesn't work under 4.3+ + collect2name=`${CC} -print-prog-name=collect2` + if test -f "$collect2name" && \ + strings "$collect2name" | grep resolve_lib_name >/dev/null + then + # We have reworked collect2 + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + else + # We have old collect2 + _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported + # It fails to find uninstalled libraries when the uninstalled + # path is not listed in the libpath. Setting hardcode_minus_L + # to unsupported forces relinking + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)= + fi + ;; + esac + shared_flag='-shared' + if test "$aix_use_runtimelinking" = yes; then + shared_flag="$shared_flag "'${wl}-G' + fi + else + # not using gcc + if test "$host_cpu" = ia64; then + # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release + # chokes on -Wl,-G. 
The following line is correct: + shared_flag='-G' + else + if test "$aix_use_runtimelinking" = yes; then + shared_flag='${wl}-G' + else + shared_flag='${wl}-bM:SRE' + fi + fi + fi + + # It seems that -bexpall does not export symbols beginning with + # underscore (_), so it is better to generate a list of symbols to export. + _LT_AC_TAGVAR(always_export_symbols, $1)=yes + if test "$aix_use_runtimelinking" = yes; then + # Warning - without using the other runtime loading flags (-brtl), + # -berok will link without error, but may produce a broken library. + _LT_AC_TAGVAR(allow_undefined_flag, $1)='-berok' + # Determine the default libpath from the value encoded in an empty executable. + _LT_AC_SYS_LIBPATH_AIX + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath" + + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" + else + if test "$host_cpu" = ia64; then + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R $libdir:/usr/lib:/lib' + _LT_AC_TAGVAR(allow_undefined_flag, $1)="-z nodefs" + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" + else + # Determine the default libpath from the value encoded in an empty executable. + _LT_AC_SYS_LIBPATH_AIX + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath" + # Warning - without using the other run time loading flags, + # -berok will link without error, but may produce a broken library. 
+ _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-bernotok' + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-berok' + # Exported symbols can be pulled into shared objects from archives + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='$convenience' + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes + # This is similar to how AIX traditionally builds its shared libraries. + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' + fi + fi + ;; + + beos*) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + # Joseph Beckenbach says some releases of gcc + # support --undefined. This deserves some investigation. FIXME + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + + chorus*) + case $cc_basename in + *) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + ;; + + cygwin* | mingw* | pw32*) + # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) is actually meaningless, + # as there is no search path for DLLs. + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + _LT_AC_TAGVAR(always_export_symbols, $1)=no + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes + + if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + # If the export-symbols file already is a .def file (1st line + # is EXPORTS), use it as is; otherwise, prepend... 
+ _LT_AC_TAGVAR(archive_expsym_cmds, $1)='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then + cp $export_symbols $output_objdir/$soname.def; + else + echo EXPORTS > $output_objdir/$soname.def; + cat $export_symbols >> $output_objdir/$soname.def; + fi~ + $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + darwin* | rhapsody*) + case $host_os in + rhapsody* | darwin1.[[012]]) + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}suppress' + ;; + *) # Darwin 1.3 on + if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' + else + case ${MACOSX_DEPLOYMENT_TARGET} in + 10.[[012]]) + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' + ;; + 10.*) + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}dynamic_lookup' + ;; + esac + fi + ;; + esac + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_automatic, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='' + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + + if test "$GXX" = yes ; then + lt_int_apple_cc_single_mod=no + output_verbose_link_cmd='echo' + if $CC -dumpspecs 2>&1 | $EGREP 'single_module' >/dev/null ; then + lt_int_apple_cc_single_mod=yes + fi + if test "X$lt_int_apple_cc_single_mod" = Xyes ; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs 
$compiler_flags -install_name $rpath/$soname $verstring' + fi + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds + if test "X$lt_int_apple_cc_single_mod" = Xyes ; then + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib -single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + else + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -r -keep_private_externs -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + fi + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + else + case $cc_basename in + xlc*) + output_verbose_link_cmd='echo' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > 
$output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + ;; + *) + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + fi + ;; + + dgux*) + case $cc_basename in + ec++*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + ghcx*) + # Green Hills C++ Compiler + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + *) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + ;; + freebsd[[12]]*) + # C++ shared libraries reported to be fairly broken before switch to ELF + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + freebsd-elf*) + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + ;; + freebsd* | kfreebsd*-gnu | dragonfly*) + # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF + # conventions + _LT_AC_TAGVAR(ld_shlibs, $1)=yes + ;; + gnu*) + ;; + hpux9*) + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes # Not in the search PATH, + # but as the default + # location of the library. 
+ + case $cc_basename in + CC*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + aCC*) + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC -b ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + # + # There doesn't appear to be a way to prevent this compiler from + # explicitly linking system object files so we need to strip them + # from the output so that they don't get included in the library + # dependencies. + output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | grep "[[-]]L"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + ;; + *) + if test "$GXX" = yes; then + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC -shared -nostdlib -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' + else + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + esac + ;; + hpux10*|hpux11*) + if test $with_gnu_ld = no; then + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + case $host_cpu in + hppa*64*|ia64*) + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='+b $libdir' + ;; + *) + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + ;; + esac + fi + case $host_cpu in + hppa*64*|ia64*) + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + *) + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_minus_L, 
$1)=yes # Not in the search PATH, + # but as the default + # location of the library. + ;; + esac + + case $cc_basename in + CC*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + aCC*) + case $host_cpu in + hppa*64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + ia64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + esac + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + # + # There doesn't appear to be a way to prevent this compiler from + # explicitly linking system object files so we need to strip them + # from the output so that they don't get included in the library + # dependencies. 
+ output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | grep "\-L"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + ;; + *) + if test "$GXX" = yes; then + if test $with_gnu_ld = no; then + case $host_cpu in + hppa*64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + ia64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + ;; + esac + fi + else + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + esac + ;; + interix3*) + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. + # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
+ _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + ;; + irix5* | irix6*) + case $cc_basename in + CC*) + # SGI C++ + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -all -multigot $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + + # Archives containing C++ object files must be created using + # "CC -ar", where "CC" is the IRIX C++ compiler. This is + # necessary to make sure instantiated templates are included + # in the archive. + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -ar -WR,-u -o $oldlib $oldobjs' + ;; + *) + if test "$GXX" = yes; then + if test "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` -o $lib' + fi + fi + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + ;; + esac + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + ;; + linux*) + case $cc_basename in + KCC*) + # Kuck and Associates, Inc. 
(KAI) C++ Compiler + + # KCC will only create a shared library if the output file + # ends with ".so" (or ".sl" for HP-UX), so rename the library + # to its proper name (with version) after linking. + _LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib ${wl}-retain-symbols-file,$export_symbols; mv \$templib $lib' + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + # + # There doesn't appear to be a way to prevent this compiler from + # explicitly linking system object files so we need to strip them + # from the output so that they don't get included in the library + # dependencies. + output_verbose_link_cmd='templist=`$CC $CFLAGS -v conftest.$objext -o libconftest$shared_ext 2>&1 | grep "ld"`; rm -f libconftest$shared_ext; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath,$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' + + # Archives containing C++ object files must be created using + # "CC -Bstatic", where "CC" is the KAI C++ compiler. 
+ _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -Bstatic -o $oldlib $oldobjs' + ;; + icpc*) + # Intel C++ + with_gnu_ld=yes + # version 8.0 and above of icpc choke on multiply defined symbols + # if we add $predep_objects and $postdep_objects, however 7.1 and + # earlier do not add the objects themselves. + case `$CC -V 2>&1` in + *"Version 7."*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + ;; + *) # Version 8.0 or newer + tmp_idyn= + case $host_cpu in + ia64*) tmp_idyn=' -i_dynamic';; + esac + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + ;; + esac + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive$convenience ${wl}--no-whole-archive' + ;; + pgCC*) + # Portland Group C++ compiler + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath ${wl}$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' + 
_LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + ;; + cxx*) + # Compaq C++ + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib ${wl}-retain-symbols-file $wl$export_symbols' + + runpath_var=LD_RUN_PATH + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + # + # There doesn't appear to be a way to prevent this compiler from + # explicitly linking system object files so we need to strip them + # from the output so that they don't get included in the library + # dependencies. 
+ output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld .*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + ;; + esac + ;; + lynxos*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + m88k*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + mvs*) + case $cc_basename in + cxx*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + *) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + ;; + netbsd*) + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $predep_objects $libobjs $deplibs $postdep_objects $linker_flags' + wlarc= + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + fi + # Workaround some broken pre-1.5 toolchains + output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep conftest.$objext | $SED -e "s:-lgcc -lc -lgcc::"' + ;; + openbsd2*) + # C++ shared libraries are fairly broken + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + openbsd*) + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-retain-symbols-file,$export_symbols -o $lib' + 
_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' + fi + output_verbose_link_cmd='echo' + ;; + osf3*) + case $cc_basename in + KCC*) + # Kuck and Associates, Inc. (KAI) C++ Compiler + + # KCC will only create a shared library if the output file + # ends with ".so" (or ".sl" for HP-UX), so rename the library + # to its proper name (with version) after linking. + _LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Archives containing C++ object files must be created using + # "CC -Bstatic", where "CC" is the KAI C++ compiler. + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -Bstatic -o $oldlib $oldobjs' + + ;; + RCC*) + # Rational C++ 2.4.1 + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + cxx*) + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $soname `test -n "$verstring" && echo ${wl}-set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. 
+ # + # There doesn't appear to be a way to prevent this compiler from + # explicitly linking system object files so we need to strip them + # from the output so that they don't get included in the library + # dependencies. + output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + ;; + *) + if test "$GXX" = yes && test "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib ${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"' + + else + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + esac + ;; + osf4* | osf5*) + case $cc_basename in + KCC*) + # Kuck and Associates, Inc. (KAI) C++ Compiler + + # KCC will only create a shared library if the output file + # ends with ".so" (or ".sl" for HP-UX), so rename the library + # to its proper name (with version) after linking. 
+ _LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Archives containing C++ object files must be created using + # the KAI C++ compiler. + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -o $oldlib $oldobjs' + ;; + RCC*) + # Rational C++ 2.4.1 + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + cxx*) + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done~ + echo "-hidden">> $lib.exp~ + $CC -shared$allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname -Wl,-input -Wl,$lib.exp `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~ + $rm $lib.exp' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + # + # There doesn't appear to be a way to prevent this compiler from + # explicitly linking system object files so we need to strip them + # from the output so that they don't get included in the library + # dependencies. 
+ output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; echo $list' + ;; + *) + if test "$GXX" = yes && test "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib ${allow_undefined_flag} $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. 
+ output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep "\-L"' + + else + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + esac + ;; + psos*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + sunos4*) + case $cc_basename in + CC*) + # Sun C++ 4.x + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + lcc*) + # Lucid + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + *) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + ;; + solaris*) + case $cc_basename in + CC*) + # Sun C++ 4.2, 5.x and Centerline C++ + _LT_AC_TAGVAR(archive_cmds_need_lc,$1)=yes + _LT_AC_TAGVAR(no_undefined_flag, $1)=' -zdefs' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G${allow_undefined_flag} -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $CC -G${allow_undefined_flag} ${wl}-M ${wl}$lib.exp -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + case $host_os in + solaris2.[[0-5]] | solaris2.[[0-5]].*) ;; + *) + # The C++ compiler is used as linker so we must use $wl + # flag to pass the commands to the underlying system + # linker. We must also pass each convience library through + # to the system linker between allextract/defaultextract. + # The C++ compiler will combine linker options so we + # cannot just pass the convience library names through + # without $wl. + # Supported since Solaris 2.6 (maybe 2.5.1?) 
+ _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' + ;; + esac + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + + output_verbose_link_cmd='echo' + + # Archives containing C++ object files must be created using + # "CC -xar", where "CC" is the Sun C++ compiler. This is + # necessary to make sure instantiated templates are included + # in the archive. + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -xar -o $oldlib $oldobjs' + ;; + gcx*) + # Green Hills C++ Compiler + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib' + + # The C++ compiler must be used to create the archive. + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC $LDFLAGS -archive -o $oldlib $oldobjs' + ;; + *) + # GNU C++ compiler with Solaris linker + if test "$GXX" = yes && test "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-z ${wl}defs' + if $CC --version | grep -v '^2\.7' > /dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $CC -shared -nostdlib ${wl}-M $wl$lib.exp -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + output_verbose_link_cmd="$CC -shared $CFLAGS -v conftest.$objext 2>&1 | grep \"\-L\"" + else + # g++ 2.7 appears to require `-G' NOT `-shared' on this + # platform. 
+ _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G -nostdlib $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $CC -G -nostdlib ${wl}-M $wl$lib.exp -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm $lib.exp' + + # Commands to make compiler produce verbose output that lists + # what "hidden" libraries, object files and flags are used when + # linking a shared library. + output_verbose_link_cmd="$CC -G $CFLAGS -v conftest.$objext 2>&1 | grep \"\-L\"" + fi + + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R $wl$libdir' + fi + ;; + esac + ;; + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[[01]].[[10]]* | unixware7* | sco3.2v5.0.[[024]]*) + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + runpath_var='LD_RUN_PATH' + + case $cc_basename in + CC*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + ;; + sysv5* | sco3.2v5* | sco5v6*) + # Note: We can NOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. 
If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. + # For security reasons, it is highly recommended that you always + # use absolute paths for naming shared libraries, and exclude the + # DT_RUNPATH tag from executables and libraries. But doing so + # requires that you compile everything twice, which is a pain. + # So that behaviour is only enabled if SCOABSPATH is set to a + # non-empty value in the environment. Most likely only useful for + # creating official distributions of packages. + # This is a hack until libtool officially supports absolute path + # names for shared libraries. + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-z,nodefs' + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-Bexport' + runpath_var='LD_RUN_PATH' + + case $cc_basename in + CC*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + ;; + tandem*) + case $cc_basename in + NCC*) + # NonStop-UX NCC 3.20 + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + *) + # 
FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + ;; + vxworks*) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + *) + # FIXME: insert proper C++ library support + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; +esac +AC_MSG_RESULT([$_LT_AC_TAGVAR(ld_shlibs, $1)]) +test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no && can_build_shared=no + +_LT_AC_TAGVAR(GCC, $1)="$GXX" +_LT_AC_TAGVAR(LD, $1)="$LD" + +AC_LIBTOOL_POSTDEP_PREDEP($1) +AC_LIBTOOL_PROG_COMPILER_PIC($1) +AC_LIBTOOL_PROG_CC_C_O($1) +AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1) +AC_LIBTOOL_PROG_LD_SHLIBS($1) +AC_LIBTOOL_SYS_DYNAMIC_LINKER($1) +AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1) + +AC_LIBTOOL_CONFIG($1) + +AC_LANG_POP +CC=$lt_save_CC +LDCXX=$LD +LD=$lt_save_LD +GCC=$lt_save_GCC +with_gnu_ldcxx=$with_gnu_ld +with_gnu_ld=$lt_save_with_gnu_ld +lt_cv_path_LDCXX=$lt_cv_path_LD +lt_cv_path_LD=$lt_save_path_LD +lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld +lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld +])# AC_LIBTOOL_LANG_CXX_CONFIG + +# AC_LIBTOOL_POSTDEP_PREDEP([TAGNAME]) +# ------------------------------------ +# Figure out "hidden" library dependencies from verbose +# compiler output when linking a shared library. +# Parse the compiler output and extract the necessary +# objects, libraries and library flags. +AC_DEFUN([AC_LIBTOOL_POSTDEP_PREDEP],[ +dnl we can't use the lt_simple_compile_test_code here, +dnl because it contains code intended for an executable, +dnl not a library. It's possible we should let each +dnl tag define a new lt_????_link_test_code variable, +dnl but it's only used here... +ifelse([$1],[],[cat > conftest.$ac_ext < conftest.$ac_ext < conftest.$ac_ext < conftest.$ac_ext <> "$cfgfile" +ifelse([$1], [], +[#! $SHELL + +# `$echo "$cfgfile" | sed 's%^.*/%%'` - Provide generalized library-building support services. +# Generated automatically by $PROGRAM (GNU $PACKAGE $VERSION$TIMESTAMP) +# NOTE: Changes made to this file will be lost: look at ltmain.sh. 
+# +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001 +# Free Software Foundation, Inc. +# +# This file is part of GNU Libtool: +# Originally by Gordon Matzigkeit , 1996 +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. +# +# As a special exception to the GNU General Public License, if you +# distribute this file as part of a program that contains a +# configuration script generated by Autoconf, you may include it under +# the same distribution terms that you use for the rest of that program. + +# A sed program that does not truncate output. +SED=$lt_SED + +# Sed that helps us avoid accidentally triggering echo(1) options like -n. +Xsed="$SED -e 1s/^X//" + +# The HP-UX ksh and POSIX shell print the target directory to stdout +# if CDPATH is set. +(unset CDPATH) >/dev/null 2>&1 && unset CDPATH + +# The names of the tagged configurations supported by this script. +available_tags= + +# ### BEGIN LIBTOOL CONFIG], +[# ### BEGIN LIBTOOL TAG CONFIG: $tagname]) + +# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: + +# Shell to use when invoking shell scripts. +SHELL=$lt_SHELL + +# Whether or not to build shared libraries. +build_libtool_libs=$enable_shared + +# Whether or not to build static libraries. 
+build_old_libs=$enable_static + +# Whether or not to add -lc for building shared libraries. +build_libtool_need_lc=$_LT_AC_TAGVAR(archive_cmds_need_lc, $1) + +# Whether or not to disallow shared libs when runtime libs are static +allow_libtool_libs_with_static_runtimes=$_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1) + +# Whether or not to optimize for fast installation. +fast_install=$enable_fast_install + +# The host system. +host_alias=$host_alias +host=$host +host_os=$host_os + +# The build system. +build_alias=$build_alias +build=$build +build_os=$build_os + +# An echo program that does not interpret backslashes. +echo=$lt_echo + +# The archiver. +AR=$lt_AR +AR_FLAGS=$lt_AR_FLAGS + +# A C compiler. +LTCC=$lt_LTCC + +# LTCC compiler flags. +LTCFLAGS=$lt_LTCFLAGS + +# A language-specific compiler. +CC=$lt_[]_LT_AC_TAGVAR(compiler, $1) + +# Is the compiler the GNU C compiler? +with_gcc=$_LT_AC_TAGVAR(GCC, $1) + +# An ERE matcher. +EGREP=$lt_EGREP + +# The linker used to build libraries. +LD=$lt_[]_LT_AC_TAGVAR(LD, $1) + +# Whether we need hard or soft links. +LN_S=$lt_LN_S + +# A BSD-compatible nm program. +NM=$lt_NM + +# A symbol stripping program +STRIP=$lt_STRIP + +# Used to examine libraries when file_magic_cmd begins "file" +MAGIC_CMD=$MAGIC_CMD + +# Used on cygwin: DLL creation program. +DLLTOOL="$DLLTOOL" + +# Used on cygwin: object dumper. +OBJDUMP="$OBJDUMP" + +# Used on cygwin: assembler. +AS="$AS" + +# The name of the directory that contains temporary libtool files. +objdir=$objdir + +# How to create reloadable object files. +reload_flag=$lt_reload_flag +reload_cmds=$lt_reload_cmds + +# How to pass a linker flag through the compiler. +wl=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) + +# Object file suffix (normally "o"). +objext="$ac_objext" + +# Old archive suffix (normally "a"). +libext="$libext" + +# Shared library suffix (normally ".so"). +shrext_cmds='$shrext_cmds' + +# Executable file suffix (normally "").
+exeext="$exeext" + +# Additional compiler flags for building library objects. +pic_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) +pic_mode=$pic_mode + +# What is the maximum length of a command? +max_cmd_len=$lt_cv_sys_max_cmd_len + +# Does compiler simultaneously support -c and -o options? +compiler_c_o=$lt_[]_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1) + +# Must we lock files when doing compilation? +need_locks=$lt_need_locks + +# Do we need the lib prefix for modules? +need_lib_prefix=$need_lib_prefix + +# Do we need a version for libraries? +need_version=$need_version + +# Whether dlopen is supported. +dlopen_support=$enable_dlopen + +# Whether dlopen of programs is supported. +dlopen_self=$enable_dlopen_self + +# Whether dlopen of statically linked programs is supported. +dlopen_self_static=$enable_dlopen_self_static + +# Compiler flag to prevent dynamic linking. +link_static_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_static, $1) + +# Compiler flag to turn off builtin functions. +no_builtin_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1) + +# Compiler flag to allow reflexive dlopens. +export_dynamic_flag_spec=$lt_[]_LT_AC_TAGVAR(export_dynamic_flag_spec, $1) + +# Compiler flag to generate shared objects directly from archives. +whole_archive_flag_spec=$lt_[]_LT_AC_TAGVAR(whole_archive_flag_spec, $1) + +# Compiler flag to generate thread-safe objects. +thread_safe_flag_spec=$lt_[]_LT_AC_TAGVAR(thread_safe_flag_spec, $1) + +# Library versioning type. +version_type=$version_type + +# Format of library name prefix. +libname_spec=$lt_libname_spec + +# List of archive names. First name is the real one, the rest are links. +# The last name is the one that the linker finds with -lNAME. +library_names_spec=$lt_library_names_spec + +# The coded name of the library, if different from the real name. +soname_spec=$lt_soname_spec + +# Commands used to build and install an old-style archive. 
+RANLIB=$lt_RANLIB +old_archive_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_cmds, $1) +old_postinstall_cmds=$lt_old_postinstall_cmds +old_postuninstall_cmds=$lt_old_postuninstall_cmds + +# Create an old-style archive from a shared archive. +old_archive_from_new_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_from_new_cmds, $1) + +# Create a temporary old-style archive to link instead of a shared archive. +old_archive_from_expsyms_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1) + +# Commands used to build and install a shared archive. +archive_cmds=$lt_[]_LT_AC_TAGVAR(archive_cmds, $1) +archive_expsym_cmds=$lt_[]_LT_AC_TAGVAR(archive_expsym_cmds, $1) +postinstall_cmds=$lt_postinstall_cmds +postuninstall_cmds=$lt_postuninstall_cmds + +# Commands used to build a loadable module (assumed same as above if empty) +module_cmds=$lt_[]_LT_AC_TAGVAR(module_cmds, $1) +module_expsym_cmds=$lt_[]_LT_AC_TAGVAR(module_expsym_cmds, $1) + +# Commands to strip libraries. +old_striplib=$lt_old_striplib +striplib=$lt_striplib + +# Dependencies to place before the objects being linked to create a +# shared library. +predep_objects=$lt_[]_LT_AC_TAGVAR(predep_objects, $1) + +# Dependencies to place after the objects being linked to create a +# shared library. +postdep_objects=$lt_[]_LT_AC_TAGVAR(postdep_objects, $1) + +# Dependencies to place before the objects being linked to create a +# shared library. +predeps=$lt_[]_LT_AC_TAGVAR(predeps, $1) + +# Dependencies to place after the objects being linked to create a +# shared library. +postdeps=$lt_[]_LT_AC_TAGVAR(postdeps, $1) + +# The library search path used internally by the compiler when linking +# a shared library. +compiler_lib_search_path=$lt_[]_LT_AC_TAGVAR(compiler_lib_search_path, $1) + +# Method to check whether dependent libraries are shared objects. +deplibs_check_method=$lt_deplibs_check_method + +# Command to use when deplibs_check_method == file_magic.
+file_magic_cmd=$lt_file_magic_cmd + +# Flag that allows shared libraries with undefined symbols to be built. +allow_undefined_flag=$lt_[]_LT_AC_TAGVAR(allow_undefined_flag, $1) + +# Flag that forces no undefined symbols. +no_undefined_flag=$lt_[]_LT_AC_TAGVAR(no_undefined_flag, $1) + +# Commands used to finish a libtool library installation in a directory. +finish_cmds=$lt_finish_cmds + +# Same as above, but a single script fragment to be evaled but not shown. +finish_eval=$lt_finish_eval + +# Take the output of nm and produce a listing of raw symbols and C names. +global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe + +# Transform the output of nm in a proper C declaration +global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl + +# Transform the output of nm in a C name address pair +global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_address + +# This is the shared library runtime path variable. +runpath_var=$runpath_var + +# This is the shared library path variable. +shlibpath_var=$shlibpath_var + +# Is shlibpath searched before the hard-coded library search path? +shlibpath_overrides_runpath=$shlibpath_overrides_runpath + +# How to hardcode a shared library path into an executable. +hardcode_action=$_LT_AC_TAGVAR(hardcode_action, $1) + +# Whether we should hardcode library paths into libraries. +hardcode_into_libs=$hardcode_into_libs + +# Flag to hardcode \$libdir into a binary during linking. +# This must work even if \$libdir does not exist. +hardcode_libdir_flag_spec=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) + +# If ld is used when linking, flag to hardcode \$libdir into +# a binary during linking. This must work even if \$libdir does +# not exist. +hardcode_libdir_flag_spec_ld=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1) + +# Whether we need a single -rpath flag with a separated argument.
+hardcode_libdir_separator=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_separator, $1) + +# Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes DIR into the +# resulting binary. +hardcode_direct=$_LT_AC_TAGVAR(hardcode_direct, $1) + +# Set to yes if using the -LDIR flag during linking hardcodes DIR into the +# resulting binary. +hardcode_minus_L=$_LT_AC_TAGVAR(hardcode_minus_L, $1) + +# Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR into +# the resulting binary. +hardcode_shlibpath_var=$_LT_AC_TAGVAR(hardcode_shlibpath_var, $1) + +# Set to yes if building a shared library automatically hardcodes DIR into the library +# and all subsequent libraries and executables linked against it. +hardcode_automatic=$_LT_AC_TAGVAR(hardcode_automatic, $1) + +# Variables whose values should be saved in libtool wrapper scripts and +# restored at relink time. +variables_saved_for_relink="$variables_saved_for_relink" + +# Whether libtool must link a program against all its dependency libraries. +link_all_deplibs=$_LT_AC_TAGVAR(link_all_deplibs, $1) + +# Compile-time system search path for libraries +sys_lib_search_path_spec=$lt_sys_lib_search_path_spec + +# Run-time system search path for libraries +sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec + +# Fix the shell variable \$srcfile for the compiler. +fix_srcfile_path="$_LT_AC_TAGVAR(fix_srcfile_path, $1)" + +# Set to yes if exported symbols are required. +always_export_symbols=$_LT_AC_TAGVAR(always_export_symbols, $1) + +# The commands to list exported symbols. +export_symbols_cmds=$lt_[]_LT_AC_TAGVAR(export_symbols_cmds, $1) + +# The commands to extract the exported symbol list from a shared archive. +extract_expsyms_cmds=$lt_extract_expsyms_cmds + +# Symbols that should not be listed in the preloaded symbols. +exclude_expsyms=$lt_[]_LT_AC_TAGVAR(exclude_expsyms, $1) + +# Symbols that must always be exported.
+include_expsyms=$lt_[]_LT_AC_TAGVAR(include_expsyms, $1) + +ifelse([$1],[], +[# ### END LIBTOOL CONFIG], +[# ### END LIBTOOL TAG CONFIG: $tagname]) + +__EOF__ + +ifelse([$1],[], [ + case $host_os in + aix3*) + cat <<\EOF >> "$cfgfile" + +# AIX sometimes has problems with the GCC collect2 program. For some +# reason, if we set the COLLECT_NAMES environment variable, the problems +# vanish in a puff of smoke. +if test "X${COLLECT_NAMES+set}" != Xset; then + COLLECT_NAMES= + export COLLECT_NAMES +fi +EOF + ;; + esac + + # We use sed instead of cat because bash on DJGPP gets confused if + # it finds mixed CR/LF and LF-only lines. Since sed operates in + # text mode, it properly converts lines to CR/LF. This bash problem + # is reportedly fixed, but why not run on old versions too? + sed '$q' "$ltmain" >> "$cfgfile" || (rm -f "$cfgfile"; exit 1) + + mv -f "$cfgfile" "$ofile" || \ + (rm -f "$ofile" && cp "$cfgfile" "$ofile" && rm -f "$cfgfile") + chmod +x "$ofile" +]) +else + # If there is no Makefile yet, we rely on a make rule to execute + # `config.status --recheck' to rerun these tests and create the + # libtool script then.
+ ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` + if test -f "$ltmain_in"; then + test -f Makefile && make "$ltmain" + fi +fi +])# AC_LIBTOOL_CONFIG + + +# AC_LIBTOOL_PROG_COMPILER_NO_RTTI([TAGNAME]) +# ------------------------------------------- +AC_DEFUN([AC_LIBTOOL_PROG_COMPILER_NO_RTTI], +[AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl + +_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)= + +if test "$GCC" = yes; then + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=' -fno-builtin' + + AC_LIBTOOL_COMPILER_OPTION([if $compiler supports -fno-rtti -fno-exceptions], + lt_cv_prog_compiler_rtti_exceptions, + [-fno-rtti -fno-exceptions], [], + [_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)="$_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1) -fno-rtti -fno-exceptions"]) +fi +])# AC_LIBTOOL_PROG_COMPILER_NO_RTTI + + +# AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE +# --------------------------------- +AC_DEFUN([AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE], +[AC_REQUIRE([AC_CANONICAL_HOST]) +AC_REQUIRE([AC_PROG_NM]) +AC_REQUIRE([AC_OBJEXT]) +# Check for command to grab the raw symbol name followed by C symbol from nm. +AC_MSG_CHECKING([command to parse $NM output from $compiler object]) +AC_CACHE_VAL([lt_cv_sys_global_symbol_pipe], +[ +# These are sane defaults that work on at least a few old systems. +# [They come from Ultrix. What could be older than Ultrix?!! ;)] + +# Character class describing NM global symbol codes. +symcode='[[BCDEGRST]]' + +# Regexp to match symbols that can be accessed directly from C. +sympat='\([[_A-Za-z]][[_A-Za-z0-9]]*\)' + +# Transform an extracted symbol line into a proper C declaration +lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^. .* \(.*\)$/extern int \1;/p'" + +# Transform an extracted symbol line into symbol name and symbol address +lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode \([[^ ]]*\) \([[^ ]]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" + +# Define system-specific variables. 
+case $host_os in +aix*) + symcode='[[BCDT]]' + ;; +cygwin* | mingw* | pw32*) + symcode='[[ABCDGISTW]]' + ;; +hpux*) # Its linker distinguishes data from code symbols + if test "$host_cpu" = ia64; then + symcode='[[ABCDEGRST]]' + fi + lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'" + lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([[^ ]]*\) \([[^ ]]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" + ;; +linux*) + if test "$host_cpu" = ia64; then + symcode='[[ABCDGIRSTW]]' + lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'" + lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([[^ ]]*\) \([[^ ]]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" + fi + ;; +irix* | nonstopux*) + symcode='[[BCDEGRST]]' + ;; +osf*) + symcode='[[BCDEGQRST]]' + ;; +solaris*) + symcode='[[BDRT]]' + ;; +sco3.2v5*) + symcode='[[DT]]' + ;; +sysv4.2uw2*) + symcode='[[DT]]' + ;; +sysv5* | sco5v6* | unixware* | OpenUNIX*) + symcode='[[ABDT]]' + ;; +sysv4) + symcode='[[DFNSTU]]' + ;; +esac + +# Handle CRLF in mingw tool chain +opt_cr= +case $build_os in +mingw*) + opt_cr=`echo 'x\{0,1\}' | tr x '\015'` # option cr in regexp + ;; +esac + +# If we're using GNU nm, then use its standard symbol codes. +case `$NM -V 2>&1` in +*GNU* | *'with BFD'*) + symcode='[[ABCDGIRSTW]]' ;; +esac + +# Try without a prefix undercore, then with it. +for ac_symprfx in "" "_"; do + + # Transform symcode, sympat, and symprfx into a raw symbol and a C symbol. + symxfrm="\\1 $ac_symprfx\\2 \\2" + + # Write the raw and C identifiers. + lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[ ]]\($symcode$symcode*\)[[ ]][[ ]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'" + + # Check to see that the pipe works correctly. 
+ pipe_works=no + + rm -f conftest* + cat > conftest.$ac_ext < $nlist) && test -s "$nlist"; then + # Try sorting and uniquifying the output. + if sort "$nlist" | uniq > "$nlist"T; then + mv -f "$nlist"T "$nlist" + else + rm -f "$nlist"T + fi + + # Make sure that we snagged all the symbols we need. + if grep ' nm_test_var$' "$nlist" >/dev/null; then + if grep ' nm_test_func$' "$nlist" >/dev/null; then + cat < conftest.$ac_ext +#ifdef __cplusplus +extern "C" { +#endif + +EOF + # Now generate the symbol file. + eval "$lt_cv_sys_global_symbol_to_cdecl"' < "$nlist" | grep -v main >> conftest.$ac_ext' + + cat <> conftest.$ac_ext +#if defined (__STDC__) && __STDC__ +# define lt_ptr_t void * +#else +# define lt_ptr_t char * +# define const +#endif + +/* The mapping between symbol names and symbols. */ +const struct { + const char *name; + lt_ptr_t address; +} +lt_preloaded_symbols[[]] = +{ +EOF + $SED "s/^$symcode$symcode* \(.*\) \(.*\)$/ {\"\2\", (lt_ptr_t) \&\2},/" < "$nlist" | grep -v main >> conftest.$ac_ext + cat <<\EOF >> conftest.$ac_ext + {0, (lt_ptr_t) 0} +}; + +#ifdef __cplusplus +} +#endif +EOF + # Now try linking the two files. + mv conftest.$ac_objext conftstm.$ac_objext + lt_save_LIBS="$LIBS" + lt_save_CFLAGS="$CFLAGS" + LIBS="conftstm.$ac_objext" + CFLAGS="$CFLAGS$_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)" + if AC_TRY_EVAL(ac_link) && test -s conftest${ac_exeext}; then + pipe_works=yes + fi + LIBS="$lt_save_LIBS" + CFLAGS="$lt_save_CFLAGS" + else + echo "cannot find nm_test_func in $nlist" >&AS_MESSAGE_LOG_FD + fi + else + echo "cannot find nm_test_var in $nlist" >&AS_MESSAGE_LOG_FD + fi + else + echo "cannot run $lt_cv_sys_global_symbol_pipe" >&AS_MESSAGE_LOG_FD + fi + else + echo "$progname: failed program was:" >&AS_MESSAGE_LOG_FD + cat conftest.$ac_ext >&5 + fi + rm -f conftest* conftst* + + # Do not use the global_symbol_pipe unless it works. 
+ if test "$pipe_works" = yes; then + break + else + lt_cv_sys_global_symbol_pipe= + fi +done +]) +if test -z "$lt_cv_sys_global_symbol_pipe"; then + lt_cv_sys_global_symbol_to_cdecl= +fi +if test -z "$lt_cv_sys_global_symbol_pipe$lt_cv_sys_global_symbol_to_cdecl"; then + AC_MSG_RESULT(failed) +else + AC_MSG_RESULT(ok) +fi +]) # AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE + + +# AC_LIBTOOL_PROG_COMPILER_PIC([TAGNAME]) +# --------------------------------------- +AC_DEFUN([AC_LIBTOOL_PROG_COMPILER_PIC], +[_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)= +_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= +_LT_AC_TAGVAR(lt_prog_compiler_static, $1)= + +AC_MSG_CHECKING([for $compiler option to produce PIC]) + ifelse([$1],[CXX],[ + # C++ specific cases for pic, static, wl, etc. + if test "$GXX" = yes; then + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' + + case $host_os in + aix*) + # All AIX code is PIC. + if test "$host_cpu" = ia64; then + # AIX 5 now supports IA64 processor + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + fi + ;; + amigaos*) + # FIXME: we need at least 68020 code to build shared libraries, but + # adding the `-m68020' flag to GCC prevents building anything better, + # like `-m68040'. + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-m68020 -resident32 -malways-restore-a4' + ;; + beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) + # PIC is the default for these OSes. + ;; + mingw* | os2* | pw32*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). 
+ _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT' + ;; + darwin* | rhapsody*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fno-common' + ;; + *djgpp*) + # DJGPP does not support shared libraries at all + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= + ;; + interix3*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; + sysv4*MP*) + if test -d /usr/nec; then + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=-Kconform_pic + fi + ;; + hpux*) + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but + # not for PA HP-UX. + case $host_cpu in + hppa*64*|ia64*) + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' + ;; + esac + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' + ;; + esac + else + case $host_os in + aix4* | aix5*) + # All AIX code is PIC. + if test "$host_cpu" = ia64; then + # AIX 5 now supports IA64 processor + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + else + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-bnso -bI:/lib/syscalls.exp' + fi + ;; + chorus*) + case $cc_basename in + cxch68*) + # Green Hills C++ Compiler + # _LT_AC_TAGVAR(lt_prog_compiler_static, $1)="--no_auto_instantiation -u __main -u __premain -u _abort -r $COOL_DIR/lib/libOrb.a $MVME_DIR/lib/CC/libC.a $MVME_DIR/lib/classix/libcx.s.a" + ;; + esac + ;; + darwin*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + case $cc_basename in + xlc*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-qnocommon' + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + ;; + esac + ;; + dgux*) + case $cc_basename in + ec++*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + ;; + ghcx*) + # Green Hills C++ Compiler + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' + ;; + *) + ;; + esac + ;; + freebsd* | kfreebsd*-gnu | dragonfly*) + # FreeBSD uses GNU C++ + ;; + hpux9* | hpux10* | hpux11*) + case 
$cc_basename in + CC*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive' + if test "$host_cpu" != ia64; then + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z' + fi + ;; + aCC*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive' + case $host_cpu in + hppa*64*|ia64*) + # +Z the default + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z' + ;; + esac + ;; + *) + ;; + esac + ;; + interix*) + # This is c89, which is MS Visual C++ (no shared libs) + # Anyone wants to do a port? + ;; + irix5* | irix6* | nonstopux*) + case $cc_basename in + CC*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' + # CC pic flag -KPIC is the default. + ;; + *) + ;; + esac + ;; + linux*) + case $cc_basename in + KCC*) + # KAI C++ Compiler + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='--backend -Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' + ;; + icpc* | ecpc*) + # Intel C++ + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' + ;; + pgCC*) + # Portland Group C++ compiler. + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fpic' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + cxx*) + # Compaq C++ + # Make sure the PIC flag is empty. It appears that all Alpha + # Linux and Compaq Tru64 Unix objects are PIC. 
+ _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' + ;; + *) + ;; + esac + ;; + lynxos*) + ;; + m88k*) + ;; + mvs*) + case $cc_basename in + cxx*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-W c,exportall' + ;; + *) + ;; + esac + ;; + netbsd*) + ;; + osf3* | osf4* | osf5*) + case $cc_basename in + KCC*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='--backend -Wl,' + ;; + RCC*) + # Rational C++ 2.4.1 + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' + ;; + cxx*) + # Digital/Compaq C++ + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + # Make sure the PIC flag is empty. It appears that all Alpha + # Linux and Compaq Tru64 Unix objects are PIC. + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' + ;; + *) + ;; + esac + ;; + psos*) + ;; + solaris*) + case $cc_basename in + CC*) + # Sun C++ 4.2, 5.x and Centerline C++ + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ' + ;; + gcx*) + # Green Hills C++ Compiler + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-PIC' + ;; + *) + ;; + esac + ;; + sunos4*) + case $cc_basename in + CC*) + # Sun C++ 4.x + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + lcc*) + # Lucid + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' + ;; + *) + ;; + esac + ;; + tandem*) + case $cc_basename in + NCC*) + # NonStop-UX NCC 3.20 + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + ;; + *) + ;; + esac + ;; + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + case $cc_basename in + CC*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + esac + ;; + vxworks*) + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no + ;; + esac + fi +], +[ + if test "$GCC" = yes; then + 
_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' + + case $host_os in + aix*) + # All AIX code is PIC. + if test "$host_cpu" = ia64; then + # AIX 5 now supports IA64 processor + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + fi + ;; + + amigaos*) + # FIXME: we need at least 68020 code to build shared libraries, but + # adding the `-m68020' flag to GCC prevents building anything better, + # like `-m68040'. + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-m68020 -resident32 -malways-restore-a4' + ;; + + beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) + # PIC is the default for these OSes. + ;; + + mingw* | pw32* | os2*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT' + ;; + + darwin* | rhapsody*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fno-common' + ;; + + interix3*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; + + msdosdjgpp*) + # Just because we use GCC doesn't mean we suddenly get shared libraries + # on systems that don't support them. + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no + enable_shared=no + ;; + + sysv4*MP*) + if test -d /usr/nec; then + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=-Kconform_pic + fi + ;; + + hpux*) + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but + # not for PA HP-UX. + case $host_cpu in + hppa*64*|ia64*) + # +Z the default + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' + ;; + esac + ;; + + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' + ;; + esac + else + # PORTME Check for flag to pass linker flags through the system compiler. 
+ case $host_os in + aix*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + if test "$host_cpu" = ia64; then + # AIX 5 now supports IA64 processor + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + else + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-bnso -bI:/lib/syscalls.exp' + fi + ;; + darwin*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + case $cc_basename in + xlc*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-qnocommon' + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + ;; + esac + ;; + + mingw* | pw32* | os2*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT' + ;; + + hpux9* | hpux10* | hpux11*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but + # not for PA HP-UX. + case $host_cpu in + hppa*64*|ia64*) + # +Z the default + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z' + ;; + esac + # Is there a better lt_prog_compiler_static that works with the bundled CC? + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive' + ;; + + irix5* | irix6* | nonstopux*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + # PIC (with -KPIC) is the default. 
+ _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' + ;; + + newsos6) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + + linux*) + case $cc_basename in + icc* | ecc*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' + ;; + pgcc* | pgf77* | pgf90* | pgf95*) + # Portland Group compilers (*not* the Pentium gcc compiler, + # which looks to be a dead project) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fpic' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + ccc*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + # All Alpha code is PIC. + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' + ;; + esac + ;; + + osf3* | osf4* | osf5*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + # All OSF/1 code is PIC. + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' + ;; + + solaris*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + case $cc_basename in + f77* | f90* | f95*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ';; + *) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,';; + esac + ;; + + sunos4*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-PIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + + sysv4 | sysv4.2uw2* | sysv4.3*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + + sysv4*MP*) + if test -d /usr/nec ;then + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-Kconform_pic' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + fi + ;; + + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_pic, 
$1)='-KPIC' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + + unicos*) + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no + ;; + + uts4*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' + ;; + + *) + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no + ;; + esac + fi +]) +AC_MSG_RESULT([$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)]) + +# +# Check to make sure the PIC flag actually works. +# +if test -n "$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)"; then + AC_LIBTOOL_COMPILER_OPTION([if $compiler PIC flag $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) works], + _LT_AC_TAGVAR(lt_prog_compiler_pic_works, $1), + [$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)ifelse([$1],[],[ -DPIC],[ifelse([$1],[CXX],[ -DPIC],[])])], [], + [case $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) in + "" | " "*) ;; + *) _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=" $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)" ;; + esac], + [_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no]) +fi +case $host_os in + # For platforms which do not support PIC, -DPIC is meaningless: + *djgpp*) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= + ;; + *) + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)="$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)ifelse([$1],[],[ -DPIC],[ifelse([$1],[CXX],[ -DPIC],[])])" + ;; +esac + +# +# Check to make sure the static flag actually works. +# +wl=$_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) eval lt_tmp_static_flag=\"$_LT_AC_TAGVAR(lt_prog_compiler_static, $1)\" +AC_LIBTOOL_LINKER_OPTION([if $compiler static flag $lt_tmp_static_flag works], + _LT_AC_TAGVAR(lt_prog_compiler_static_works, $1), + $lt_tmp_static_flag, + [], + [_LT_AC_TAGVAR(lt_prog_compiler_static, $1)=]) +]) + + +# AC_LIBTOOL_PROG_LD_SHLIBS([TAGNAME]) +# ------------------------------------ +# See if the linker supports building shared libraries. 
+AC_DEFUN([AC_LIBTOOL_PROG_LD_SHLIBS], +[AC_MSG_CHECKING([whether the $compiler linker ($LD) supports shared libraries]) +ifelse([$1],[CXX],[ + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' + case $host_os in + aix4* | aix5*) + # If we're using GNU nm, then we don't want the "-C" option. + # -C means demangle to AIX nm, but means don't demangle with GNU nm + if $NM -V 2>&1 | grep 'GNU' > /dev/null; then + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols' + else + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols' + fi + ;; + pw32*) + _LT_AC_TAGVAR(export_symbols_cmds, $1)="$ltdll_cmds" + ;; + cygwin* | mingw*) + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[[BCDGRS]] /s/.* \([[^ ]]*\)/\1 DATA/;/^.* __nm__/s/^.* __nm__\([[^ ]]*\) [[^ ]]*/\1 DATA/;/^I /d;/^[[AITW]] /s/.* //'\'' | sort | uniq > $export_symbols' + ;; + *) + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' + ;; + esac +],[ + runpath_var= + _LT_AC_TAGVAR(allow_undefined_flag, $1)= + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no + _LT_AC_TAGVAR(archive_cmds, $1)= + _LT_AC_TAGVAR(archive_expsym_cmds, $1)= + _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)= + _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1)= + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= + _LT_AC_TAGVAR(thread_safe_flag_spec, $1)= + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= + 
_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)= + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)= + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_minus_L, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported + _LT_AC_TAGVAR(link_all_deplibs, $1)=unknown + _LT_AC_TAGVAR(hardcode_automatic, $1)=no + _LT_AC_TAGVAR(module_cmds, $1)= + _LT_AC_TAGVAR(module_expsym_cmds, $1)= + _LT_AC_TAGVAR(always_export_symbols, $1)=no + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' + # include_expsyms should be a list of space-separated symbols to be *always* + # included in the symbol list + _LT_AC_TAGVAR(include_expsyms, $1)= + # exclude_expsyms can be an extended regexp of symbols to exclude + # it will be wrapped by ` (' and `)$', so one must not match beginning or + # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' and `bc', + # as well as any symbol that contains `d'. + _LT_AC_TAGVAR(exclude_expsyms, $1)="_GLOBAL_OFFSET_TABLE_" + # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out + # platforms (ab)use it in PIC code, but their linkers get confused if + # the symbol is explicitly referenced. Since portable code cannot + # rely on this symbol name, it's probably fine to never include it in + # preloaded symbol tables. + extract_expsyms_cmds= + # Just being paranoid about ensuring that cc_basename is set. + _LT_CC_BASENAME([$compiler]) + case $host_os in + cygwin* | mingw* | pw32*) + # FIXME: the MSVC++ port hasn't been tested in a loooong time + # When not using gcc, we currently assume that we are using + # Microsoft Visual C++. 
+ if test "$GCC" != yes; then + with_gnu_ld=no + fi + ;; + interix*) + # we just hope/assume this is gcc and not c89 (= MSVC++) + with_gnu_ld=yes + ;; + openbsd*) + with_gnu_ld=no + ;; + esac + + _LT_AC_TAGVAR(ld_shlibs, $1)=yes + if test "$with_gnu_ld" = yes; then + # If archive_cmds runs LD, not CC, wlarc should be empty + wlarc='${wl}' + + # Set some defaults for GNU ld with shared library support. These + # are reset later if shared libraries are not supported. Putting them + # here allows them to be overridden if necessary. + runpath_var=LD_RUN_PATH + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath ${wl}$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' + # ancient GNU ld didn't support --whole-archive et. al. + if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' + else + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= + fi + supports_anon_versioning=no + case `$LD -v 2>/dev/null` in + *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.10.*) ;; # catch versions < 2.11 + *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... + *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... + *\ 2.11.*) ;; # other 2.11 versions + *) supports_anon_versioning=yes ;; + esac + + # See if GNU ld supports shared libraries. + case $host_os in + aix3* | aix4* | aix5*) + # On AIX/PPC, the GNU linker is very broken + if test "$host_cpu" != ia64; then + _LT_AC_TAGVAR(ld_shlibs, $1)=no + cat <<EOF 1>&2 + +*** Warning: the GNU linker, at least up to release 2.9.1, is reported +*** to be unable to reliably create shared libraries on AIX. +*** Therefore, libtool is disabling shared libraries support. If you +*** really care for shared libraries, you may want to modify your PATH +*** so that a non-GNU linker is found, and then restart.
+ +EOF + fi + ;; + + amigaos*) + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + + # Samuel A. Falvo II reports + # that the semantics of dynamic libraries on AmigaOS, at least up + # to version 4, is to share data among multiple programs linked + # with the same dynamic library. Since this doesn't match the + # behavior of shared libraries on other platforms, we can't use + # them. + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + + beos*) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + # Joseph Beckenbach says some releases of gcc + # support --undefined. This deserves some investigation. FIXME + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -nostart $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + + cygwin* | mingw* | pw32*) + # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) is actually meaningless, + # as there is no search path for DLLs. 
+ _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + _LT_AC_TAGVAR(always_export_symbols, $1)=no + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[[BCDGRS]] /s/.* \([[^ ]]*\)/\1 DATA/'\'' | $SED -e '\''/^[[AITW]] /s/.* //'\'' | sort | uniq > $export_symbols' + + if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + # If the export-symbols file already is a .def file (1st line + # is EXPORTS), use it as is; otherwise, prepend... + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='if test "x`$SED 1q $export_symbols`" = xEXPORTS; then + cp $export_symbols $output_objdir/$soname.def; + else + echo EXPORTS > $output_objdir/$soname.def; + cat $export_symbols >> $output_objdir/$soname.def; + fi~ + $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + + interix3*) + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. + # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
+ _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + ;; + + linux*) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + tmp_addflag= + case $cc_basename,$host_cpu in + pgcc*) # Portland Group C compiler + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag' + ;; + pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 compilers + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}--no-whole-archive' + tmp_addflag=' $pic_flag -Mnomain' ;; + ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 + tmp_addflag=' -i_dynamic' ;; + efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 + tmp_addflag=' -i_dynamic -nofor_main' ;; + ifc* | ifort*) # Intel Fortran compiler + tmp_addflag=' -nofor_main' ;; + esac + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + + if test $supports_anon_versioning = yes; then + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $output_objdir/$libname.ver~ + cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ + $echo "local: *; };" >> $output_objdir/$libname.ver~ + $CC -shared'"$tmp_addflag"' $libobjs $deplibs 
$compiler_flags ${wl}-soname $wl$soname ${wl}-version-script ${wl}$output_objdir/$libname.ver -o $lib' + fi + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + + netbsd*) + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' + wlarc= + else + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + fi + ;; + + solaris*) + if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then + _LT_AC_TAGVAR(ld_shlibs, $1)=no + cat <<EOF 1>&2 + +*** Warning: The releases 2.8.* of the GNU linker cannot reliably +*** create shared libraries on Solaris systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.9.1 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. + +EOF + elif $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + + sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) + case `$LD -v 2>&1` in + *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.1[[0-5]].*) + _LT_AC_TAGVAR(ld_shlibs, $1)=no + cat <<_LT_EOF 1>&2 + +*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not +*** reliably create shared libraries on SCO systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.16.91.0.3 or newer.
Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. + +_LT_EOF + ;; + *) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + esac + ;; + + sunos4*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' + wlarc= + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + *) + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' + else + _LT_AC_TAGVAR(ld_shlibs, $1)=no + fi + ;; + esac + + if test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no; then + runpath_var= + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= + fi + else + # PORTME fill in a description of your system's linker (not GNU ld) + case $host_os in + aix3*) + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + _LT_AC_TAGVAR(always_export_symbols, $1)=yes + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' + # Note: this linker
hardcodes the directories in LIBPATH if there + # are no directories specified by -L. + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then + # Neither direct hardcoding nor static linking is supported with a + # broken collect2. + _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported + fi + ;; + + aix4* | aix5*) + if test "$host_cpu" = ia64; then + # On IA64, the linker does run time linking by default, so we don't + # have to do anything special. + aix_use_runtimelinking=no + exp_sym_flag='-Bexport' + no_entry_flag="" + else + # If we're using GNU nm, then we don't want the "-C" option. + # -C means demangle to AIX nm, but means don't demangle with GNU nm + if $NM -V 2>&1 | grep 'GNU' > /dev/null; then + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols' + else + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort -u > $export_symbols' + fi + aix_use_runtimelinking=no + + # Test if we are trying to use run time linking or normal + # AIX style linking. If -brtl is somewhere in LDFLAGS, we + # need to do runtime linking. + case $host_os in aix4.[[23]]|aix4.[[23]].*|aix5*) + for ld_flag in $LDFLAGS; do + if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); then + aix_use_runtimelinking=yes + break + fi + done + ;; + esac + + exp_sym_flag='-bexport' + no_entry_flag='-bnoentry' + fi + + # When large executables or shared objects are built, AIX ld can + # have problems creating the table of contents. If linking a library + # or program results in "error TOC overflow" add -mminimal-toc to + # CXXFLAGS/CFLAGS for g++/gcc. 
In the cases where that is not + # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. + + _LT_AC_TAGVAR(archive_cmds, $1)='' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + + if test "$GCC" = yes; then + case $host_os in aix4.[[012]]|aix4.[[012]].*) + # We only want to do this on AIX 4.2 and lower, the check + # below for broken collect2 doesn't work under 4.3+ + collect2name=`${CC} -print-prog-name=collect2` + if test -f "$collect2name" && \ + strings "$collect2name" | grep resolve_lib_name >/dev/null + then + # We have reworked collect2 + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + else + # We have old collect2 + _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported + # It fails to find uninstalled libraries when the uninstalled + # path is not listed in the libpath. Setting hardcode_minus_L + # to unsupported forces relinking + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)= + fi + ;; + esac + shared_flag='-shared' + if test "$aix_use_runtimelinking" = yes; then + shared_flag="$shared_flag "'${wl}-G' + fi + else + # not using gcc + if test "$host_cpu" = ia64; then + # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release + # chokes on -Wl,-G. The following line is correct: + shared_flag='-G' + else + if test "$aix_use_runtimelinking" = yes; then + shared_flag='${wl}-G' + else + shared_flag='${wl}-bM:SRE' + fi + fi + fi + + # It seems that -bexpall does not export symbols beginning with + # underscore (_), so it is better to generate a list of symbols to export. + _LT_AC_TAGVAR(always_export_symbols, $1)=yes + if test "$aix_use_runtimelinking" = yes; then + # Warning - without using the other runtime loading flags (-brtl), + # -berok will link without error, but may produce a broken library. 
+ _LT_AC_TAGVAR(allow_undefined_flag, $1)='-berok' + # Determine the default libpath from the value encoded in an empty executable. + _LT_AC_SYS_LIBPATH_AIX + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath" + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo "${wl}${allow_undefined_flag}"; else :; fi` '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" + else + if test "$host_cpu" = ia64; then + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R $libdir:/usr/lib:/lib' + _LT_AC_TAGVAR(allow_undefined_flag, $1)="-z nodefs" + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' $compiler_flags ${wl}${allow_undefined_flag} '"\${wl}$exp_sym_flag:\$export_symbols" + else + # Determine the default libpath from the value encoded in an empty executable. + _LT_AC_SYS_LIBPATH_AIX + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-blibpath:$libdir:'"$aix_libpath" + # Warning - without using the other run time loading flags, + # -berok will link without error, but may produce a broken library. + _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-bernotok' + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-berok' + # Exported symbols can be pulled into shared objects from archives + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='$convenience' + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes + # This is similar to how AIX traditionally builds its shared libraries. 
+ _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$soname' + fi + fi + ;; + + amigaos*) + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + # see comment about different semantics on the GNU ld section + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + + bsdi[[45]]*) + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=-rdynamic + ;; + + cygwin* | mingw* | pw32*) + # When not using gcc, we currently assume that we are using + # Microsoft Visual C++. + # hardcode_libdir_flag_spec is actually meaningless, as there is + # no search path for DLLs. + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=' ' + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + # Tell ltmain to make .lib files, not .a files. + libext=lib + # Tell ltmain to make .dll files, not .so files. + shrext_cmds=".dll" + # FIXME: Setting linknames here is a bad hack. + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -o $lib $libobjs $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link -dll~linknames=' + # The linker will automatically build a .lib file if we build a DLL. + _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)='true' + # FIXME: Should let the user specify the lib program. 
+ _LT_AC_TAGVAR(old_archive_cmds, $1)='lib /OUT:$oldlib$oldobjs$old_deplibs' + _LT_AC_TAGVAR(fix_srcfile_path, $1)='`cygpath -w "$srcfile"`' + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes + ;; + + darwin* | rhapsody*) + case $host_os in + rhapsody* | darwin1.[[012]]) + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}suppress' + ;; + *) # Darwin 1.3 on + if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' + else + case ${MACOSX_DEPLOYMENT_TARGET} in + 10.[[012]]) + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' + ;; + 10.*) + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined ${wl}dynamic_lookup' + ;; + esac + fi + ;; + esac + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_automatic, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='' + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + if test "$GCC" = yes ; then + output_verbose_link_cmd='echo' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring' + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags -install_name $rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC 
$allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + else + case $cc_basename in + xlc*) + output_verbose_link_cmd='echo' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags' + # Don't fix this by using the ld -exported_symbols_list flag, it doesn't exist in older darwin lds + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s $output_objdir/${libname}-symbols.expsym ${lib}' + ;; + *) + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + fi + ;; + + dgux*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + freebsd1*) + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + + # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor + # support. Future versions do this automatically, but an explicit c++rt0.o + # does not break anything, and helps significantly (at the cost of a little + # extra space). 
+ freebsd2.2*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + # Unfortunately, older versions of FreeBSD 2 do not have this feature. + freebsd2*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + # FreeBSD 3 and greater uses gcc -shared to do shared libraries. + freebsd* | kfreebsd*-gnu | dragonfly*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + hpux9*) + if test "$GCC" = yes; then + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' + fi + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. 
+ _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + ;; + + hpux10*) + if test "$GCC" = yes -a "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + fi + if test "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + fi + ;; + + hpux11*) + if test "$GCC" = yes -a "$with_gnu_ld" = no; then + case $host_cpu in + hppa*64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + else + case $host_cpu in + hppa*64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + fi + if test "$with_gnu_ld" = no; then + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + + case $host_cpu in + 
hppa*64*|ia64*) + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='+b $libdir' + _LT_AC_TAGVAR(hardcode_direct, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + *) + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + ;; + esac + fi + ;; + + irix5* | irix6* | nonstopux*) + if test "$GCC" = yes; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='-rpath $libdir' + fi + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + ;; + + netbsd*) + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out + else + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF + fi + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + newsos6) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + openbsd*) + _LT_AC_TAGVAR(hardcode_direct, 
$1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags ${wl}-retain-symbols-file,$export_symbols' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' + else + case $host_os in + openbsd[[01]].* | openbsd2.[[0-7]] | openbsd2.[[0-7]].*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + ;; + *) + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' + ;; + esac + fi + ;; + + os2*) + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported + _LT_AC_TAGVAR(archive_cmds, $1)='$echo "LIBRARY $libname INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> $output_objdir/$libname.def~$echo EXPORTS >> $output_objdir/$libname.def~emxexp $libobjs >> $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs $compiler_flags $output_objdir/$libname.def' + _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)='emximp -o $output_objdir/$libname.a $output_objdir/$libname.def' + ;; + + osf3*) + if test "$GCC" = yes; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo 
${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + else + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*' + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + fi + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + ;; + + osf4* | osf5*) # as osf3* with the addition of -msym flag + if test "$GCC" = yes; then + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-expect_unresolved ${wl}\*' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir' + else + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*' + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -msym -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; echo "-hidden">> $lib.exp~ + $LD -shared${allow_undefined_flag} -input $lib.exp $linker_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && echo -set_version $verstring` -update_registry ${output_objdir}/so_locations -o $lib~$rm $lib.exp' + + # Both c and cxx compiler support -rpath directly + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir' + fi + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: + ;; + + solaris*) + _LT_AC_TAGVAR(no_undefined_flag, $1)=' -z text' + if test 
"$GCC" = yes; then + wlarc='${wl}' + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags~$rm $lib.exp' + else + wlarc='' + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G${allow_undefined_flag} -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo "local: *; };" >> $lib.exp~ + $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$rm $lib.exp' + fi + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + case $host_os in + solaris2.[[0-5]] | solaris2.[[0-5]].*) ;; + *) + # The compiler driver will combine linker options so we + # cannot just pass the convience library names through + # without $wl, iff we do not link with $LD. + # Luckily, gcc supports the same syntax we need for Sun Studio. + # Supported since Solaris 2.6 (maybe 2.5.1?) + case $wlarc in + '') + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='-z allextract$convenience -z defaultextract' ;; + *) + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}-z ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; + esac ;; + esac + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + ;; + + sunos4*) + if test "x$host_vendor" = xsequent; then + # Use $CC to link under sequent, because it throws in some extra .o + # files that make .init and .fini sections work. 
+ _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h $soname -o $lib $libobjs $deplibs $compiler_flags' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' + fi + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + sysv4) + case $host_vendor in + sni) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_direct, $1)=yes # is this really true??? + ;; + siemens) + ## LD is ld it makes a PLAMLIB + ## CC just makes a GrossModule. + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(reload_cmds, $1)='$CC -r -o $output$reload_objs' + _LT_AC_TAGVAR(hardcode_direct, $1)=no + ;; + motorola) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_direct, $1)=no #Motorola manual says yes, but my tests say they lie + ;; + esac + runpath_var='LD_RUN_PATH' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + sysv4.3*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='-Bexport' + ;; + + sysv4*MP*) + if test -d /usr/nec; then + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + runpath_var=LD_RUN_PATH + hardcode_runpath_var=yes + _LT_AC_TAGVAR(ld_shlibs, $1)=yes + fi + ;; + + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[[01]].[[10]]* | unixware7*) + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + runpath_var='LD_RUN_PATH' + + if test "$GCC" = yes; then + _LT_AC_TAGVAR(archive_cmds, 
$1)='$CC -shared ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + fi + ;; + + sysv5* | sco3.2v5* | sco5v6*) + # Note: We can NOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-z,nodefs' + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z "$SCOABSPATH" && echo ${wl}-R,$libdir`' + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-Bexport' + runpath_var='LD_RUN_PATH' + + if test "$GCC" = yes; then + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + else + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs $compiler_flags' + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G ${wl}-Bexport:$export_symbols ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs $deplibs 
$compiler_flags' + fi + ;; + + uts4*) + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no + ;; + + *) + _LT_AC_TAGVAR(ld_shlibs, $1)=no + ;; + esac + fi +]) +AC_MSG_RESULT([$_LT_AC_TAGVAR(ld_shlibs, $1)]) +test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no && can_build_shared=no + +# +# Do we need to explicitly link libc? +# +case "x$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)" in +x|xyes) + # Assume -lc should be added + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes + + if test "$enable_shared" = yes && test "$GCC" = yes; then + case $_LT_AC_TAGVAR(archive_cmds, $1) in + *'~'*) + # FIXME: we may have to deal with multi-command sequences. + ;; + '$CC '*) + # Test whether the compiler implicitly links with -lc since on some + # systems, -lgcc has to come before -lc. If gcc already passes -lc + # to ld, don't add -lc before -lgcc. + AC_MSG_CHECKING([whether -lc should be explicitly linked in]) + $rm conftest* + printf "$lt_simple_compile_test_code" > conftest.$ac_ext + + if AC_TRY_EVAL(ac_compile) 2>conftest.err; then + soname=conftest + lib=conftest + libobjs=conftest.$ac_objext + deplibs= + wl=$_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) + pic_flag=$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) + compiler_flags=-v + linker_flags=-v + verstring= + output_objdir=. 
+ libname=conftest + lt_save_allow_undefined_flag=$_LT_AC_TAGVAR(allow_undefined_flag, $1) + _LT_AC_TAGVAR(allow_undefined_flag, $1)= + if AC_TRY_EVAL(_LT_AC_TAGVAR(archive_cmds, $1) 2\>\&1 \| grep \" -lc \" \>/dev/null 2\>\&1) + then + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no + else + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes + fi + _LT_AC_TAGVAR(allow_undefined_flag, $1)=$lt_save_allow_undefined_flag + else + cat conftest.err 1>&5 + fi + $rm conftest* + AC_MSG_RESULT([$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)]) + ;; + esac + fi + ;; +esac +])# AC_LIBTOOL_PROG_LD_SHLIBS + + +# _LT_AC_FILE_LTDLL_C +# ------------------- +# Be careful that the start marker always follows a newline. +AC_DEFUN([_LT_AC_FILE_LTDLL_C], [ +# /* ltdll.c starts here */ +# #define WIN32_LEAN_AND_MEAN +# #include +# #undef WIN32_LEAN_AND_MEAN +# #include +# +# #ifndef __CYGWIN__ +# # ifdef __CYGWIN32__ +# # define __CYGWIN__ __CYGWIN32__ +# # endif +# #endif +# +# #ifdef __cplusplus +# extern "C" { +# #endif +# BOOL APIENTRY DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved); +# #ifdef __cplusplus +# } +# #endif +# +# #ifdef __CYGWIN__ +# #include +# DECLARE_CYGWIN_DLL( DllMain ); +# #endif +# HINSTANCE __hDllInstance_base; +# +# BOOL APIENTRY +# DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved) +# { +# __hDllInstance_base = hInst; +# return TRUE; +# } +# /* ltdll.c ends here */ +])# _LT_AC_FILE_LTDLL_C + + +# _LT_AC_TAGVAR(VARNAME, [TAGNAME]) +# --------------------------------- +AC_DEFUN([_LT_AC_TAGVAR], [ifelse([$2], [], [$1], [$1_$2])]) + + +# old names +AC_DEFUN([AM_PROG_LIBTOOL], [AC_PROG_LIBTOOL]) +AC_DEFUN([AM_ENABLE_SHARED], [AC_ENABLE_SHARED($@)]) +AC_DEFUN([AM_ENABLE_STATIC], [AC_ENABLE_STATIC($@)]) +AC_DEFUN([AM_DISABLE_SHARED], [AC_DISABLE_SHARED($@)]) +AC_DEFUN([AM_DISABLE_STATIC], [AC_DISABLE_STATIC($@)]) +AC_DEFUN([AM_PROG_LD], [AC_PROG_LD]) +AC_DEFUN([AM_PROG_NM], [AC_PROG_NM]) + +# This is just to silence aclocal about the macro not being used 
+ifelse([AC_DISABLE_FAST_INSTALL]) + +AC_DEFUN([LT_AC_PROG_GCJ], +[AC_CHECK_TOOL(GCJ, gcj, no) + test "x${GCJFLAGS+set}" = xset || GCJFLAGS="-g -O2" + AC_SUBST(GCJFLAGS) +]) + +AC_DEFUN([LT_AC_PROG_RC], +[AC_CHECK_TOOL(RC, windres, no) +]) + +# NOTE: This macro has been submitted for inclusion into # +# GNU Autoconf as AC_PROG_SED. When it is available in # +# a released version of Autoconf we should remove this # +# macro and use it instead. # +# LT_AC_PROG_SED +# -------------- +# Check for a fully-functional sed program, that truncates +# as few characters as possible. Prefer GNU sed if found. +AC_DEFUN([LT_AC_PROG_SED], +[AC_MSG_CHECKING([for a sed that does not truncate output]) +AC_CACHE_VAL(lt_cv_path_SED, +[# Loop through the user's path and test for sed and gsed. +# Then use that list of sed's as ones to test for truncation. +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for lt_ac_prog in sed gsed; do + for ac_exec_ext in '' $ac_executable_extensions; do + if $as_executable_p "$as_dir/$lt_ac_prog$ac_exec_ext"; then + lt_ac_sed_list="$lt_ac_sed_list $as_dir/$lt_ac_prog$ac_exec_ext" + fi + done + done +done +lt_ac_max=0 +lt_ac_count=0 +# Add /usr/xpg4/bin/sed as it is typically found on Solaris +# along with /bin/sed that truncates output. +for lt_ac_sed in $lt_ac_sed_list /usr/xpg4/bin/sed; do + test ! -f $lt_ac_sed && continue + cat /dev/null > conftest.in + lt_ac_count=0 + echo $ECHO_N "0123456789$ECHO_C" >conftest.in + # Check for GNU sed and select it if it is found. 
+ if "$lt_ac_sed" --version 2>&1 < /dev/null | grep 'GNU' > /dev/null; then + lt_cv_path_SED=$lt_ac_sed + break + fi + while true; do + cat conftest.in conftest.in >conftest.tmp + mv conftest.tmp conftest.in + cp conftest.in conftest.nl + echo >>conftest.nl + $lt_ac_sed -e 's/a$//' < conftest.nl >conftest.out || break + cmp -s conftest.out conftest.nl || break + # 10000 chars as input seems more than enough + test $lt_ac_count -gt 10 && break + lt_ac_count=`expr $lt_ac_count + 1` + if test $lt_ac_count -gt $lt_ac_max; then + lt_ac_max=$lt_ac_count + lt_cv_path_SED=$lt_ac_sed + fi + done +done +]) +SED=$lt_cv_path_SED +AC_MSG_RESULT([$SED]) +]) + +# Copyright (C) 2002, 2003, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# AM_AUTOMAKE_VERSION(VERSION) +# ---------------------------- +# Automake X.Y traces this macro to ensure aclocal.m4 has been +# generated from the m4 files accompanying Automake X.Y. +AC_DEFUN([AM_AUTOMAKE_VERSION], [am__api_version="1.9"]) + +# AM_SET_CURRENT_AUTOMAKE_VERSION +# ------------------------------- +# Call AM_AUTOMAKE_VERSION so it can be traced. +# This function is AC_REQUIREd by AC_INIT_AUTOMAKE. +AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION], + [AM_AUTOMAKE_VERSION([1.9.6])]) + +# AM_AUX_DIR_EXPAND -*- Autoconf -*- + +# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# For projects using AC_CONFIG_AUX_DIR([foo]), Autoconf sets +# $ac_aux_dir to `$srcdir/foo'. In other projects, it is set to +# `$srcdir', `$srcdir/..', or `$srcdir/../..'. +# +# Of course, Automake must honor this variable whenever it calls a +# tool from the auxiliary directory. 
The problem is that $srcdir (and +# therefore $ac_aux_dir as well) can be either absolute or relative, +# depending on how configure is run. This is pretty annoying, since +# it makes $ac_aux_dir quite unusable in subdirectories: in the top +# source directory, any form will work fine, but in subdirectories a +# relative path needs to be adjusted first. +# +# $ac_aux_dir/missing +# fails when called from a subdirectory if $ac_aux_dir is relative +# $top_srcdir/$ac_aux_dir/missing +# fails if $ac_aux_dir is absolute, +# fails when called from a subdirectory in a VPATH build with +# a relative $ac_aux_dir +# +# The reason of the latter failure is that $top_srcdir and $ac_aux_dir +# are both prefixed by $srcdir. In an in-source build this is usually +# harmless because $srcdir is `.', but things will broke when you +# start a VPATH build or use an absolute $srcdir. +# +# So we could use something similar to $top_srcdir/$ac_aux_dir/missing, +# iff we strip the leading $srcdir from $ac_aux_dir. That would be: +# am_aux_dir='\$(top_srcdir)/'`expr "$ac_aux_dir" : "$srcdir//*\(.*\)"` +# and then we would define $MISSING as +# MISSING="\${SHELL} $am_aux_dir/missing" +# This will work as long as MISSING is not called from configure, because +# unfortunately $(top_srcdir) has no meaning in configure. +# However there are other variables, like CC, which are often used in +# configure, and could therefore not use this "fixed" $ac_aux_dir. +# +# Another solution, used here, is to always expand $ac_aux_dir to an +# absolute PATH. The drawback is that using absolute paths prevent a +# configured tree to be moved without reconfiguration. + +AC_DEFUN([AM_AUX_DIR_EXPAND], +[dnl Rely on autoconf to set up CDPATH properly. +AC_PREREQ([2.50])dnl +# expand $ac_aux_dir to an absolute path +am_aux_dir=`cd $ac_aux_dir && pwd` +]) + +# AM_CONDITIONAL -*- Autoconf -*- + +# Copyright (C) 1997, 2000, 2001, 2003, 2004, 2005 +# Free Software Foundation, Inc. 
+# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 7 + +# AM_CONDITIONAL(NAME, SHELL-CONDITION) +# ------------------------------------- +# Define a conditional. +AC_DEFUN([AM_CONDITIONAL], +[AC_PREREQ(2.52)dnl + ifelse([$1], [TRUE], [AC_FATAL([$0: invalid condition: $1])], + [$1], [FALSE], [AC_FATAL([$0: invalid condition: $1])])dnl +AC_SUBST([$1_TRUE]) +AC_SUBST([$1_FALSE]) +if $2; then + $1_TRUE= + $1_FALSE='#' +else + $1_TRUE='#' + $1_FALSE= +fi +AC_CONFIG_COMMANDS_PRE( +[if test -z "${$1_TRUE}" && test -z "${$1_FALSE}"; then + AC_MSG_ERROR([[conditional "$1" was never defined. +Usually this means the macro was only invoked conditionally.]]) +fi])]) + + +# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005 +# Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 8 + +# There are a few dirty hacks below to avoid letting `AC_PROG_CC' be +# written in clear, in which case automake, when reading aclocal.m4, +# will think it sees a *use*, and therefore will trigger all it's +# C support machinery. Also note that it means that autoscan, seeing +# CC etc. in the Makefile, will ask for an AC_PROG_CC use... + + +# _AM_DEPENDENCIES(NAME) +# ---------------------- +# See how the compiler implements dependency checking. +# NAME is "CC", "CXX", "GCJ", or "OBJC". +# We try a few techniques and use that to set a single cache variable. +# +# We don't AC_REQUIRE the corresponding AC_PROG_CC since the latter was +# modified to invoke _AM_DEPENDENCIES(CC); we would have a circular +# dependency, and given that the user is not expected to run this macro, +# just rely on AC_PROG_CC. 
+AC_DEFUN([_AM_DEPENDENCIES], +[AC_REQUIRE([AM_SET_DEPDIR])dnl +AC_REQUIRE([AM_OUTPUT_DEPENDENCY_COMMANDS])dnl +AC_REQUIRE([AM_MAKE_INCLUDE])dnl +AC_REQUIRE([AM_DEP_TRACK])dnl + +ifelse([$1], CC, [depcc="$CC" am_compiler_list=], + [$1], CXX, [depcc="$CXX" am_compiler_list=], + [$1], OBJC, [depcc="$OBJC" am_compiler_list='gcc3 gcc'], + [$1], GCJ, [depcc="$GCJ" am_compiler_list='gcc3 gcc'], + [depcc="$$1" am_compiler_list=]) + +AC_CACHE_CHECK([dependency style of $depcc], + [am_cv_$1_dependencies_compiler_type], +[if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then + # We make a subdir and do the tests there. Otherwise we can end up + # making bogus files that we don't know about and never remove. For + # instance it was reported that on HP-UX the gcc test will end up + # making a dummy file named `D' -- because `-MD' means `put the output + # in D'. + mkdir conftest.dir + # Copy depcomp to subdir because otherwise we won't find it if we're + # using a relative directory. + cp "$am_depcomp" conftest.dir + cd conftest.dir + # We will build objects and dependencies in a subdirectory because + # it helps to detect inapplicable dependency modes. For instance + # both Tru64's cc and ICC support -MD to output dependencies as a + # side effect of compilation, but ICC will put the dependencies in + # the current directory while Tru64 will put them in the object + # directory. + mkdir sub + + am_cv_$1_dependencies_compiler_type=none + if test "$am_compiler_list" = ""; then + am_compiler_list=`sed -n ['s/^#*\([a-zA-Z0-9]*\))$/\1/p'] < ./depcomp` + fi + for depmode in $am_compiler_list; do + # Setup a source with many dependencies, because some compilers + # like to wrap large dependency lists on column 80 (with \), and + # we should not choose a depcomp mode which is confused by this. + # + # We need to recreate these files for each test, as the compiler may + # overwrite some of them when testing with obscure command lines. 
+ # This happens at least with the AIX C compiler. + : > sub/conftest.c + for i in 1 2 3 4 5 6; do + echo '#include "conftst'$i'.h"' >> sub/conftest.c + # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with + # Solaris 8's {/usr,}/bin/sh. + touch sub/conftst$i.h + done + echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf + + case $depmode in + nosideeffect) + # after this tag, mechanisms are not by side-effect, so they'll + # only be used when explicitly requested + if test "x$enable_dependency_tracking" = xyes; then + continue + else + break + fi + ;; + none) break ;; + esac + # We check with `-c' and `-o' for the sake of the "dashmstdout" + # mode. It turns out that the SunPro C++ compiler does not properly + # handle `-M -o', and we need to detect this. + if depmode=$depmode \ + source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ + depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ + $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \ + >/dev/null 2>conftest.err && + grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && + grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 && + ${MAKE-make} -s -f confmf > /dev/null 2>&1; then + # icc doesn't choke on unknown options, it will just issue warnings + # or remarks (even with -Werror). So we grep stderr for any message + # that says an option was ignored or not supported. + # When given -MP, icc 7.0 and 7.1 complain thusly: + # icc: Command line warning: ignoring option '-M'; no argument required + # The diagnosis changed in icc 8.0: + # icc: Command line remark: option '-MP' not supported + if (grep 'ignoring option' conftest.err || + grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else + am_cv_$1_dependencies_compiler_type=$depmode + break + fi + fi + done + + cd .. 
+ rm -rf conftest.dir +else + am_cv_$1_dependencies_compiler_type=none +fi +]) +AC_SUBST([$1DEPMODE], [depmode=$am_cv_$1_dependencies_compiler_type]) +AM_CONDITIONAL([am__fastdep$1], [ + test "x$enable_dependency_tracking" != xno \ + && test "$am_cv_$1_dependencies_compiler_type" = gcc3]) +]) + + +# AM_SET_DEPDIR +# ------------- +# Choose a directory name for dependency files. +# This macro is AC_REQUIREd in _AM_DEPENDENCIES +AC_DEFUN([AM_SET_DEPDIR], +[AC_REQUIRE([AM_SET_LEADING_DOT])dnl +AC_SUBST([DEPDIR], ["${am__leading_dot}deps"])dnl +]) + + +# AM_DEP_TRACK +# ------------ +AC_DEFUN([AM_DEP_TRACK], +[AC_ARG_ENABLE(dependency-tracking, +[ --disable-dependency-tracking speeds up one-time build + --enable-dependency-tracking do not reject slow dependency extractors]) +if test "x$enable_dependency_tracking" != xno; then + am_depcomp="$ac_aux_dir/depcomp" + AMDEPBACKSLASH='\' +fi +AM_CONDITIONAL([AMDEP], [test "x$enable_dependency_tracking" != xno]) +AC_SUBST([AMDEPBACKSLASH]) +]) + +# Generate code to set up dependency tracking. -*- Autoconf -*- + +# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005 +# Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +#serial 3 + +# _AM_OUTPUT_DEPENDENCY_COMMANDS +# ------------------------------ +AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS], +[for mf in $CONFIG_FILES; do + # Strip MF so we end up with the name of the file. + mf=`echo "$mf" | sed -e 's/:.*$//'` + # Check whether this is an Automake generated Makefile or not. + # We used to match only the files named `Makefile.in', but + # some people rename them; so instead we look at the file content. + # Grep'ing the first line is not enough: some people post-process + # each Makefile.in and add a new line on top of each file to say so. + # So let's grep whole file. 
+ if grep '^#.*generated by automake' $mf > /dev/null 2>&1; then + dirpart=`AS_DIRNAME("$mf")` + else + continue + fi + # Extract the definition of DEPDIR, am__include, and am__quote + # from the Makefile without running `make'. + DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"` + test -z "$DEPDIR" && continue + am__include=`sed -n 's/^am__include = //p' < "$mf"` + test -z "am__include" && continue + am__quote=`sed -n 's/^am__quote = //p' < "$mf"` + # When using ansi2knr, U may be empty or an underscore; expand it + U=`sed -n 's/^U = //p' < "$mf"` + # Find all dependency output files, they are included files with + # $(DEPDIR) in their names. We invoke sed twice because it is the + # simplest approach to changing $(DEPDIR) to its actual value in the + # expansion. + for file in `sed -n " + s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \ + sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g' -e 's/\$U/'"$U"'/g'`; do + # Make sure the directory exists. + test -f "$dirpart/$file" && continue + fdir=`AS_DIRNAME(["$file"])` + AS_MKDIR_P([$dirpart/$fdir]) + # echo "creating $dirpart/$file" + echo '# dummy' > "$dirpart/$file" + done +done +])# _AM_OUTPUT_DEPENDENCY_COMMANDS + + +# AM_OUTPUT_DEPENDENCY_COMMANDS +# ----------------------------- +# This macro should only be invoked once -- use via AC_REQUIRE. +# +# This code is only required when automatic dependency tracking +# is enabled. FIXME. This creates each `.P' file that we will +# need in order to bootstrap the dependency handling code. +AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS], +[AC_CONFIG_COMMANDS([depfiles], + [test x"$AMDEP_TRUE" != x"" || _AM_OUTPUT_DEPENDENCY_COMMANDS], + [AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"]) +]) + +# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005 +# Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. 
+ +# serial 8 + +# AM_CONFIG_HEADER is obsolete. It has been replaced by AC_CONFIG_HEADERS. +AU_DEFUN([AM_CONFIG_HEADER], [AC_CONFIG_HEADERS($@)]) + +# Do all the work for Automake. -*- Autoconf -*- + +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005 +# Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 12 + +# This macro actually does too much. Some checks are only needed if +# your package does certain things. But this isn't really a big deal. + +# AM_INIT_AUTOMAKE(PACKAGE, VERSION, [NO-DEFINE]) +# AM_INIT_AUTOMAKE([OPTIONS]) +# ----------------------------------------------- +# The call with PACKAGE and VERSION arguments is the old style +# call (pre autoconf-2.50), which is being phased out. PACKAGE +# and VERSION should now be passed to AC_INIT and removed from +# the call to AM_INIT_AUTOMAKE. +# We support both call styles for the transition. After +# the next Automake release, Autoconf can make the AC_INIT +# arguments mandatory, and then we can depend on a new Autoconf +# release and drop the old call support. +AC_DEFUN([AM_INIT_AUTOMAKE], +[AC_PREREQ([2.58])dnl +dnl Autoconf wants to disallow AM_ names. We explicitly allow +dnl the ones we care about. +m4_pattern_allow([^AM_[A-Z]+FLAGS$])dnl +AC_REQUIRE([AM_SET_CURRENT_AUTOMAKE_VERSION])dnl +AC_REQUIRE([AC_PROG_INSTALL])dnl +# test to see if srcdir already configured +if test "`cd $srcdir && pwd`" != "`pwd`" && + test -f $srcdir/config.status; then + AC_MSG_ERROR([source directory already configured; run "make distclean" there first]) +fi + +# test whether we have cygpath +if test -z "$CYGPATH_W"; then + if (cygpath --version) >/dev/null 2>/dev/null; then + CYGPATH_W='cygpath -w' + else + CYGPATH_W=echo + fi +fi +AC_SUBST([CYGPATH_W]) + +# Define the identity of the package. 
+dnl Distinguish between old-style and new-style calls. +m4_ifval([$2], +[m4_ifval([$3], [_AM_SET_OPTION([no-define])])dnl + AC_SUBST([PACKAGE], [$1])dnl + AC_SUBST([VERSION], [$2])], +[_AM_SET_OPTIONS([$1])dnl + AC_SUBST([PACKAGE], ['AC_PACKAGE_TARNAME'])dnl + AC_SUBST([VERSION], ['AC_PACKAGE_VERSION'])])dnl + +_AM_IF_OPTION([no-define],, +[AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE", [Name of package]) + AC_DEFINE_UNQUOTED(VERSION, "$VERSION", [Version number of package])])dnl + +# Some tools Automake needs. +AC_REQUIRE([AM_SANITY_CHECK])dnl +AC_REQUIRE([AC_ARG_PROGRAM])dnl +AM_MISSING_PROG(ACLOCAL, aclocal-${am__api_version}) +AM_MISSING_PROG(AUTOCONF, autoconf) +AM_MISSING_PROG(AUTOMAKE, automake-${am__api_version}) +AM_MISSING_PROG(AUTOHEADER, autoheader) +AM_MISSING_PROG(MAKEINFO, makeinfo) +AM_PROG_INSTALL_SH +AM_PROG_INSTALL_STRIP +AC_REQUIRE([AM_PROG_MKDIR_P])dnl +# We need awk for the "check" target. The system "awk" is bad on +# some platforms. +AC_REQUIRE([AC_PROG_AWK])dnl +AC_REQUIRE([AC_PROG_MAKE_SET])dnl +AC_REQUIRE([AM_SET_LEADING_DOT])dnl +_AM_IF_OPTION([tar-ustar], [_AM_PROG_TAR([ustar])], + [_AM_IF_OPTION([tar-pax], [_AM_PROG_TAR([pax])], + [_AM_PROG_TAR([v7])])]) +_AM_IF_OPTION([no-dependencies],, +[AC_PROVIDE_IFELSE([AC_PROG_CC], + [_AM_DEPENDENCIES(CC)], + [define([AC_PROG_CC], + defn([AC_PROG_CC])[_AM_DEPENDENCIES(CC)])])dnl +AC_PROVIDE_IFELSE([AC_PROG_CXX], + [_AM_DEPENDENCIES(CXX)], + [define([AC_PROG_CXX], + defn([AC_PROG_CXX])[_AM_DEPENDENCIES(CXX)])])dnl +]) +]) + + +# When config.status generates a header, we must update the stamp-h file. +# This file resides in the same directory as the config header +# that is generated. The stamp files are numbered to have different names. + +# Autoconf calls _AC_AM_CONFIG_HEADER_HOOK (when defined) in the +# loop where config.status creates the headers, so we can generate +# our stamp files there. +AC_DEFUN([_AC_AM_CONFIG_HEADER_HOOK], +[# Compute $1's index in $config_headers. 
+_am_stamp_count=1 +for _am_header in $config_headers :; do + case $_am_header in + $1 | $1:* ) + break ;; + * ) + _am_stamp_count=`expr $_am_stamp_count + 1` ;; + esac +done +echo "timestamp for $1" >`AS_DIRNAME([$1])`/stamp-h[]$_am_stamp_count]) + +# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# AM_PROG_INSTALL_SH +# ------------------ +# Define $install_sh. +AC_DEFUN([AM_PROG_INSTALL_SH], +[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl +install_sh=${install_sh-"$am_aux_dir/install-sh"} +AC_SUBST(install_sh)]) + +# Copyright (C) 2003, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 2 + +# Check whether the underlying file-system supports filenames +# with a leading dot. For instance MS-DOS doesn't. +AC_DEFUN([AM_SET_LEADING_DOT], +[rm -rf .tst 2>/dev/null +mkdir .tst 2>/dev/null +if test -d .tst; then + am__leading_dot=. +else + am__leading_dot=_ +fi +rmdir .tst 2>/dev/null +AC_SUBST([am__leading_dot])]) + +# Check to see how 'make' treats includes. -*- Autoconf -*- + +# Copyright (C) 2001, 2002, 2003, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 3 + +# AM_MAKE_INCLUDE() +# ----------------- +# Check to see how make treats includes. +AC_DEFUN([AM_MAKE_INCLUDE], +[am_make=${MAKE-make} +cat > confinc << 'END' +am__doit: + @echo done +.PHONY: am__doit +END +# If we don't find an include directive, just comment out the code. 
+AC_MSG_CHECKING([for style of include used by $am_make]) +am__include="#" +am__quote= +_am_result=none +# First try GNU make style include. +echo "include confinc" > confmf +# We grep out `Entering directory' and `Leaving directory' +# messages which can occur if `w' ends up in MAKEFLAGS. +# In particular we don't look at `^make:' because GNU make might +# be invoked under some other name (usually "gmake"), in which +# case it prints its new name instead of `make'. +if test "`$am_make -s -f confmf 2> /dev/null | grep -v 'ing directory'`" = "done"; then + am__include=include + am__quote= + _am_result=GNU +fi +# Now try BSD make style include. +if test "$am__include" = "#"; then + echo '.include "confinc"' > confmf + if test "`$am_make -s -f confmf 2> /dev/null`" = "done"; then + am__include=.include + am__quote="\"" + _am_result=BSD + fi +fi +AC_SUBST([am__include]) +AC_SUBST([am__quote]) +AC_MSG_RESULT([$_am_result]) +rm -f confinc confmf +]) + +# Fake the existence of programs that GNU maintainers use. -*- Autoconf -*- + +# Copyright (C) 1997, 1999, 2000, 2001, 2003, 2005 +# Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 4 + +# AM_MISSING_PROG(NAME, PROGRAM) +# ------------------------------ +AC_DEFUN([AM_MISSING_PROG], +[AC_REQUIRE([AM_MISSING_HAS_RUN]) +$1=${$1-"${am_missing_run}$2"} +AC_SUBST($1)]) + + +# AM_MISSING_HAS_RUN +# ------------------ +# Define MISSING if not defined so far and test if it supports --run. +# If it does, set am_missing_run to use it, otherwise, to nothing. 
+AC_DEFUN([AM_MISSING_HAS_RUN], +[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl +test x"${MISSING+set}" = xset || MISSING="\${SHELL} $am_aux_dir/missing" +# Use eval to expand $SHELL +if eval "$MISSING --run true"; then + am_missing_run="$MISSING --run " +else + am_missing_run= + AC_MSG_WARN([`missing' script is too old or missing]) +fi +]) + +# Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# AM_PROG_MKDIR_P +# --------------- +# Check whether `mkdir -p' is supported, fallback to mkinstalldirs otherwise. +# +# Automake 1.8 used `mkdir -m 0755 -p --' to ensure that directories +# created by `make install' are always world readable, even if the +# installer happens to have an overly restrictive umask (e.g. 077). +# This was a mistake. There are at least two reasons why we must not +# use `-m 0755': +# - it causes special bits like SGID to be ignored, +# - it may be too restrictive (some setups expect 775 directories). +# +# Do not use -m 0755 and let people choose whatever they expect by +# setting umask. +# +# We cannot accept any implementation of `mkdir' that recognizes `-p'. +# Some implementations (such as Solaris 8's) are not thread-safe: if a +# parallel make tries to run `mkdir -p a/b' and `mkdir -p a/c' +# concurrently, both version can detect that a/ is missing, but only +# one can create it and the other will error out. Consequently we +# restrict ourselves to GNU make (using the --version option ensures +# this.) +AC_DEFUN([AM_PROG_MKDIR_P], +[if mkdir -p --version . >/dev/null 2>&1 && test ! -d ./--version; then + # We used to keeping the `.' as first argument, in order to + # allow $(mkdir_p) to be used without argument. As in + # $(mkdir_p) $(somedir) + # where $(somedir) is conditionally defined. However this is wrong + # for two reasons: + # 1. 
if the package is installed by a user who cannot write `.' + # make install will fail, + # 2. the above comment should most certainly read + # $(mkdir_p) $(DESTDIR)$(somedir) + # so it does not work when $(somedir) is undefined and + # $(DESTDIR) is not. + # To support the latter case, we have to write + # test -z "$(somedir)" || $(mkdir_p) $(DESTDIR)$(somedir), + # so the `.' trick is pointless. + mkdir_p='mkdir -p --' +else + # On NextStep and OpenStep, the `mkdir' command does not + # recognize any option. It will interpret all options as + # directories to create, and then abort because `.' already + # exists. + for d in ./-p ./--version; + do + test -d $d && rmdir $d + done + # $(mkinstalldirs) is defined by Automake if mkinstalldirs exists. + if test -f "$ac_aux_dir/mkinstalldirs"; then + mkdir_p='$(mkinstalldirs)' + else + mkdir_p='$(install_sh) -d' + fi +fi +AC_SUBST([mkdir_p])]) + +# Helper functions for option handling. -*- Autoconf -*- + +# Copyright (C) 2001, 2002, 2003, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 3 + +# _AM_MANGLE_OPTION(NAME) +# ----------------------- +AC_DEFUN([_AM_MANGLE_OPTION], +[[_AM_OPTION_]m4_bpatsubst($1, [[^a-zA-Z0-9_]], [_])]) + +# _AM_SET_OPTION(NAME) +# ------------------------------ +# Set option NAME. Presently that only means defining a flag for this option. +AC_DEFUN([_AM_SET_OPTION], +[m4_define(_AM_MANGLE_OPTION([$1]), 1)]) + +# _AM_SET_OPTIONS(OPTIONS) +# ---------------------------------- +# OPTIONS is a space-separated list of Automake options. +AC_DEFUN([_AM_SET_OPTIONS], +[AC_FOREACH([_AM_Option], [$1], [_AM_SET_OPTION(_AM_Option)])]) + +# _AM_IF_OPTION(OPTION, IF-SET, [IF-NOT-SET]) +# ------------------------------------------- +# Execute IF-SET if OPTION is set, IF-NOT-SET otherwise. 
+AC_DEFUN([_AM_IF_OPTION], +[m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])]) + +# Check to make sure that the build environment is sane. -*- Autoconf -*- + +# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005 +# Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 4 + +# AM_SANITY_CHECK +# --------------- +AC_DEFUN([AM_SANITY_CHECK], +[AC_MSG_CHECKING([whether build environment is sane]) +# Just in case +sleep 1 +echo timestamp > conftest.file +# Do `set' in a subshell so we don't clobber the current shell's +# arguments. Must try -L first in case configure is actually a +# symlink; some systems play weird games with the mod time of symlinks +# (eg FreeBSD returns the mod time of the symlink's containing +# directory). +if ( + set X `ls -Lt $srcdir/configure conftest.file 2> /dev/null` + if test "$[*]" = "X"; then + # -L didn't work. + set X `ls -t $srcdir/configure conftest.file` + fi + rm -f conftest.file + if test "$[*]" != "X $srcdir/configure conftest.file" \ + && test "$[*]" != "X conftest.file $srcdir/configure"; then + + # If neither matched, then we have a broken ls. This can happen + # if, for instance, CONFIG_SHELL is bash and it inherits a + # broken ls alias from the environment. This has actually + # happened. Such a system could not be considered "sane". + AC_MSG_ERROR([ls -t appears to fail. Make sure there is not a broken +alias in your environment]) + fi + + test "$[2]" = conftest.file + ) +then + # Ok. + : +else + AC_MSG_ERROR([newly created file is older than distributed files! +Check your system clock]) +fi +AC_MSG_RESULT(yes)]) + +# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc. 
+# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# AM_PROG_INSTALL_STRIP +# --------------------- +# One issue with vendor `install' (even GNU) is that you can't +# specify the program used to strip binaries. This is especially +# annoying in cross-compiling environments, where the build's strip +# is unlikely to handle the host's binaries. +# Fortunately install-sh will honor a STRIPPROG variable, so we +# always use install-sh in `make install-strip', and initialize +# STRIPPROG with the value of the STRIP variable (set by the user). +AC_DEFUN([AM_PROG_INSTALL_STRIP], +[AC_REQUIRE([AM_PROG_INSTALL_SH])dnl +# Installed binaries are usually stripped using `strip' when the user +# run `make install-strip'. However `strip' might not be the right +# tool to use in cross-compilation environments, therefore Automake +# will honor the `STRIP' environment variable to overrule this program. +dnl Don't test for $cross_compiling = yes, because it might be `maybe'. +if test "$cross_compiling" != no; then + AC_CHECK_TOOL([STRIP], [strip], :) +fi +INSTALL_STRIP_PROGRAM="\${SHELL} \$(install_sh) -c -s" +AC_SUBST([INSTALL_STRIP_PROGRAM])]) + +# Check how to create a tarball. -*- Autoconf -*- + +# Copyright (C) 2004, 2005 Free Software Foundation, Inc. +# +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# serial 2 + +# _AM_PROG_TAR(FORMAT) +# -------------------- +# Check how to create a tarball in format FORMAT. +# FORMAT should be one of `v7', `ustar', or `pax'. +# +# Substitute a variable $(am__tar) that is a command +# writing to stdout a FORMAT-tarball containing the directory +# $tardir. 
+# tardir=directory && $(am__tar) > result.tar +# +# Substitute a variable $(am__untar) that extract such +# a tarball read from stdin. +# $(am__untar) < result.tar +AC_DEFUN([_AM_PROG_TAR], +[# Always define AMTAR for backward compatibility. +AM_MISSING_PROG([AMTAR], [tar]) +m4_if([$1], [v7], + [am__tar='${AMTAR} chof - "$$tardir"'; am__untar='${AMTAR} xf -'], + [m4_case([$1], [ustar],, [pax],, + [m4_fatal([Unknown tar format])]) +AC_MSG_CHECKING([how to create a $1 tar archive]) +# Loop over all known methods to create a tar archive until one works. +_am_tools='gnutar m4_if([$1], [ustar], [plaintar]) pax cpio none' +_am_tools=${am_cv_prog_tar_$1-$_am_tools} +# Do not fold the above two line into one, because Tru64 sh and +# Solaris sh will not grok spaces in the rhs of `-'. +for _am_tool in $_am_tools +do + case $_am_tool in + gnutar) + for _am_tar in tar gnutar gtar; + do + AM_RUN_LOG([$_am_tar --version]) && break + done + am__tar="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - "'"$$tardir"' + am__tar_="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - "'"$tardir"' + am__untar="$_am_tar -xf -" + ;; + plaintar) + # Must skip GNU tar: if it does not support --format= it doesn't create + # ustar tarball either. + (tar --version) >/dev/null 2>&1 && continue + am__tar='tar chf - "$$tardir"' + am__tar_='tar chf - "$tardir"' + am__untar='tar xf -' + ;; + pax) + am__tar='pax -L -x $1 -w "$$tardir"' + am__tar_='pax -L -x $1 -w "$tardir"' + am__untar='pax -r' + ;; + cpio) + am__tar='find "$$tardir" -print | cpio -o -H $1 -L' + am__tar_='find "$tardir" -print | cpio -o -H $1 -L' + am__untar='cpio -i -H $1 -d' + ;; + none) + am__tar=false + am__tar_=false + am__untar=false + ;; + esac + + # If the value was cached, stop now. We just wanted to have am__tar + # and am__untar set. 
+ test -n "${am_cv_prog_tar_$1}" && break + + # tar/untar a dummy directory, and stop if the command works + rm -rf conftest.dir + mkdir conftest.dir + echo GrepMe > conftest.dir/file + AM_RUN_LOG([tardir=conftest.dir && eval $am__tar_ >conftest.tar]) + rm -rf conftest.dir + if test -s conftest.tar; then + AM_RUN_LOG([$am__untar <conftest.tar]) + grep GrepMe conftest.dir/file >/dev/null 2>&1 && break + fi +done +rm -rf conftest.dir + +AC_CACHE_VAL([am_cv_prog_tar_$1], [am_cv_prog_tar_$1=$_am_tool]) +AC_MSG_RESULT([$am_cv_prog_tar_$1])]) +AC_SUBST([am__tar]) +AC_SUBST([am__untar]) +]) # _AM_PROG_TAR + diff -ruNp old/src/userspace/libnes/autogen.sh new/src/userspace/libnes/autogen.sh --- old/src/userspace/libnes/autogen.sh 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/autogen.sh 2006-10-23 14:41:36.000000000 -0500 @@ -0,0 +1,8 @@ +#! /bin/sh + +set -x +aclocal -I config +libtoolize --force --copy +autoheader +automake --foreign --add-missing --copy +autoconf diff -ruNp old/src/userspace/libnes/configure.in new/src/userspace/libnes/configure.in --- old/src/userspace/libnes/configure.in 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/configure.in 2006-10-25 11:11:16.000000000 -0500 @@ -0,0 +1,39 @@ +dnl Process this file with autoconf to produce a configure script. + +AC_PREREQ(2.57) +AC_INIT(libnes, 0.1, openib-general at openib.org) +AC_CONFIG_SRCDIR([src/nes_umain.h]) +AC_CONFIG_AUX_DIR(config) +AM_CONFIG_HEADER(config.h) +AM_INIT_AUTOMAKE(libnes, 0.1) +AM_PROG_LIBTOOL + +dnl Checks for programs +AC_PROG_CC + +dnl Checks for libraries + +dnl Checks for header files. +AC_CHECK_HEADERS(sysfs/libsysfs.h) +AC_CHECK_HEADER(infiniband/driver.h, [], + AC_MSG_ERROR([<infiniband/driver.h> not found. Is libibverbs installed?])) +AC_HEADER_STDC + +dnl Checks for typedefs, structures, and compiler characteristics. 
+AC_C_CONST +AC_CHECK_SIZEOF(long) + +dnl Checks for library functions +AC_CHECK_FUNCS(ibv_read_sysfs_file) + +AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, + if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then + ac_cv_version_script=yes + else + ac_cv_version_script=no + fi) + +AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") + +AC_CONFIG_FILES([Makefile libnes.spec]) +AC_OUTPUT From ggrundstrom at NetEffect.com Thu Oct 26 17:43:59 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:43:59 -0500 Subject: [openib-general] [PATCH 2/5] NetEffect 10Gb RNIC Userspace Library: makefile generation Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECD@venom2> Userspace driver patch 2 of 5. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/src/userspace/libnes/libnes.spec.in new/src/userspace/libnes/libnes.spec.in --- old/src/userspace/libnes/libnes.spec.in 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/libnes.spec.in 2006-10-25 11:11:23.000000000 -0500 @@ -0,0 +1,57 @@ + +%define ver @VERSION@ + +Name: libnes +Version: 0.1 +Release: 0.%{?dist} +Summary: NetEffect RNIC Userspace Driver + +Group: System Environment/Libraries +License: GPL/BSD +Url: http://openib.org/ +Source: http://openib.org/downloads/%{name}-%{ver}.tar.gz +BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) + +BuildRequires: libibverbs-devel + +%description +libnes provides a device-specific userspace driver for NetEffect RNICs +for use with the libibverbs library. + +%package devel +Summary: Development files for the libnes driver +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description devel +Static version of libnes that may be linked directly to an +application, which may be useful for debugging. 
+ +%prep +%setup -q -n %{name}-%{ver} + +%build +%configure +make %{?_smp_mflags} + +%install +rm -rf $RPM_BUILD_ROOT +%makeinstall +# remove unpackaged files from the buildroot +rm -f $RPM_BUILD_ROOT%{_libdir}/infiniband/*.la + +%clean +rm -rf $RPM_BUILD_ROOT + +%files +%defattr(-,root,root,-) +%{_libdir}/infiniband/nes.so +%doc AUTHORS COPYING ChangeLog README + +%files devel +%defattr(-,root,root,-) +%{_libdir}/infiniband/nes.a + +%changelog +* Wed May 10 2006 nesdev - 1.0 +- First development Effort diff -ruNp old/src/userspace/libnes/Makefile.am new/src/userspace/libnes/Makefile.am --- old/src/userspace/libnes/Makefile.am 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/Makefile.am 2006-10-25 11:11:30.000000000 -0500 @@ -0,0 +1,25 @@ + +neslibdir = $(libdir)/infiniband + +neslib_LTLIBRARIES = src/nes.la + +src_nes_la_CFLAGS = -g -Wall -D_GNU_SOURCE + +if HAVE_LD_VERSION_SCRIPT + nes_version_script = -Wl,--version-script=$(srcdir)/src/nes.map +else + nes_version_script = +endif + +src_nes_la_SOURCES = src/nes_umain.c src/nes_uverbs.c +src_nes_la_LDFLAGS = -avoid-version -module \ + $(nes_version_script) + +DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ + debian/libnes1.install debian/libnes-dev.install debian/rules + +EXTRA_DIST = src/nes.h src/nes-abi.h \ + src/nes.map libnes.spec.in $(DEBIAN) + +dist-hook: libnes.spec + cp libnes.spec $(distdir) From ggrundstrom at NetEffect.com Thu Oct 26 17:47:40 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:47:40 -0500 Subject: [openib-general] [PATCH 3/5] NetEffect 10Gb RNIC Userspace Library: userspace header files Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECF@venom2> Userspace driver patch 3 of 5. 
Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/src/userspace/libnes/src/nes-abi.h new/src/userspace/libnes/src/nes-abi.h --- old/src/userspace/libnes/src/nes-abi.h 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/src/nes-abi.h 2006-10-25 10:27:58.000000000 -0500 @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef nes_ABI_H +#define nes_ABI_H + +#include + +struct nes_ualloc_ucontext_resp { + struct ibv_get_context_resp ibv_resp; + __u32 max_pds; /* maximum pds allowed for this user process */ + __u32 max_qps; /* maximum qps allowed for this user process */ + __u32 wq_size; /* defines the size of the WQs (sq+rq) allocated to the mmaped area */ + __u32 reserved; +}; + +struct nes_ualloc_pd_resp { + struct ibv_alloc_pd_resp ibv_resp; + __u32 pd_id; + __u32 db_index; +}; + +struct nes_ucreate_cq { + struct ibv_create_cq ibv_cmd; + __u64 user_cq_buffer; +}; + +enum nes_umemreg_type { + NES_UMEMREG_TYPE_MEM = 0x0000, + NES_UMEMREG_TYPE_QP = 0x0001, + NES_UMEMREG_TYPE_CQ = 0x0002, +}; + +struct nes_ureg_mr { + struct ibv_reg_mr ibv_cmd; + __u32 reg_type; /* indicates if id is memory, QP or CQ */ + __u32 reserved; /* QP or CQ ID */ +}; + +struct nes_ucreate_cq_resp { + struct ibv_create_cq_resp ibv_resp; + __u32 cq_id; + __u32 cq_size; + __u32 mmap_db_index; + __u32 reserved; +}; + +struct nes_ucreate_qp { + struct ibv_create_qp ibv_cmd; +}; + +struct nes_ucreate_qp_resp { + struct ibv_create_qp_resp ibv_resp; + __u32 qp_id; + __u32 actual_sq_size; + __u32 actual_rq_size; + __u32 mmap_sq_db_index; + __u32 mmap_rq_db_index; + __u32 reserved; +}; + + +struct nes_cqe { + __u32 header; + __u32 len; + __u32 wrid_hi_stag; + __u32 wrid_low_msn; +}; + +#endif /* nes_ABI_H */ diff -ruNp old/src/userspace/libnes/src/nes.map new/src/userspace/libnes/src/nes.map --- old/src/userspace/libnes/src/nes.map 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/src/nes.map 2006-10-25 11:11:45.000000000 -0500 @@ -0,0 +1,6 @@ +{ + global: + ibv_driver_init; + openib_driver_init; + local: *; +}; diff -ruNp old/src/userspace/libnes/src/nes_umain.h new/src/userspace/libnes/src/nes_umain.h --- old/src/userspace/libnes/src/nes_umain.h 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/src/nes_umain.h 2006-10-25 10:27:59.000000000 -0500 @@ -0,0 +1,271 @@ +/* + * 
Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef nes_umain_H +#define nes_umain_H + +#include +#include + +#define HIDDEN __attribute__((visibility ("hidden"))) + +#define PFX "nes: " + +#define NES_MAX_SQ_PAYLOAD_SIZE 0x40000000 /* for errata 5, bug 2784 */ + +enum nes_cqe_opcode_bits { + NES_CQE_STAG_VALID = (1<<6), + NES_CQE_ERROR = (1<<7), + NES_CQE_SQ = (1<<8), + NES_CQE_SE = (1<<9), + NES_CQE_PSH = (1<<29), + NES_CQE_FIN = (1<<30), + NES_CQE_VALID = (1<<31), +}; + +enum nes_cqe_word_idx { + NES_CQE_PAYLOAD_LENGTH_IDX = 0, + NES_CQE_COMP_COMP_CTX_LOW_IDX = 2, + NES_CQE_COMP_COMP_CTX_HIGH_IDX = 3, + NES_CQE_INV_STAG_IDX = 4, + NES_CQE_QP_ID_IDX = 5, + NES_CQE_ERROR_CODE_IDX = 6, + NES_CQE_OPCODE_IDX = 7, +}; + +enum nes_cqe_allocate_bits { + NES_CQE_ALLOC_INC_SELECT = (1<<28), + NES_CQE_ALLOC_NOTIFY_NEXT = (1<<29), + NES_CQE_ALLOC_NOTIFY_SE = (1<<30), + NES_CQE_ALLOC_RESET = (1<<31), +}; + +enum nes_iwarp_sq_wqe_word_idx { + NES_IWARP_SQ_WQE_MISC_IDX = 0, + NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX = 1, + NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX = 2, + NES_IWARP_SQ_WQE_COMP_CTX_HIGH_IDX = 3, + NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX = 4, + NES_IWARP_SQ_WQE_COMP_SCRATCH_HIGH_IDX = 5, + NES_IWARP_SQ_WQE_INV_STAG_LOW_IDX = 7, + NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX = 8, + NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX = 9, + NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX = 10, + NES_IWARP_SQ_WQE_RDMA_STAG_IDX = 11, + NES_IWARP_SQ_WQE_FRAG0_LOW_IDX = 16, + NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX = 17, + NES_IWARP_SQ_WQE_LENGTH0_IDX = 18, + NES_IWARP_SQ_WQE_STAG0_IDX = 19, + NES_IWARP_SQ_WQE_FRAG1_LOW_IDX = 20, + NES_IWARP_SQ_WQE_FRAG1_HIGH_IDX = 21, + NES_IWARP_SQ_WQE_LENGTH1_IDX = 22, + NES_IWARP_SQ_WQE_STAG1_IDX = 23, + NES_IWARP_SQ_WQE_FRAG2_LOW_IDX = 24, + NES_IWARP_SQ_WQE_FRAG2_HIGH_IDX = 25, + NES_IWARP_SQ_WQE_LENGTH2_IDX = 26, + NES_IWARP_SQ_WQE_STAG2_IDX = 27, + NES_IWARP_SQ_WQE_FRAG3_LOW_IDX = 28, + NES_IWARP_SQ_WQE_FRAG3_HIGH_IDX = 29, + NES_IWARP_SQ_WQE_LENGTH3_IDX = 30, + NES_IWARP_SQ_WQE_STAG3_IDX = 31, +}; + +enum nes_iwarp_rq_wqe_word_idx 
{ + NES_IWARP_RQ_WQE_TOTAL_PAYLOAD_IDX = 1, + NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX = 2, + NES_IWARP_RQ_WQE_COMP_CTX_HIGH_IDX = 3, + NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX = 4, + NES_IWARP_RQ_WQE_COMP_SCRATCH_HIGH_IDX = 5, + NES_IWARP_RQ_WQE_FRAG0_LOW_IDX = 8, + NES_IWARP_RQ_WQE_FRAG0_HIGH_IDX = 9, + NES_IWARP_RQ_WQE_LENGTH0_IDX = 10, + NES_IWARP_RQ_WQE_STAG0_IDX = 11, + NES_IWARP_RQ_WQE_FRAG1_LOW_IDX = 12, + NES_IWARP_RQ_WQE_FRAG1_HIGH_IDX = 13, + NES_IWARP_RQ_WQE_LENGTH1_IDX = 14, + NES_IWARP_RQ_WQE_STAG1_IDX = 15, + NES_IWARP_RQ_WQE_FRAG2_LOW_IDX = 16, + NES_IWARP_RQ_WQE_FRAG2_HIGH_IDX = 17, + NES_IWARP_RQ_WQE_LENGTH2_IDX = 18, + NES_IWARP_RQ_WQE_STAG2_IDX = 19, + NES_IWARP_RQ_WQE_FRAG3_LOW_IDX = 20, + NES_IWARP_RQ_WQE_FRAG3_HIGH_IDX = 21, + NES_IWARP_RQ_WQE_LENGTH3_IDX = 22, + NES_IWARP_RQ_WQE_STAG3_IDX = 23, +}; + +enum nes_iwarp_sq_opcodes { + NES_IWARP_SQ_WQE_STREAMING = (1<<23), + NES_IWARP_SQ_WQE_READ_FENCE = (1<<29), + NES_IWARP_SQ_WQE_LOCAL_FENCE = (1<<30), + NES_IWARP_SQ_WQE_SIGNALED_COMPL = (1<<31), +}; + +enum nes_iwarp_sq_wqe_bits { + NES_IWARP_SQ_OP_RDMAW = 0, + NES_IWARP_SQ_OP_RDMAR = 1, + NES_IWARP_SQ_OP_SEND = 3, + NES_IWARP_SQ_OP_SENDINV = 4, + NES_IWARP_SQ_OP_SENDSE = 5, + NES_IWARP_SQ_OP_SENDSEINV = 6, + NES_IWARP_SQ_OP_BIND = 8, + NES_IWARP_SQ_OP_FAST_REG = 9, + NES_IWARP_SQ_OP_LOCINV = 10, + NES_IWARP_SQ_OP_RDMAR_LOCINV = 11, + NES_IWARP_SQ_OP_NOP = 12, +}; + +struct nes_hw_qp_wqe { + uint32_t wqe_words[32]; +}; + +struct nes_hw_cqe { + uint32_t cqe_words[8]; +}; + +enum nes_uhca_type { + NETEFFECT_nes +}; + +struct nes_user_doorbell { + uint32_t wqe_alloc; + uint32_t reserved[3]; + uint32_t cqe_alloc; +}; + +struct nes_udevice { + struct ibv_device ibv_dev; + enum nes_uhca_type hca_type; + int page_size; +}; + +struct nes_upd { + struct ibv_pd ibv_pd; + struct nes_user_doorbell volatile *udoorbell; + uint32_t pd_id; + uint32_t db_index; +}; + +struct nes_uvcontext { + struct ibv_context ibv_ctx; + struct nes_upd *nesupd; + uint32_t max_pds; /* 
maximum pds allowed for this user process */ + uint32_t max_qps; /* maximum qps allowed for this user process */ + uint32_t wq_size; /* defines the size of the WQs (sq+rq) allocated to the mmaped area */ +}; + +struct nes_ucq { + struct ibv_cq ibv_cq; + struct nes_hw_cqe volatile *cqes; + struct ibv_mr mr; + uint32_t cq_id; + uint16_t size; + uint16_t head; + uint16_t polled_completions; +}; + +struct nes_uqp { + struct ibv_qp ibv_qp; + struct nes_hw_qp_wqe volatile *sq_vbase; + struct nes_hw_qp_wqe volatile *rq_vbase; + uint32_t qp_id; + uint32_t bytes_sent; + uint16_t sq_db_index; + uint16_t sq_head; + uint16_t sq_tail; + uint16_t sq_size; + uint16_t rq_db_index; + uint16_t rq_head; + uint16_t rq_tail; + uint16_t rq_size; +}; + +#define to_nes_uxxx(xxx, type) \ + ((struct nes_u##type *) \ + ((void *) ib##xxx - offsetof(struct nes_u##type, ibv_##xxx))) + +static inline struct nes_udevice *to_nes_udev(struct ibv_device *ibdev) +{ + return to_nes_uxxx(dev, device); +} + +static inline struct nes_uvcontext *to_nes_uctx(struct ibv_context *ibctx) +{ + return to_nes_uxxx(ctx, vcontext); +} + +static inline struct nes_upd *to_nes_upd(struct ibv_pd *ibpd) +{ + return to_nes_uxxx(pd, pd); +} + +static inline struct nes_ucq *to_nes_ucq(struct ibv_cq *ibcq) +{ + return to_nes_uxxx(cq, cq); +} + +static inline struct nes_uqp *to_nes_uqp(struct ibv_qp *ibqp) +{ + return to_nes_uxxx(qp, qp); +} + + +/* nes_umain.c */ +struct ibv_device *ibv_driver_init(const char *, int); + +/* nes_uverbs.c */ +int nes_uquery_device(struct ibv_context *, struct ibv_device_attr *); +int nes_uquery_port(struct ibv_context *, uint8_t, struct ibv_port_attr *); +struct ibv_pd *nes_ualloc_pd(struct ibv_context *); +int nes_ufree_pd(struct ibv_pd *); +struct ibv_mr *nes_ureg_mr(struct ibv_pd *, void *, size_t, enum ibv_access_flags); +int nes_udereg_mr(struct ibv_mr *); +struct ibv_cq *nes_ucreate_cq(struct ibv_context *, int, struct ibv_comp_channel *, int); +int nes_uresize_cq(struct ibv_cq *, 
int); +int nes_udestroy_cq(struct ibv_cq *); +int nes_upoll_cq(struct ibv_cq *, int, struct ibv_wc *); +int nes_uarm_cq(struct ibv_cq *, int); +struct ibv_srq *nes_ucreate_srq(struct ibv_pd *, struct ibv_srq_init_attr *); +int nes_umodify_srq(struct ibv_srq *, struct ibv_srq_attr *, enum ibv_srq_attr_mask); +int nes_udestroy_srq(struct ibv_srq *); +int nes_upost_srq_recv(struct ibv_srq *, struct ibv_recv_wr *, struct ibv_recv_wr **); +struct ibv_qp *nes_ucreate_qp(struct ibv_pd *, struct ibv_qp_init_attr *); +int nes_umodify_qp(struct ibv_qp *, struct ibv_qp_attr *, enum ibv_qp_attr_mask); +int nes_udestroy_qp(struct ibv_qp *); +int nes_upost_send(struct ibv_qp *, struct ibv_send_wr *, struct ibv_send_wr **); +int nes_upost_recv(struct ibv_qp *, struct ibv_recv_wr *, struct ibv_recv_wr **); +struct ibv_ah *nes_ucreate_ah(struct ibv_pd *, struct ibv_ah_attr *); +int nes_udestroy_ah(struct ibv_ah *); +int nes_uattach_mcast(struct ibv_qp *, union ibv_gid *, uint16_t); +int nes_udetach_mcast(struct ibv_qp *, union ibv_gid *, uint16_t); + +#endif /* nes_umain_H */ From ggrundstrom at NetEffect.com Thu Oct 26 17:49:38 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:49:38 -0500 Subject: [openib-general] [PATCH 4/5] NetEffect 10Gb RNIC Userspace Library: userspace main c file Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ED0@venom2> Userspace driver patch 4 of 5. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/src/userspace/libnes/src/nes_umain.c new/src/userspace/libnes/src/nes_umain.c --- old/src/userspace/libnes/src/nes_umain.c 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/src/nes_umain.c 2006-10-25 10:27:58.000000000 -0500 @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include + +#include "nes_umain.h" +#include "nes-abi.h" + +long int page_size; + +#ifdef HAVE_SYSFS_LIBSYSFS_H +#include +#endif + +#include +#include +#include + +#ifndef PCI_VENDOR_ID_NETEFFECT +#define PCI_VENDOR_ID_NETEFFECT 0x1678 +#endif + +#ifndef PCI_DEVICE_ID_NETEFFECT_nes +#define PCI_DEVICE_ID_NETEFFECT_nes 0x0100 +#endif + + +#define HCA(v, d, t) \ + { .vendor = PCI_VENDOR_ID_##v, \ + .device = PCI_DEVICE_ID_NETEFFECT_##d, \ + .type = NETEFFECT_##t } + +struct { + unsigned vendor; + unsigned device; + enum nes_uhca_type type; +} hca_table[] = { + HCA(NETEFFECT, nes, nes),}; + +static struct ibv_context *nes_ualloc_context(struct ibv_device *, int); +static void nes_ufree_context(struct ibv_context *); + +static struct ibv_context_ops nes_uctx_ops = { + .query_device = nes_uquery_device, + .query_port = nes_uquery_port, + .alloc_pd = nes_ualloc_pd, + .dealloc_pd = nes_ufree_pd, + .reg_mr = nes_ureg_mr, + .dereg_mr = nes_udereg_mr, + .create_cq = nes_ucreate_cq, + .poll_cq = nes_upoll_cq, + .req_notify_cq = nes_uarm_cq, + .cq_event = NULL, + .resize_cq = nes_uresize_cq, + .destroy_cq = nes_udestroy_cq, + .create_srq = NULL, + .modify_srq = NULL, + .query_srq = NULL, + .destroy_srq = NULL, + .post_srq_recv = NULL, + .create_qp = nes_ucreate_qp, + .modify_qp = nes_umodify_qp, + .destroy_qp = nes_udestroy_qp, + .post_send = nes_upost_send, + .post_recv = nes_upost_recv, + .create_ah = nes_ucreate_ah, + .destroy_ah = nes_udestroy_ah, + .attach_mcast = nes_uattach_mcast, + .detach_mcast = nes_udetach_mcast +}; + + +/** + * nes_ualloc_context + * + * @param ibdev + * @param cmd_fd + * + * @return struct ibv_context* + */ +static struct ibv_context *nes_ualloc_context(struct ibv_device *ibdev, int cmd_fd) +{ + // void *mymmapp = NULL; + struct ibv_pd *ibv_pd; + struct nes_uvcontext *nesvctx; + struct ibv_get_context cmd; + struct 
nes_ualloc_ucontext_resp resp; + + page_size = sysconf(_SC_PAGESIZE); + + nesvctx = malloc(sizeof *nesvctx); + if (!nesvctx) + return NULL; + + nesvctx->ibv_ctx.cmd_fd = cmd_fd; + + if (ibv_cmd_get_context(&nesvctx->ibv_ctx, &cmd, sizeof cmd, + &resp.ibv_resp, sizeof resp)) + goto err_free; + + nesvctx->ibv_ctx.device = ibdev; + nesvctx->ibv_ctx.ops = nes_uctx_ops; + nesvctx->max_pds = resp.max_pds; + nesvctx->max_qps = resp.max_qps; + nesvctx->wq_size = resp.wq_size; + + /* Get a doorbell region for the CQs */ + ibv_pd = nes_ualloc_pd(&nesvctx->ibv_ctx); + if (!ibv_pd) + goto err_free; + ibv_pd->context = &nesvctx->ibv_ctx; + nesvctx->nesupd = to_nes_upd(ibv_pd); + + + return (&nesvctx->ibv_ctx); + +err_free: + fprintf(stderr, PFX "%s: Failed to allocate context for device.\n", __FUNCTION__); + free(nesvctx); + + return NULL; +} + + +/** + * nes_ufree_context + * + * @param ibctx + */ +static void nes_ufree_context(struct ibv_context *ibctx) +{ + struct nes_uvcontext *nesvctx = to_nes_uctx(ibctx); + nes_ufree_pd(&nesvctx->nesupd->ibv_pd); + + free(nesvctx); +} + + +static struct ibv_device_ops nes_udev_ops = { + .alloc_context = nes_ualloc_context, + .free_context = nes_ufree_context +}; + + +/** + * ibv_driver_init + * + * @param uverbs_sys_path + * @param abi_version + * + * @return struct ibv_device* + */ +struct ibv_device *ibv_driver_init(const char *uverbs_sys_path, int abi_version) +{ + char value[8]; + struct nes_udevice *dev; + unsigned vendor, device; + int i; + + if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor", value, sizeof(value)) < 0) { + return (NULL); + } + + sscanf(value, "%i", &vendor); + + if (ibv_read_sysfs_file(uverbs_sys_path, "device/device", value, sizeof(value)) < 0) { + return (NULL); + } + sscanf(value, "%i", &device); + + for (i = 0; i < sizeof hca_table / sizeof hca_table[0]; ++i) + if (vendor == hca_table[i].vendor && + device == hca_table[i].device) + goto found; + + return NULL; + +found: + dev = malloc(sizeof *dev); + if 
(!dev) { + fprintf(stderr, PFX "Fatal: couldn't allocate device for libnes\n"); + return (NULL); + } + + dev->ibv_dev.ops = nes_udev_ops; + dev->hca_type = hca_table[i].type; + dev->page_size = sysconf(_SC_PAGESIZE); + + return (&dev->ibv_dev); +} + + +#ifdef HAVE_SYSFS_LIBSYSFS_H +/** + * openib_driver_init + * + * @param sysdev + * + * @return struct ibv_device* + */ +struct ibv_device *openib_driver_init(struct sysfs_class_device *sysdev) +{ + int abi_ver = 0; + char value[8]; + + if (ibv_read_sysfs_file(sysdev->path, "abi_version", value, sizeof(value)) > 0) { + abi_ver = strtol(value, NULL, 10); + } + + return (ibv_driver_init(sysdev->path, abi_ver)); +} +#endif /* HAVE_SYSFS_LIBSYSFS_H */ + From ggrundstrom at NetEffect.com Thu Oct 26 17:52:43 2006 From: ggrundstrom at NetEffect.com (Glenn Grundstrom) Date: Thu, 26 Oct 2006 19:52:43 -0500 Subject: [openib-general] [PATCH 5/5] NetEffect 10Gb RNIC Userspace Library: openfabrics verbs interface c file Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ED1@venom2> Userspace driver patch 5 of 5. Signed-off-by: Glenn Grundstrom ====================================================== diff -ruNp old/src/userspace/libnes/src/nes_uverbs.c new/src/userspace/libnes/src/nes_uverbs.c --- old/src/userspace/libnes/src/nes_uverbs.c 1969-12-31 18:00:00.000000000 -0600 +++ new/src/userspace/libnes/src/nes_uverbs.c 2006-10-25 10:27:59.000000000 -0500 @@ -0,0 +1,933 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "nes_umain.h" +#include "nes-abi.h" + +extern long int page_size; + + +/** + * nes_uquery_device + * + * @param context + * @param attr + * + * @return int + */ +int nes_uquery_device(struct ibv_context *context, struct ibv_device_attr *attr) +{ + struct ibv_query_device cmd; + uint64_t reserved; + int ret; + + ret = ibv_cmd_query_device(context, attr, &reserved, &cmd, sizeof cmd); + if (ret) + return ret; + + return 0; +} + + +/** + * nes_uquery_port + * + * @param context + * @param port + * @param attr + * + * @return int + */ +int nes_uquery_port(struct ibv_context *context, uint8_t port, + struct ibv_port_attr *attr) +{ + struct ibv_query_port cmd; + + return ibv_cmd_query_port(context, port, attr, &cmd, sizeof cmd); +} + + +/** + * nes_ualloc_pd + * + * @param context + * + * @return struct ibv_pd* + */ +struct ibv_pd *nes_ualloc_pd(struct ibv_context *context) +{ + struct ibv_alloc_pd cmd; + struct nes_ualloc_pd_resp resp; + struct nes_upd *nesupd; + + nesupd = malloc(sizeof *nesupd); + if (!nesupd) + return NULL; + + if (ibv_cmd_alloc_pd(context, &nesupd->ibv_pd, &cmd, sizeof cmd, + &resp.ibv_resp, sizeof resp)) { + free(nesupd); + return NULL; + } + nesupd->pd_id = resp.pd_id; + nesupd->db_index = resp.db_index; + + nesupd->udoorbell = mmap(NULL, 4096, PROT_WRITE | PROT_READ, MAP_SHARED, + context->cmd_fd, nesupd->db_index * 4096); + + if (((void *)-1) == nesupd->udoorbell) { + free(nesupd); + return NULL; + } + + return (&nesupd->ibv_pd); +} + + +/** + * nes_ufree_pd + * + * @param pd + * + * @return int + */ +int nes_ufree_pd(struct ibv_pd *pd) +{ + int ret; + struct nes_upd *nesupd; +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + nesupd = to_nes_upd(pd); + + ret = ibv_cmd_dealloc_pd(pd); + if (ret) + return ret; + + munmap((void *)nesupd->udoorbell, 4096); + 
free(nesupd); + return 0; +} + + +/** + * nes_ureg_mr + * + * @param pd + * @param addr + * @param length + * @param access + * + * @return struct ibv_mr* + */ +struct ibv_mr *nes_ureg_mr(struct ibv_pd *pd, void *addr, + size_t length, enum ibv_access_flags access) +{ + struct ibv_mr *mr; + struct nes_ureg_mr cmd; + +// fprintf(stderr, PFX "%s: address = %p, length = %u.\n", __FUNCTION__, addr, length); + + mr = malloc(sizeof *mr); + if (!mr) + return NULL; + + cmd.reg_type = NES_UMEMREG_TYPE_MEM; + if (ibv_cmd_reg_mr(pd, addr, length, (uintptr_t) addr, + access, mr, &cmd.ibv_cmd, sizeof cmd)) { + fprintf(stderr, "ibv_cmd_reg_mr failed\n"); + free(mr); + return NULL; + } + + return mr; +} + + +/** + * nes_udereg_mr + * + * @param mr + * + * @return int + */ +int nes_udereg_mr(struct ibv_mr *mr) +{ + int ret; + +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + ret = ibv_cmd_dereg_mr(mr); + if (ret) + return ret; + + free(mr); + return 0; +} + + +/** + * nes_ucreate_cq + * + * @param context + * @param cqe + * @param channel + * @param comp_vector + * + * @return struct ibv_cq* + */ +struct ibv_cq *nes_ucreate_cq(struct ibv_context *context, int cqe, + struct ibv_comp_channel *channel, int comp_vector) +{ + struct nes_ucq *nesucq; + struct nes_ureg_mr reg_mr_cmd; + struct nes_ucreate_cq cmd; + struct nes_ucreate_cq_resp resp; + int ret; + struct nes_uvcontext *nesvctx = to_nes_uctx(context); + +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + nesucq = malloc(sizeof *nesucq); +// fprintf(stderr, "nesucq=%p, size=%u\n", nesucq, sizeof(*nesucq)); + if (!nesucq) { + return NULL; + } + memset(nesucq, 0, sizeof(*nesucq)); + + if (cqe < 4) /* just trying to keep to a reasonable minimum */ + cqe = 4; + nesucq->size = cqe + 1; + + nesucq->cqes = memalign(page_size, nesucq->size*sizeof(struct nes_hw_cqe)); + if (!nesucq->cqes) + goto err; + + /* Register the memory for the CQ */ + reg_mr_cmd.reg_type = NES_UMEMREG_TYPE_CQ; + + ret = ibv_cmd_reg_mr(&nesvctx->nesupd->ibv_pd, 
(void *)nesucq->cqes, + (nesucq->size*sizeof(struct nes_hw_cqe)), + (uintptr_t) nesucq->cqes, + IBV_ACCESS_LOCAL_WRITE, &nesucq->mr, + &reg_mr_cmd.ibv_cmd, sizeof reg_mr_cmd); + if (ret) { + fprintf(stderr, "ibv_cmd_reg_mr failed (ret = %d).\n", ret); + free(nesucq->cqes); + goto err; + } + + /* Create the CQ */ + memset(&cmd, 0, sizeof(cmd)); + cmd.user_cq_buffer = (__u64)((uintptr_t)nesucq->cqes); + + ret = ibv_cmd_create_cq(context, nesucq->size-1, channel, comp_vector, + &nesucq->ibv_cq, &cmd.ibv_cmd, sizeof cmd, + &resp.ibv_resp, sizeof resp); + if (ret) + goto err; + + nesucq->cq_id = (uint16_t)resp.cq_id; + if (nesucq->size != (uint16_t)resp.cq_size) { + fprintf(stderr, PFX "%s: CQ allocation error: number of requested entries = %u, returned = %u.\n", + __FUNCTION__, nesucq->size, (uint16_t)resp.cq_size); + } + + /* Zero out the CQ */ + memset(nesucq->cqes, 0, nesucq->size*sizeof(struct nes_hw_cqe)); + + return (&nesucq->ibv_cq); + +err: + fprintf(stderr, PFX "%s: Error Creating CQ.\n", __FUNCTION__); + free(nesucq); + + return NULL; +} + + +/** + * nes_uresize_cq + * + * @param cq + * @param cqe + * + * @return int + */ +int nes_uresize_cq(struct ibv_cq *cq, int cqe) +{ + fprintf(stderr, PFX "%s\n", __FUNCTION__); + + return -ENOSYS; +} + + +/** + * nes_udestroy_cq + * + * @param cq + * + * @return int + */ +int nes_udestroy_cq(struct ibv_cq *cq) +{ + struct nes_ucq *nesucq = to_nes_ucq(cq); + int ret; +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + ret = ibv_cmd_destroy_cq(cq); + if (ret) + return ret; + + /* Free CQ the memory */ + free(nesucq->cqes); + free(nesucq); + + return 0; +} + + +/** + * nes_upoll_cq + * + * @param cq + * @param num_entries + * @param entry + * + * @return int + */ +int nes_upoll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *entry) +{ + uint64_t wrid; + struct nes_ucq *nesucq; + struct nes_uvcontext *nesvctx = NULL; + struct nes_uqp *nesuqp; + int cqe_count=0; + uint32_t head; + uint32_t wq_tail; + uint32_t cq_size; +
uint32_t wqe_index; + struct nes_hw_cqe cqe; + +// fprintf(stderr, PFX "%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + nesucq = to_nes_ucq(cq); + nesvctx = to_nes_uctx(cq->context); + head = nesucq->head; + cq_size = nesucq->size; + + // fprintf(stderr, PFX "%s: Polling CQ%u, head = %u, num_entries = %u.\n", __FUNCTION__, nesucq->cq_id, head, num_entries); + + while (cqe_count<num_entries) { + if (nesucq->cqes[head].cqe_words[NES_CQE_OPCODE_IDX] & NES_CQE_VALID) { + cqe = (volatile struct nes_hw_cqe)nesucq->cqes[head]; + + memset(entry, 0, sizeof *entry); + /* this is for both the cqe copy and the zeroing of entry */ + asm __volatile__("": : :"memory"); + + nesucq->cqes[head].cqe_words[NES_CQE_OPCODE_IDX] = 0; + /* parse CQE, get completion context from WQE (either rq or sq */ + + wqe_index = cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]&511; + nesuqp = *((struct nes_uqp **)&cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]); + nesuqp = (struct nes_uqp *)((uintptr_t)nesuqp&(~1023)); +// fprintf(stderr, PFX "wqe index = %u. 
nesuqp = %p\n", wqe_index, nesuqp); + entry->status = IBV_WC_SUCCESS; + entry->qp_num = nesuqp->qp_id; + entry->src_qp = nesuqp->qp_id; + + if (cqe.cqe_words[NES_CQE_OPCODE_IDX] & NES_CQE_SQ) { + /* Working on a SQ Completion*/ + wq_tail = wqe_index; + nesuqp->sq_tail = (wqe_index+1)&(nesuqp->sq_size - 1); + wrid = *((uint64_t *)&nesuqp->sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX]); + entry->byte_len = nesuqp->sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX]; + + switch (nesuqp->sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_MISC_IDX]&0x3f) + { + case NES_IWARP_SQ_OP_RDMAW: + // fprintf(stderr, PFX "%s: Operation = RDMA WRITE.\n", __FUNCTION__ ); + entry->opcode = IBV_WC_RDMA_WRITE; + break; + case NES_IWARP_SQ_OP_RDMAR: + // fprintf(stderr, PFX "%s: Operation = RDMA READ.\n", __FUNCTION__ ); + entry->opcode = IBV_WC_RDMA_READ; + entry->byte_len = nesuqp->sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX]; + break; + case NES_IWARP_SQ_OP_SENDINV: + case NES_IWARP_SQ_OP_SENDSEINV: + case NES_IWARP_SQ_OP_SEND: + case NES_IWARP_SQ_OP_SENDSE: + // fprintf(stderr, PFX "%s: Operation = Send.\n", __FUNCTION__ ); + entry->opcode = IBV_WC_SEND; + break; + } + } else { + /* Working on a RQ Completion*/ + wq_tail = wqe_index; + nesuqp->rq_tail = (wqe_index+1)&(nesuqp->rq_size - 1); + entry->byte_len = cqe.cqe_words[NES_CQE_PAYLOAD_LENGTH_IDX]; + + wrid = *((uint64_t *)&nesuqp->rq_vbase[wq_tail].wqe_words[NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX]); + entry->opcode = IBV_WC_RECV; + } + entry->wr_id = wrid; + + if (++head >= cq_size) + head = 0; + cqe_count++; + nesucq->polled_completions++; + + /* TODO: find a better number...if there is one */ + if ((nesucq->polled_completions>(cq_size/2)) || (nesucq->polled_completions==255)) { + if (NULL == nesvctx) + nesvctx = to_nes_uctx(cq->context); + nesvctx->nesupd->udoorbell->cqe_alloc = nesucq->cq_id | (nesucq->polled_completions << 16); + nesucq->polled_completions = 0; + } + entry++; +
} else + break; + } + + if (nesucq->polled_completions) { + if (NULL == nesvctx) + nesvctx = to_nes_uctx(cq->context); + nesvctx->nesupd->udoorbell->cqe_alloc = nesucq->cq_id | (nesucq->polled_completions << 16); + nesucq->polled_completions = 0; + } + nesucq->head = head; +// if (cqe_count != 0) +// fprintf(stderr, PFX "%s: Reporting %u completions for CQ%u.\n", __FUNCTION__, cqe_count, nesucq->cq_id); + //asm __volatile__("int $3": : :"memory"); + + return cqe_count; +} + + +/** + * nes_uarm_cq + * + * @param cq + * @param solicited + * + * @return int + */ +int nes_uarm_cq(struct ibv_cq *cq, int solicited) +{ + struct nes_ucq *nesucq; + struct nes_uvcontext *nesvctx; + uint32_t cq_arm; + +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + nesucq = to_nes_ucq(cq); + nesvctx = to_nes_uctx(cq->context); + +// fprintf(stderr, PFX "%s: Requesting notification for CQ%u.\n", __FUNCTION__, nesucq->cq_id); + cq_arm = nesucq->cq_id; + + if (solicited) + cq_arm |= NES_CQE_ALLOC_NOTIFY_SE; + else + cq_arm |= NES_CQE_ALLOC_NOTIFY_NEXT; + +// fprintf(stderr, PFX "%s: Arming CQ%u, command = 0x%08X.\n", __FUNCTION__, nesucq->cq_id, cq_arm); + nesvctx->nesupd->udoorbell->cqe_alloc = cq_arm; + + return 0; +} + + +/** + * nes_ucreate_srq + * + * @param pd + * @param attr + * + * @return struct ibv_srq* + */ +struct ibv_srq *nes_ucreate_srq(struct ibv_pd *pd, + struct ibv_srq_init_attr *attr) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return (void *) -ENOSYS; +} + + +/** + * nes_umodify_srq + * + * @param srq + * @param attr + * @param attr_mask + * + * @return int + */ +int nes_umodify_srq(struct ibv_srq *srq, + struct ibv_srq_attr *attr, enum ibv_srq_attr_mask attr_mask) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return -ENOSYS; +} + + +/** + * nes_udestroy_srq + * + * @param srq + * + * @return int + */ +int nes_udestroy_srq(struct ibv_srq *srq) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return -ENOSYS; +} + + +/** + * nes_upost_srq_recv + * + * @param 
ibsrq + * @param wr + * @param bad_wr + * + * @return int + */ +int nes_upost_srq_recv(struct ibv_srq *ibsrq, + struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return -ENOSYS; +} + + +/** + * nes_ucreate_qp + * + * @param pd + * @param attr + * + * @return struct ibv_qp* + */ +struct ibv_qp *nes_ucreate_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) +{ + struct nes_uqp *nesuqp; + struct nes_uvcontext *nesvctx = to_nes_uctx(pd->context); + struct nes_ucreate_qp cmd; + struct nes_ucreate_qp_resp resp; + unsigned long mmap_offset; + int ret; + +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + /* Sanity check QP size before proceeding */ + if (attr->cap.max_send_wr > 510 || + attr->cap.max_recv_wr > 510 || + attr->cap.max_send_sge > 4 || + attr->cap.max_recv_sge > 4 ) + return NULL; + + nesuqp = memalign(1024, sizeof(*nesuqp)); + if (!nesuqp) + return NULL; + + memset(nesuqp, 0, sizeof(*nesuqp)); + + ret = ibv_cmd_create_qp(pd, &nesuqp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd, + &resp.ibv_resp, sizeof resp); + if (ret) { + free(nesuqp); + return NULL; + } + + nesuqp->qp_id = resp.qp_id; + nesuqp->sq_db_index = resp.mmap_sq_db_index; + nesuqp->rq_db_index = resp.mmap_rq_db_index; + nesuqp->sq_size = resp.actual_sq_size; + nesuqp->rq_size = resp.actual_rq_size; + /* Account for LSMM, in theory, could get overrun if app preposts to SQ */ + nesuqp->sq_head = 1; + nesuqp->sq_tail = 1; + + /* Map the SQ/RQ buffers */ + mmap_offset = ((nesvctx->max_pds*4096) + page_size-1) & (~(page_size-1)); + mmap_offset += (((sizeof(struct nes_hw_qp_wqe) * nesvctx->wq_size) + page_size-1) & (~(page_size-1)))*nesuqp->sq_db_index; + + nesuqp->sq_vbase = mmap(NULL, (nesuqp->sq_size+nesuqp->rq_size)*sizeof(struct nes_hw_qp_wqe), + PROT_WRITE | PROT_READ, MAP_SHARED, pd->context->cmd_fd, mmap_offset); + + if (((void *)-1) == nesuqp->sq_vbase) { + free(nesuqp); + return NULL; + } + nesuqp->rq_vbase = (struct nes_hw_qp_wqe *)(((char 
*)nesuqp->sq_vbase) + + (nesuqp->sq_size*sizeof(struct nes_hw_qp_wqe))); + *((unsigned int *)nesuqp->sq_vbase) = 0; + + return (&nesuqp->ibv_qp); +} + + +/** + * nes_umodify_qp + * + * @param qp + * @param attr + * @param attr_mask + * + * @return int + */ +int nes_umodify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, + enum ibv_qp_attr_mask attr_mask) +{ + struct ibv_modify_qp cmd; + +// fprintf(stderr, PFX "%s, QP State = %u, attr_mask = 0x%X.\n", __FUNCTION__, +// (unsigned int)attr->qp_state, (unsigned int)attr_mask ); + return (ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof cmd)); +} + + +/** + * nes_udestroy_qp + * + * @param qp + * + * @return int + */ +int nes_udestroy_qp(struct ibv_qp *qp) +{ + struct nes_uqp *nesuqp = to_nes_uqp(qp); + int ret; +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + ret = ibv_cmd_destroy_qp(qp); + if (ret) + return ret; + + free(nesuqp); + + return 0; +} + + +/** + * nes_upost_send + * + * @param ib_qp + * @param ib_wr + * @param bad_wr + * + * @return int + */ +int nes_upost_send(struct ibv_qp *ib_qp, struct ibv_send_wr *ib_wr, + struct ibv_send_wr **bad_wr) +{ + struct nes_uqp *nesuqp = to_nes_uqp(ib_qp); + struct nes_upd *nesupd = to_nes_upd(ib_qp->pd); + struct nes_hw_qp_wqe volatile *wqe; + uint32_t head = nesuqp->sq_head; + uint32_t qsize = nesuqp->sq_size; + uint32_t counter; + uint32_t err = 0; + uint32_t wqe_count = 0; + uint32_t outstanding_wqes; + int sge_index; + uint32_t total_payload_length; + +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + + while (ib_wr) { + /* Check for SQ overflow */ + outstanding_wqes = head + (2 * qsize) - nesuqp->sq_tail; + outstanding_wqes &= qsize - 1; + if (unlikely(outstanding_wqes == (qsize - 1))) { + err = -EINVAL; + break; + } + + wqe = (struct nes_hw_qp_wqe *)&nesuqp->sq_vbase[head]; +// fprintf(stderr, PFX "%s:processing sq wqe at %p, head = %u.\n", __FUNCTION__, wqe, head); + *((volatile uint64_t *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX]) = ib_wr->wr_id; + 
*((volatile uint64_t *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) = (uint64_t)((uintptr_t)nesuqp); + asm __volatile__("": : :"memory"); + wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX] |= head; + + switch (ib_wr->opcode) { + case IBV_WR_SEND: +// fprintf(stderr, PFX "%s:processing sq wqe%u. Opcode = %s\n", __FUNCTION__, head, "Send"); + if (ib_wr->send_flags & IBV_SEND_SOLICITED) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_SENDSE; + } else { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_SEND; + } + + if (ib_wr->send_flags & IBV_SEND_FENCE) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_LOCAL_FENCE; + } + total_payload_length = 0; + for (sge_index=0; sge_index < ib_wr->num_sge; sge_index++) { + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = (uint32_t)ib_wr->sg_list[sge_index].addr; + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = (uint32_t)(ib_wr->sg_list[sge_index].addr>>32); + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX+(sge_index*4)] = ib_wr->sg_list[sge_index].length; + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX+(sge_index*4)] = ib_wr->sg_list[sge_index].lkey; + total_payload_length += ib_wr->sg_list[sge_index].length; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = total_payload_length; + nesuqp->bytes_sent += total_payload_length; + if (nesuqp->bytes_sent > NES_MAX_SQ_PAYLOAD_SIZE) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_READ_FENCE; + nesuqp->bytes_sent = 0; + } + break; + case IBV_WR_RDMA_WRITE: +// fprintf(stderr, PFX "%s:processing sq wqe%u. 
Opcode = %s\n", __FUNCTION__, head, "Write"); + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_RDMAW; + + if (ib_wr->send_flags & IBV_SEND_FENCE) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_LOCAL_FENCE; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_STAG_IDX] = ib_wr->wr.rdma.rkey; + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX] = (uint32_t)ib_wr->wr.rdma.remote_addr; + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX] = (uint32_t)(ib_wr->wr.rdma.remote_addr>>32); + total_payload_length = 0; + for (sge_index=0; sge_index < ib_wr->num_sge; sge_index++) { + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = (uint32_t)ib_wr->sg_list[sge_index].addr; + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = (uint32_t)(ib_wr->sg_list[sge_index].addr>>32); + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX+(sge_index*4)] = ib_wr->sg_list[sge_index].length; + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX+(sge_index*4)] = ib_wr->sg_list[sge_index].lkey; + total_payload_length += ib_wr->sg_list[sge_index].length; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = total_payload_length; + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX] = wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX]; + nesuqp->bytes_sent += total_payload_length; + if (nesuqp->bytes_sent > NES_MAX_SQ_PAYLOAD_SIZE) { + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_READ_FENCE; + nesuqp->bytes_sent = 0; + } + break; + case IBV_WR_RDMA_READ: +// fprintf(stderr, PFX "%s:processing sq wqe%u. 
Opcode = %s\n", __FUNCTION__, head, "Read"); + /* IWarp only supports 1 sge for RDMA reads */ + if (ib_wr->num_sge > 1) { + err = -EINVAL; + break; + } + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_RDMAR; + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX] = (uint32_t)ib_wr->wr.rdma.remote_addr; + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX] = (uint32_t)(ib_wr->wr.rdma.remote_addr>>32); + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_STAG_IDX] = ib_wr->wr.rdma.rkey; + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX] = ib_wr->sg_list->length; + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX] = (uint32_t)ib_wr->sg_list->addr; + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX] = (uint32_t)(ib_wr->sg_list->addr>>32); + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX] = ib_wr->sg_list->lkey; + break; + default: + /* error */ + err = -EINVAL; + break; + } + + if (ib_wr->send_flags & IBV_SEND_SIGNALED) { +// fprintf(stderr, PFX "%s:sq wqe%u is signalled\n", __FUNCTION__, head); + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= NES_IWARP_SQ_WQE_SIGNALED_COMPL; + } + ib_wr = ib_wr->next; + head ++; + wqe_count ++; + if (head >= qsize) + head = 0; + } + + nesuqp->sq_head = head; + asm __volatile__("": : :"memory"); + while (wqe_count) { + counter = (wqe_count<(uint32_t)255) ? 
wqe_count : 255; + wqe_count -= counter; + nesupd->udoorbell->wqe_alloc = (counter<<24) | 0x00800000 | nesuqp->qp_id; + } + + if (err) + *bad_wr = ib_wr; + return err; +} + + +/** + * nes_upost_recv + * + * @param ib_qp + * @param ib_wr + * @param bad_wr + * + * @return int + */ +int nes_upost_recv(struct ibv_qp *ib_qp, struct ibv_recv_wr *ib_wr, + struct ibv_recv_wr **bad_wr) +{ + struct nes_uqp *nesuqp = to_nes_uqp(ib_qp); + struct nes_upd *nesupd = to_nes_upd(ib_qp->pd); + struct nes_hw_qp_wqe *wqe; + uint32_t head = nesuqp->rq_head; + uint32_t qsize = nesuqp->rq_size; + uint32_t counter; + uint32_t err = 0; + uint32_t wqe_count = 0; + uint32_t outstanding_wqes; + int sge_index; + uint32_t total_payload_length; + +// fprintf(stderr, PFX "%s: nesuqp = %p, nesupd = %p.\n", __FUNCTION__, nesuqp, nesupd); +// fprintf(stderr, PFX "%s: rq_base = %p, sq_base = %p.\n", __FUNCTION__, nesuqp->rq_vbase, nesuqp->sq_vbase); + while (ib_wr) { + /* Check for RQ overflow */ + outstanding_wqes = head + (2 * qsize) - nesuqp->rq_tail; + outstanding_wqes &= qsize - 1; + if (unlikely(outstanding_wqes == (qsize - 1))) { + err = -EINVAL; + break; + } + +// fprintf(stderr, PFX "%s: ibwr (%p) sge count = %u, sglist = %p.\n", __FUNCTION__, ib_wr, ib_wr->num_sge, ib_wr->sg_list); + wqe = (struct nes_hw_qp_wqe *)&nesuqp->rq_vbase[head]; +// fprintf(stderr, PFX "%s:processing rq wqe at %p, head = %u.\n", __FUNCTION__, wqe, head); + *((uint64_t volatile *)&wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX]) = ib_wr->wr_id; + *((uint64_t volatile *)&wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX]) = (uint64_t)((uintptr_t)nesuqp); + asm __volatile__("": : :"memory"); + wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX] |= head; + + total_payload_length = 0; + for (sge_index=0; sge_index < ib_wr->num_sge; sge_index++) { + wqe->wqe_words[NES_IWARP_RQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = (uint32_t)ib_wr->sg_list[sge_index].addr; + wqe->wqe_words[NES_IWARP_RQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = 
(uint32_t)(ib_wr->sg_list[sge_index].addr>>32); + wqe->wqe_words[NES_IWARP_RQ_WQE_LENGTH0_IDX+(sge_index*4)] = ib_wr->sg_list[sge_index].length; + wqe->wqe_words[NES_IWARP_RQ_WQE_STAG0_IDX+(sge_index*4)] = ib_wr->sg_list[sge_index].lkey; + total_payload_length += ib_wr->sg_list->length; + } + wqe->wqe_words[NES_IWARP_RQ_WQE_TOTAL_PAYLOAD_IDX] = total_payload_length; + + ib_wr = ib_wr->next; + head ++; + wqe_count ++; + if (head >= qsize) + head = 0; + } + + nesuqp->rq_head = head; + asm __volatile__("": : :"memory"); + while (wqe_count) { + counter = (wqe_count<(uint32_t)255) ? wqe_count : 255; + wqe_count -= counter; + nesupd->udoorbell->wqe_alloc = (counter << 24) | nesuqp->qp_id; + } + + if (err) + *bad_wr = ib_wr; + return err; +} + + +/** + * nes_ucreate_ah + * + * @param pd + * @param attr + * + * @return struct ibv_ah* + */ +struct ibv_ah *nes_ucreate_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return (void *) -ENOSYS; +} + + +/** + * nes_udestroy_ah + * + * @param ah + * + * @return int + */ +int nes_udestroy_ah(struct ibv_ah *ah) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return -ENOSYS; +} + + +/** + * nes_uattach_mcast + * + * @param qp + * @param gid + * @param lid + * + * @return int + */ +int nes_uattach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return -ENOSYS; +} + + +/** + * nes_udetach_mcast + * + * @param qp + * @param gid + * @param lid + * + * @return int + */ +int nes_udetach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid) +{ +// fprintf(stderr, PFX "%s\n", __FUNCTION__); + return -ENOSYS; +} + From mst at mellanox.co.il Fri Oct 27 01:00:02 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Fri, 27 Oct 2006 10:00:02 +0200 Subject: [openib-general] [openfabrics-ewg] new server up and running In-Reply-To: <20061026185532.GE11425@sashak.voltaire.com> References: <20061026185532.GE11425@sashak.voltaire.com> Message-ID: <20061027080002.GA31235@mellanox.co.il> Quoting r. Sasha Khapyorsky : > > I even think we should replace --base-path with --user-path=scm. > > > > As it is, we have e.g. > > /pub/scm/management.git -> /home/sashak/management.git > > which is just confusing. > > It is accessible just as git://staging.openfabrics.org/management , > what is confusing here? It's not clear from the name whose tree it is. ~sashak/management/git would be better - it lets everyone keep his tree under his home directory. -- MST From michael.arndt at informatik.tu-chemnitz.de Fri Oct 27 02:56:01 2006 From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt) Date: Fri, 27 Oct 2006 11:56:01 +0200 Subject: [openib-general] SM Receive Handling Message-ID: <000301c6f9ae$1c0b82e0$21606d86@one7> Hi, I have a question about the way the SM is informed when an SMP is received. If I look at the sources and go from the bottom up, I stop at 'ib_mad_recv_done_handler' (core/mad.c). At this point the SMI processes the packet and determines whether the SMP has to be handled by the SM or the SMA. In this case the function jumps to the label 'local' at line 1860 (see code attachment). I would really appreciate it if someone could explain the steps taken between the labels 'local' and 'out'. The reason is that I can't see where the __osm_sm_mad_ctrl_rcv_callback (which is the function the SM registers to handle received MADs, right?) is informed in any way (Message, JobQueue, Interrupt)? 
Thanks Michael static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv, struct ib_wc *wc) { struct ib_mad_qp_info *qp_info; struct ib_mad_private_header *mad_priv_hdr; struct ib_mad_private *recv, *response; struct ib_mad_list_head *mad_list; struct ib_mad_agent_private *mad_agent; response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL); if (!response) printk(KERN_ERR PFX "ib_mad_recv_done_handler no memory " "for response buffer\n"); mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id; qp_info = mad_list->mad_queue->qp_info; dequeue_mad(mad_list); mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header, mad_list); recv = container_of(mad_priv_hdr, struct ib_mad_private, header); dma_unmap_single(port_priv->device->dma_device, pci_unmap_addr(&recv->header, mapping), sizeof(struct ib_mad_private) - sizeof(struct ib_mad_private_header), DMA_FROM_DEVICE); /* Setup MAD receive work completion from "normal" work completion */ recv->header.wc = *wc; recv->header.recv_wc.wc = &recv->header.wc; recv->header.recv_wc.mad_len = sizeof(struct ib_mad); recv->header.recv_wc.recv_buf.mad = &recv->mad.mad; recv->header.recv_wc.recv_buf.grh = &recv->grh; if (atomic_read(&qp_info->snoop_count)) snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS); /* Validate MAD */ if (!validate_mad(&recv->mad.mad, qp_info->qp->qp_num)) goto out; if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { if (!smi_handle_dr_smp_recv(&recv->mad.smp, port_priv->device->node_type, port_priv->port_num, port_priv->device->phys_port_cnt)) goto out; if (!smi_check_forward_dr_smp(&recv->mad.smp)) goto local; if (!smi_handle_dr_smp_send(&recv->mad.smp, port_priv->device->node_type, port_priv->port_num)) goto out; if (!smi_check_local_smp(&recv->mad.smp, port_priv->device)) goto out; } local: /* Give driver "right of first refusal" on incoming MAD */ if (port_priv->device->process_mad) { int ret; if (!response) { printk(KERN_ERR PFX "No memory 
for response MAD\n"); /* * Is it better to assume that * it wouldn't be processed ? */ goto out; } ret = port_priv->device->process_mad(port_priv->device, 0, port_priv->port_num, wc, &recv->grh, &recv->mad.mad, &response->mad.mad); if (ret & IB_MAD_RESULT_SUCCESS) { if (ret & IB_MAD_RESULT_CONSUMED) goto out; if (ret & IB_MAD_RESULT_REPLY) { agent_send_response(&response->mad.mad, &recv->grh, wc, port_priv->device, port_priv->port_num, qp_info->qp->qp_num); goto out; } } } mad_agent = find_mad_agent(port_priv, &recv->mad.mad); if (mad_agent) { ib_mad_complete_recv(mad_agent, &recv->header.recv_wc); /* * recv is freed up in error cases in ib_mad_complete_recv * or via recv_handler in ib_mad_complete_recv() */ recv = NULL; } out: /* Post another receive request for this QP */ if (response) { ib_mad_post_receive_mads(qp_info, response); if (recv) kmem_cache_free(ib_mad_cache, recv); } else ib_mad_post_receive_mads(qp_info, recv); } From mst at mellanox.co.il Fri Oct 27 05:01:10 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 27 Oct 2006 14:01:10 +0200 Subject: [openib-general] New server svn up In-Reply-To: <20061026211655.GH11425@sashak.voltaire.com> References: <1161826839.26066.99.camel@localhost> <20061026021635.GA14818@sashak.voltaire.com> <1161885128.31851.16.camel@localhost> <20061026211655.GH11425@sashak.voltaire.com> Message-ID: <20061027120110.GB31235@mellanox.co.il> Quoting r. Sasha Khapyorsky : > Subject: Re: [openib-general] New server svn up > > On 13:51 Thu 26 Oct , Roland Dreier wrote: > > > That's up to the developers. I suggest folks try out the new server > > > and move over to using git/svn on it as soon as possible. We can > > > figure out how to clean up or remove the svn user space tree during the > > > summit as SC06. > > > > How does one use git on the new server? > > To put your tree there you need user account. 
> > Then to make 'tree' publically available you can place it under > ~rdreier/scm/ and this will be pullable as > git://staging.openfabrics.org/~rdreier/tree , or under /pub/scm/ , then > this will be available as git://staging.openfabrics.org/tree . > > Sasha Adding stuff under /pub/scm/ seems to require root account on the server. Why do we need 2 places? Let's have everyone keep stuff under ~user/scm. -- MST From tom at opengridcomputing.com Fri Oct 27 07:27:15 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 27 Oct 2006 09:27:15 -0500 Subject: [openib-general] [PATCH 1/5] NetEffect 10Gb RNIC Userspace Library: userspace config generation In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECA@venom2> References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECA@venom2> Message-ID: <1161959235.2748.1.camel@trinity.ogc.int> Glenn: I don't think the userspace stuff belongs on netdev. Someone please correct me if I'm wrong. On Thu, 2006-10-26 at 19:41 -0500, Glenn Grundstrom wrote: > Userspace patch 1 of 5. > > Signed-off-by: Glenn Grundstrom > > ====================================================== > > diff -ruNp old/src/userspace/libnes/aclocal.m4 > new/src/userspace/libnes/aclocal.m4 > --- old/src/userspace/libnes/aclocal.m4 1969-12-31 18:00:00.000000000 > -0600 > +++ new/src/userspace/libnes/aclocal.m4 2006-10-25 11:11:08.000000000 > -0500 > @@ -0,0 +1,7256 @@ > +# generated automatically by aclocal 1.9.6 -*- Autoconf -*- > + > +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, > +# 2005 Free Software Foundation, Inc. > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. 
> + > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY, to the extent permitted by law; without > +# even the implied warranty of MERCHANTABILITY or FITNESS FOR A > +# PARTICULAR PURPOSE. > + > +# libtool.m4 - Configure libtool for the host system. -*-Autoconf-*- > + > +# serial 48 AC_PROG_LIBTOOL > + > + > +# AC_PROVIDE_IFELSE(MACRO-NAME, IF-PROVIDED, IF-NOT-PROVIDED) > +# ----------------------------------------------------------- > +# If this macro is not defined by Autoconf, define it here. > +m4_ifdef([AC_PROVIDE_IFELSE], > + [], > + [m4_define([AC_PROVIDE_IFELSE], > + [m4_ifdef([AC_PROVIDE_$1], > + [$2], [$3])])]) > + > + > +# AC_PROG_LIBTOOL > +# --------------- > +AC_DEFUN([AC_PROG_LIBTOOL], > +[AC_REQUIRE([_AC_PROG_LIBTOOL])dnl > +dnl If AC_PROG_CXX has already been expanded, run AC_LIBTOOL_CXX > +dnl immediately, otherwise, hook it in at the end of AC_PROG_CXX. > + AC_PROVIDE_IFELSE([AC_PROG_CXX], > + [AC_LIBTOOL_CXX], > + [define([AC_PROG_CXX], defn([AC_PROG_CXX])[AC_LIBTOOL_CXX > + ])]) > +dnl And a similar setup for Fortran 77 support > + AC_PROVIDE_IFELSE([AC_PROG_F77], > + [AC_LIBTOOL_F77], > + [define([AC_PROG_F77], defn([AC_PROG_F77])[AC_LIBTOOL_F77 > +])]) > + > +dnl Quote A][M_PROG_GCJ so that aclocal doesn't bring it in needlessly. > +dnl If either AC_PROG_GCJ or A][M_PROG_GCJ have already been expanded, > run > +dnl AC_LIBTOOL_GCJ immediately, otherwise, hook it in at the end of > both. 
> + AC_PROVIDE_IFELSE([AC_PROG_GCJ], > + [AC_LIBTOOL_GCJ], > + [AC_PROVIDE_IFELSE([A][M_PROG_GCJ], > + [AC_LIBTOOL_GCJ], > + [AC_PROVIDE_IFELSE([LT_AC_PROG_GCJ], > + [AC_LIBTOOL_GCJ], > + [ifdef([AC_PROG_GCJ], > + [define([AC_PROG_GCJ], > defn([AC_PROG_GCJ])[AC_LIBTOOL_GCJ])]) > + ifdef([A][M_PROG_GCJ], > + [define([A][M_PROG_GCJ], > defn([A][M_PROG_GCJ])[AC_LIBTOOL_GCJ])]) > + ifdef([LT_AC_PROG_GCJ], > + [define([LT_AC_PROG_GCJ], > + defn([LT_AC_PROG_GCJ])[AC_LIBTOOL_GCJ])])])]) > +])])# AC_PROG_LIBTOOL > + > + > +# _AC_PROG_LIBTOOL > +# ---------------- > +AC_DEFUN([_AC_PROG_LIBTOOL], > +[AC_REQUIRE([AC_LIBTOOL_SETUP])dnl > +AC_BEFORE([$0],[AC_LIBTOOL_CXX])dnl > +AC_BEFORE([$0],[AC_LIBTOOL_F77])dnl > +AC_BEFORE([$0],[AC_LIBTOOL_GCJ])dnl > + > +# This can be used to rebuild libtool when needed > +LIBTOOL_DEPS="$ac_aux_dir/ltmain.sh" > + > +# Always use our own libtool. > +LIBTOOL='$(SHELL) $(top_builddir)/libtool' > +AC_SUBST(LIBTOOL)dnl > + > +# Prevent multiple expansion > +define([AC_PROG_LIBTOOL], []) > +])# _AC_PROG_LIBTOOL > + > + > +# AC_LIBTOOL_SETUP > +# ---------------- > +AC_DEFUN([AC_LIBTOOL_SETUP], > +[AC_PREREQ(2.50)dnl > +AC_REQUIRE([AC_ENABLE_SHARED])dnl > +AC_REQUIRE([AC_ENABLE_STATIC])dnl > +AC_REQUIRE([AC_ENABLE_FAST_INSTALL])dnl > +AC_REQUIRE([AC_CANONICAL_HOST])dnl > +AC_REQUIRE([AC_CANONICAL_BUILD])dnl > +AC_REQUIRE([AC_PROG_CC])dnl > +AC_REQUIRE([AC_PROG_LD])dnl > +AC_REQUIRE([AC_PROG_LD_RELOAD_FLAG])dnl > +AC_REQUIRE([AC_PROG_NM])dnl > + > +AC_REQUIRE([AC_PROG_LN_S])dnl > +AC_REQUIRE([AC_DEPLIBS_CHECK_METHOD])dnl > +# Autoconf 2.13's AC_OBJEXT and AC_EXEEXT macros only works for C > compilers! > +AC_REQUIRE([AC_OBJEXT])dnl > +AC_REQUIRE([AC_EXEEXT])dnl > +dnl > + > +AC_LIBTOOL_SYS_MAX_CMD_LEN > +AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE > +AC_LIBTOOL_OBJDIR > + > +AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl > +_LT_AC_PROG_ECHO_BACKSLASH > + > +case $host_os in > +aix3*) > + # AIX sometimes has problems with the GCC collect2 program. 
For some > + # reason, if we set the COLLECT_NAMES environment variable, the > problems > + # vanish in a puff of smoke. > + if test "X${COLLECT_NAMES+set}" != Xset; then > + COLLECT_NAMES= > + export COLLECT_NAMES > + fi > + ;; > +esac > + > +# Sed substitution that helps us do robust quoting. It backslashifies > +# metacharacters that are still active within double-quoted strings. > +Xsed='sed -e 1s/^X//' > +[sed_quote_subst='s/\([\\"\\`$\\\\]\)/\\\1/g'] > + > +# Same as above, but do not quote variable references. > +[double_quote_subst='s/\([\\"\\`\\\\]\)/\\\1/g'] > + > +# Sed substitution to delay expansion of an escaped shell variable in a > +# double_quote_subst'ed string. > +delay_variable_subst='s/\\\\\\\\\\\$/\\\\\\$/g' > + > +# Sed substitution to avoid accidental globbing in evaled expressions > +no_glob_subst='s/\*/\\\*/g' > + > +# Constants: > +rm="rm -f" > + > +# Global variables: > +default_ofile=libtool > +can_build_shared=yes > + > +# All known linkers require a `.a' archive for static linking (except > MSVC, > +# which needs '.lib'). > +libext=a > +ltmain="$ac_aux_dir/ltmain.sh" > +ofile="$default_ofile" > +with_gnu_ld="$lt_cv_prog_gnu_ld" > + > +AC_CHECK_TOOL(AR, ar, false) > +AC_CHECK_TOOL(RANLIB, ranlib, :) > +AC_CHECK_TOOL(STRIP, strip, :) > + > +old_CC="$CC" > +old_CFLAGS="$CFLAGS" > + > +# Set sane defaults for various variables > +test -z "$AR" && AR=ar > +test -z "$AR_FLAGS" && AR_FLAGS=cru > +test -z "$AS" && AS=as > +test -z "$CC" && CC=cc > +test -z "$LTCC" && LTCC=$CC > +test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS > +test -z "$DLLTOOL" && DLLTOOL=dlltool > +test -z "$LD" && LD=ld > +test -z "$LN_S" && LN_S="ln -s" > +test -z "$MAGIC_CMD" && MAGIC_CMD=file > +test -z "$NM" && NM=nm > +test -z "$SED" && SED=sed > +test -z "$OBJDUMP" && OBJDUMP=objdump > +test -z "$RANLIB" && RANLIB=: > +test -z "$STRIP" && STRIP=: > +test -z "$ac_objext" && ac_objext=o > + > +# Determine commands to create old-style static archives. 
> +old_archive_cmds='$AR $AR_FLAGS $oldlib$oldobjs$old_deplibs' > +old_postinstall_cmds='chmod 644 $oldlib' > +old_postuninstall_cmds= > + > +if test -n "$RANLIB"; then > + case $host_os in > + openbsd*) > + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$oldlib" > + ;; > + *) > + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$oldlib" > + ;; > + esac > + old_archive_cmds="$old_archive_cmds~\$RANLIB \$oldlib" > +fi > + > +_LT_CC_BASENAME([$compiler]) > + > +# Only perform the check for file, if the check method requires it > +case $deplibs_check_method in > +file_magic*) > + if test "$file_magic_cmd" = '$MAGIC_CMD'; then > + AC_PATH_MAGIC > + fi > + ;; > +esac > + > +AC_PROVIDE_IFELSE([AC_LIBTOOL_DLOPEN], enable_dlopen=yes, > enable_dlopen=no) > +AC_PROVIDE_IFELSE([AC_LIBTOOL_WIN32_DLL], > +enable_win32_dll=yes, enable_win32_dll=no) > + > +AC_ARG_ENABLE([libtool-lock], > + [AC_HELP_STRING([--disable-libtool-lock], > + [avoid locking (might break parallel builds)])]) > +test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes > + > +AC_ARG_WITH([pic], > + [AC_HELP_STRING([--with-pic], > + [try to use only PIC/non-PIC objects @<:@default=use > both@:>@])], > + [pic_mode="$withval"], > + [pic_mode=default]) > +test -z "$pic_mode" && pic_mode=default > + > +# Use C for the default configuration in the libtool script > +tagname= > +AC_LIBTOOL_LANG_C_CONFIG > +_LT_AC_TAGCONFIG > +])# AC_LIBTOOL_SETUP > + > + > +# _LT_AC_SYS_COMPILER > +# ------------------- > +AC_DEFUN([_LT_AC_SYS_COMPILER], > +[AC_REQUIRE([AC_PROG_CC])dnl > + > +# If no C compiler was specified, use CC. > +LTCC=${LTCC-"$CC"} > + > +# If no C compiler flags were specified, use CFLAGS. > +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} > + > +# Allow CC to be a program name with arguments. > +compiler=$CC > +])# _LT_AC_SYS_COMPILER > + > + > +# _LT_CC_BASENAME(CC) > +# ------------------- > +# Calculate cc_basename. Skip known compiler wrappers and > cross-prefix. 
> +AC_DEFUN([_LT_CC_BASENAME], > +[for cc_temp in $1""; do > + case $cc_temp in > + compile | *[[\\/]]compile | ccache | *[[\\/]]ccache ) ;; > + distcc | *[[\\/]]distcc | purify | *[[\\/]]purify ) ;; > + \-*) ;; > + *) break;; > + esac > +done > +cc_basename=`$echo "X$cc_temp" | $Xsed -e 's%.*/%%' -e > "s%^$host_alias-%%"` > +]) > + > + > +# _LT_COMPILER_BOILERPLATE > +# ------------------------ > +# Check for compiler boilerplate output or warnings with > +# the simple compiler test code. > +AC_DEFUN([_LT_COMPILER_BOILERPLATE], > +[ac_outfile=conftest.$ac_objext > +printf "$lt_simple_compile_test_code" >conftest.$ac_ext > +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' > >conftest.err > +_lt_compiler_boilerplate=`cat conftest.err` > +$rm conftest* > +])# _LT_COMPILER_BOILERPLATE > + > + > +# _LT_LINKER_BOILERPLATE > +# ---------------------- > +# Check for linker boilerplate output or warnings with > +# the simple link test code. > +AC_DEFUN([_LT_LINKER_BOILERPLATE], > +[ac_outfile=conftest.$ac_objext > +printf "$lt_simple_link_test_code" >conftest.$ac_ext > +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err > +_lt_linker_boilerplate=`cat conftest.err` > +$rm conftest* > +])# _LT_LINKER_BOILERPLATE > + > + > +# _LT_AC_SYS_LIBPATH_AIX > +# ---------------------- > +# Links a minimal program and checks the executable > +# for the system default hardcoded library path. In most cases, > +# this is /usr/lib:/lib, but when the MPI compilers are used > +# the location of the communication and MPI libs are included too. > +# If we don't find anything, use the default library path according > +# to the aix ld manual. > +AC_DEFUN([_LT_AC_SYS_LIBPATH_AIX], > +[AC_LINK_IFELSE(AC_LANG_PROGRAM,[ > +aix_libpath=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e > '/Import File Strings/,/^$/ { /^0/ { s/^0 *\(.*\)$/\1/; p; } > +}'` > +# Check for a 64-bit object if we didn't find anything. 
> +if test -z "$aix_libpath"; then aix_libpath=`dump -HX64 > conftest$ac_exeext 2>/dev/null | $SED -n -e '/Import File Strings/,/^$/ > { /^0/ { s/^0 *\(.*\)$/\1/; p; } > +}'`; fi],[]) > +if test -z "$aix_libpath"; then aix_libpath="/usr/lib:/lib"; fi > +])# _LT_AC_SYS_LIBPATH_AIX > + > + > +# _LT_AC_SHELL_INIT(ARG) > +# ---------------------- > +AC_DEFUN([_LT_AC_SHELL_INIT], > +[ifdef([AC_DIVERSION_NOTICE], > + [AC_DIVERT_PUSH(AC_DIVERSION_NOTICE)], > + [AC_DIVERT_PUSH(NOTICE)]) > +$1 > +AC_DIVERT_POP > +])# _LT_AC_SHELL_INIT > + > + > +# _LT_AC_PROG_ECHO_BACKSLASH > +# -------------------------- > +# Add some code to the start of the generated configure script which > +# will find an echo command which doesn't interpret backslashes. > +AC_DEFUN([_LT_AC_PROG_ECHO_BACKSLASH], > +[_LT_AC_SHELL_INIT([ > +# Check that we are running under the correct shell. > +SHELL=${CONFIG_SHELL-/bin/sh} > + > +case X$ECHO in > +X*--fallback-echo) > + # Remove one level of quotation (which was required for Make). > + ECHO=`echo "$ECHO" | sed 's,\\\\\[$]\\[$]0,'[$]0','` > + ;; > +esac > + > +echo=${ECHO-echo} > +if test "X[$]1" = X--no-reexec; then > + # Discard the --no-reexec flag, and continue. > + shift > +elif test "X[$]1" = X--fallback-echo; then > + # Avoid inline document here, it may be left over > + : > +elif test "X`($echo '\t') 2>/dev/null`" = 'X\t' ; then > + # Yippee, $echo works! > + : > +else > + # Restart under the correct shell. > + exec $SHELL "[$]0" --no-reexec ${1+"[$]@"} > +fi > + > +if test "X[$]1" = X--fallback-echo; then > + # used as fallback echo > + shift > + cat <<EOF > +[$]* > +EOF > + exit 0 > +fi > + > +# The HP-UX ksh and POSIX shell print the target directory to stdout > +# if CDPATH is set.
> +(unset CDPATH) >/dev/null 2>&1 && unset CDPATH > + > +if test -z "$ECHO"; then > +if test "X${echo_test_string+set}" != Xset; then > +# find a string as large as possible, as long as the shell can cope > with it > + for cmd in 'sed 50q "[$]0"' 'sed 20q "[$]0"' 'sed 10q "[$]0"' 'sed 2q > "[$]0"' 'echo test'; do > + # expected sizes: less than 2Kb, 1Kb, 512 bytes, 16 bytes, ... > + if (echo_test_string=`eval $cmd`) 2>/dev/null && > + echo_test_string=`eval $cmd` && > + (test "X$echo_test_string" = "X$echo_test_string") 2>/dev/null > + then > + break > + fi > + done > +fi > + > +if test "X`($echo '\t') 2>/dev/null`" = 'X\t' && > + echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` && > + test "X$echo_testing_string" = "X$echo_test_string"; then > + : > +else > + # The Solaris, AIX, and Digital Unix default echo programs unquote > + # backslashes. This makes it impossible to quote backslashes using > + # echo "$something" | sed 's/\\/\\\\/g' > + # > + # So, first we look for a working echo in the user's PATH. > + > + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR > + for dir in $PATH /usr/ucb; do > + IFS="$lt_save_ifs" > + if (test -f $dir/echo || test -f $dir/echo$ac_exeext) && > + test "X`($dir/echo '\t') 2>/dev/null`" = 'X\t' && > + echo_testing_string=`($dir/echo "$echo_test_string") > 2>/dev/null` && > + test "X$echo_testing_string" = "X$echo_test_string"; then > + echo="$dir/echo" > + break > + fi > + done > + IFS="$lt_save_ifs" > + > + if test "X$echo" = Xecho; then > + # We didn't find a better echo, so look for alternatives. > + if test "X`(print -r '\t') 2>/dev/null`" = 'X\t' && > + echo_testing_string=`(print -r "$echo_test_string") 2>/dev/null` > && > + test "X$echo_testing_string" = "X$echo_test_string"; then > + # This shell has a builtin print -r that does the trick. 
> + echo='print -r' > + elif (test -f /bin/ksh || test -f /bin/ksh$ac_exeext) && > + test "X$CONFIG_SHELL" != X/bin/ksh; then > + # If we have ksh, try running configure again with it. > + ORIGINAL_CONFIG_SHELL=${CONFIG_SHELL-/bin/sh} > + export ORIGINAL_CONFIG_SHELL > + CONFIG_SHELL=/bin/ksh > + export CONFIG_SHELL > + exec $CONFIG_SHELL "[$]0" --no-reexec ${1+"[$]@"} > + else > + # Try using printf. > + echo='printf %s\n' > + if test "X`($echo '\t') 2>/dev/null`" = 'X\t' && > + echo_testing_string=`($echo "$echo_test_string") 2>/dev/null` > && > + test "X$echo_testing_string" = "X$echo_test_string"; then > + # Cool, printf works > + : > + elif echo_testing_string=`($ORIGINAL_CONFIG_SHELL "[$]0" > --fallback-echo '\t') 2>/dev/null` && > + test "X$echo_testing_string" = 'X\t' && > + echo_testing_string=`($ORIGINAL_CONFIG_SHELL "[$]0" > --fallback-echo "$echo_test_string") 2>/dev/null` && > + test "X$echo_testing_string" = "X$echo_test_string"; then > + CONFIG_SHELL=$ORIGINAL_CONFIG_SHELL > + export CONFIG_SHELL > + SHELL="$CONFIG_SHELL" > + export SHELL > + echo="$CONFIG_SHELL [$]0 --fallback-echo" > + elif echo_testing_string=`($CONFIG_SHELL "[$]0" --fallback-echo > '\t') 2>/dev/null` && > + test "X$echo_testing_string" = 'X\t' && > + echo_testing_string=`($CONFIG_SHELL "[$]0" --fallback-echo > "$echo_test_string") 2>/dev/null` && > + test "X$echo_testing_string" = "X$echo_test_string"; then > + echo="$CONFIG_SHELL [$]0 --fallback-echo" > + else > + # maybe with a smaller string... > + prev=: > + > + for cmd in 'echo test' 'sed 2q "[$]0"' 'sed 10q "[$]0"' 'sed 20q > "[$]0"' 'sed 50q "[$]0"'; do > + if (test "X$echo_test_string" = "X`eval $cmd`") 2>/dev/null > + then > + break > + fi > + prev="$cmd" > + done > + > + if test "$prev" != 'sed 50q "[$]0"'; then > + echo_test_string=`eval $prev` > + export echo_test_string > + exec ${ORIGINAL_CONFIG_SHELL-${CONFIG_SHELL-/bin/sh}} "[$]0" > ${1+"[$]@"} > + else > + # Oops. We lost completely, so just stick with echo. 
> + echo=echo > + fi > + fi > + fi > + fi > +fi > +fi > + > +# Copy echo and quote the copy suitably for passing to libtool from > +# the Makefile, instead of quoting the original, which is used later. > +ECHO=$echo > +if test "X$ECHO" = "X$CONFIG_SHELL [$]0 --fallback-echo"; then > + ECHO="$CONFIG_SHELL \\\$\[$]0 --fallback-echo" > +fi > + > +AC_SUBST(ECHO) > +])])# _LT_AC_PROG_ECHO_BACKSLASH > + > + > +# _LT_AC_LOCK > +# ----------- > +AC_DEFUN([_LT_AC_LOCK], > +[AC_ARG_ENABLE([libtool-lock], > + [AC_HELP_STRING([--disable-libtool-lock], > + [avoid locking (might break parallel builds)])]) > +test "x$enable_libtool_lock" != xno && enable_libtool_lock=yes > + > +# Some flags need to be propagated to the compiler or linker for good > +# libtool support. > +case $host in > +ia64-*-hpux*) > + # Find out which ABI we are using. > + echo 'int i;' > conftest.$ac_ext > + if AC_TRY_EVAL(ac_compile); then > + case `/usr/bin/file conftest.$ac_objext` in > + *ELF-32*) > + HPUX_IA64_MODE="32" > + ;; > + *ELF-64*) > + HPUX_IA64_MODE="64" > + ;; > + esac > + fi > + rm -rf conftest* > + ;; > +*-*-irix6*) > + # Find out which ABI we are using. > + echo '[#]line __oline__ "configure"' > conftest.$ac_ext > + if AC_TRY_EVAL(ac_compile); then > + if test "$lt_cv_prog_gnu_ld" = yes; then > + case `/usr/bin/file conftest.$ac_objext` in > + *32-bit*) > + LD="${LD-ld} -melf32bsmip" > + ;; > + *N32*) > + LD="${LD-ld} -melf32bmipn32" > + ;; > + *64-bit*) > + LD="${LD-ld} -melf64bmip" > + ;; > + esac > + else > + case `/usr/bin/file conftest.$ac_objext` in > + *32-bit*) > + LD="${LD-ld} -32" > + ;; > + *N32*) > + LD="${LD-ld} -n32" > + ;; > + *64-bit*) > + LD="${LD-ld} -64" > + ;; > + esac > + fi > + fi > + rm -rf conftest* > + ;; > + > +x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*|s390*-*linux*|sparc*-*linu > x*) > + # Find out which ABI we are using. 
> + echo 'int i;' > conftest.$ac_ext > + if AC_TRY_EVAL(ac_compile); then > + case `/usr/bin/file conftest.o` in > + *32-bit*) > + case $host in > + x86_64-*linux*) > + LD="${LD-ld} -m elf_i386" > + ;; > + ppc64-*linux*|powerpc64-*linux*) > + LD="${LD-ld} -m elf32ppclinux" > + ;; > + s390x-*linux*) > + LD="${LD-ld} -m elf_s390" > + ;; > + sparc64-*linux*) > + LD="${LD-ld} -m elf32_sparc" > + ;; > + esac > + ;; > + *64-bit*) > + case $host in > + x86_64-*linux*) > + LD="${LD-ld} -m elf_x86_64" > + ;; > + ppc*-*linux*|powerpc*-*linux*) > + LD="${LD-ld} -m elf64ppc" > + ;; > + s390*-*linux*) > + LD="${LD-ld} -m elf64_s390" > + ;; > + sparc*-*linux*) > + LD="${LD-ld} -m elf64_sparc" > + ;; > + esac > + ;; > + esac > + fi > + rm -rf conftest* > + ;; > + > +*-*-sco3.2v5*) > + # On SCO OpenServer 5, we need -belf to get full-featured binaries. > + SAVE_CFLAGS="$CFLAGS" > + CFLAGS="$CFLAGS -belf" > + AC_CACHE_CHECK([whether the C compiler needs -belf], > lt_cv_cc_needs_belf, > + [AC_LANG_PUSH(C) > + > AC_TRY_LINK([],[],[lt_cv_cc_needs_belf=yes],[lt_cv_cc_needs_belf=no]) > + AC_LANG_POP]) > + if test x"$lt_cv_cc_needs_belf" != x"yes"; then > + # this is probably gcc 2.8.0, egcs 1.0 or newer; no need for -belf > + CFLAGS="$SAVE_CFLAGS" > + fi > + ;; > +sparc*-*solaris*) > + # Find out which ABI we are using. 
> + echo 'int i;' > conftest.$ac_ext > + if AC_TRY_EVAL(ac_compile); then > + case `/usr/bin/file conftest.o` in > + *64-bit*) > + case $lt_cv_prog_gnu_ld in > + yes*) LD="${LD-ld} -m elf64_sparc" ;; > + *) LD="${LD-ld} -64" ;; > + esac > + ;; > + esac > + fi > + rm -rf conftest* > + ;; > + > +AC_PROVIDE_IFELSE([AC_LIBTOOL_WIN32_DLL], > +[*-*-cygwin* | *-*-mingw* | *-*-pw32*) > + AC_CHECK_TOOL(DLLTOOL, dlltool, false) > + AC_CHECK_TOOL(AS, as, false) > + AC_CHECK_TOOL(OBJDUMP, objdump, false) > + ;; > + ]) > +esac > + > +need_locks="$enable_libtool_lock" > + > +])# _LT_AC_LOCK > + > + > +# AC_LIBTOOL_COMPILER_OPTION(MESSAGE, VARIABLE-NAME, FLAGS, > +# [OUTPUT-FILE], [ACTION-SUCCESS], [ACTION-FAILURE]) > +# ---------------------------------------------------------------- > +# Check whether the given compiler option works > +AC_DEFUN([AC_LIBTOOL_COMPILER_OPTION], > +[AC_REQUIRE([LT_AC_PROG_SED]) > +AC_CACHE_CHECK([$1], [$2], > + [$2=no > + ifelse([$4], , [ac_outfile=conftest.$ac_objext], [ac_outfile=$4]) > + printf "$lt_simple_compile_test_code" > conftest.$ac_ext > + lt_compiler_flag="$3" > + # Insert the option either (1) after the last *FLAGS variable, or > + # (2) before a word containing "conftest.", or (3) at the end. > + # Note that $ac_compile itself does not contain backslashes and > begins > + # with a dollar sign (not a hyphen), so the echo should work > correctly. > + # The option is referenced via a variable to avoid confusing sed. > + lt_compile=`echo "$ac_compile" | $SED \ > + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ > + -e 's: [[^ ]]*conftest\.: $lt_compiler_flag&:; t' \ > + -e 's:$: $lt_compiler_flag:'` > + (eval echo "\"\$as_me:__oline__: $lt_compile\"" >&AS_MESSAGE_LOG_FD) > + (eval "$lt_compile" 2>conftest.err) > + ac_status=$? > + cat conftest.err >&AS_MESSAGE_LOG_FD > + echo "$as_me:__oline__: \$? 
= $ac_status" >&AS_MESSAGE_LOG_FD > + if (exit $ac_status) && test -s "$ac_outfile"; then > + # The compiler can only warn and ignore the option if not > recognized > + # So say no if there are warnings other than the usual output. > + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > >conftest.exp > + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 > + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 > >/dev/null; then > + $2=yes > + fi > + fi > + $rm conftest* > +]) > + > +if test x"[$]$2" = xyes; then > + ifelse([$5], , :, [$5]) > +else > + ifelse([$6], , :, [$6]) > +fi > +])# AC_LIBTOOL_COMPILER_OPTION > + > + > +# AC_LIBTOOL_LINKER_OPTION(MESSAGE, VARIABLE-NAME, FLAGS, > +# [ACTION-SUCCESS], [ACTION-FAILURE]) > +# ------------------------------------------------------------ > +# Check whether the given compiler option works > +AC_DEFUN([AC_LIBTOOL_LINKER_OPTION], > +[AC_CACHE_CHECK([$1], [$2], > + [$2=no > + save_LDFLAGS="$LDFLAGS" > + LDFLAGS="$LDFLAGS $3" > + printf "$lt_simple_link_test_code" > conftest.$ac_ext > + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; > then > + # The linker can only warn and ignore the option if not recognized > + # So say no if there are warnings > + if test -s conftest.err; then > + # Append any errors to the config.log. 
> + cat conftest.err 1>&AS_MESSAGE_LOG_FD > + $echo "X$_lt_linker_boilerplate" | $Xsed -e '/^$/d' > > conftest.exp > + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 > + if diff conftest.exp conftest.er2 >/dev/null; then > + $2=yes > + fi > + else > + $2=yes > + fi > + fi > + $rm conftest* > + LDFLAGS="$save_LDFLAGS" > +]) > + > +if test x"[$]$2" = xyes; then > + ifelse([$4], , :, [$4]) > +else > + ifelse([$5], , :, [$5]) > +fi > +])# AC_LIBTOOL_LINKER_OPTION > + > + > +# AC_LIBTOOL_SYS_MAX_CMD_LEN > +# -------------------------- > +AC_DEFUN([AC_LIBTOOL_SYS_MAX_CMD_LEN], > +[# find the maximum length of command line arguments > +AC_MSG_CHECKING([the maximum length of command line arguments]) > +AC_CACHE_VAL([lt_cv_sys_max_cmd_len], [dnl > + i=0 > + teststring="ABCD" > + > + case $build_os in > + msdosdjgpp*) > + # On DJGPP, this test can blow up pretty badly due to problems in > libc > + # (any single argument exceeding 2000 bytes causes a buffer overrun > + # during glob expansion). Even if it were fixed, the result of > this > + # check would be larger than it should be. > + lt_cv_sys_max_cmd_len=12288; # 12K is about right > + ;; > + > + gnu*) > + # Under GNU Hurd, this test is not required because there is > + # no limit to the length of command line arguments. > + # Libtool will interpret -1 as no limit whatsoever > + lt_cv_sys_max_cmd_len=-1; > + ;; > + > + cygwin* | mingw*) > + # On Win9x/ME, this test blows up -- it succeeds, but takes > + # about 5 minutes as the teststring grows exponentially. > + # Worse, since 9x/ME are not pre-emptively multitasking, > + # you end up with a "frozen" computer, even though with patience > + # the test eventually succeeds (with a max line length of 256k). > + # Instead, let's just punt: use the minimum linelength reported by > + # all of the supported platforms: 8192 (on NT/2K/XP). > + lt_cv_sys_max_cmd_len=8192; > + ;; > + > + amigaos*) > + # On AmigaOS with pdksh, this test takes hours, literally. 
> + # So we just punt and use a minimum line length of 8192. > + lt_cv_sys_max_cmd_len=8192; > + ;; > + > + netbsd* | freebsd* | openbsd* | darwin* | dragonfly*) > + # This has been around since 386BSD, at least. Likely further. > + if test -x /sbin/sysctl; then > + lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax` > + elif test -x /usr/sbin/sysctl; then > + lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax` > + else > + lt_cv_sys_max_cmd_len=65536 # usable default for all BSDs > + fi > + # And add a safety zone > + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` > + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` > + ;; > + > + interix*) > + # We know the value 262144 and hardcode it with a safety zone (like > BSD) > + lt_cv_sys_max_cmd_len=196608 > + ;; > + > + osf*) > + # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running > configure > + # due to this test when exec_disable_arg_limit is 1 on Tru64. It is > not > + # nice to cause kernel panics so lets avoid the loop below. > + # First set a reasonable default. > + lt_cv_sys_max_cmd_len=16384 > + # > + if test -x /sbin/sysconfig; then > + case `/sbin/sysconfig -q proc exec_disable_arg_limit` in > + *1*) lt_cv_sys_max_cmd_len=-1 ;; > + esac > + fi > + ;; > + sco3.2v5*) > + lt_cv_sys_max_cmd_len=102400 > + ;; > + sysv5* | sco5v6* | sysv4.2uw2*) > + kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null` > + if test -n "$kargmax"; then > + lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[[ ]]//'` > + else > + lt_cv_sys_max_cmd_len=32768 > + fi > + ;; > + *) > + # If test is not a shell built-in, we'll probably end up computing > a > + # maximum length that is only half of the actual maximum length, > but > + # we can't tell. 
> + SHELL=${SHELL-${CONFIG_SHELL-/bin/sh}} > + while (test "X"`$SHELL [$]0 --fallback-echo "X$teststring" > 2>/dev/null` \ > + = "XX$teststring") >/dev/null 2>&1 && > + new_result=`expr "X$teststring" : ".*" 2>&1` && > + lt_cv_sys_max_cmd_len=$new_result && > + test $i != 17 # 1/2 MB should be enough > + do > + i=`expr $i + 1` > + teststring=$teststring$teststring > + done > + teststring= > + # Add a significant safety factor because C++ compilers can tack on > massive > + # amounts of additional arguments before passing them to the > linker. > + # It appears as though 1/2 is a usable value. > + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 2` > + ;; > + esac > +]) > +if test -n $lt_cv_sys_max_cmd_len ; then > + AC_MSG_RESULT($lt_cv_sys_max_cmd_len) > +else > + AC_MSG_RESULT(none) > +fi > +])# AC_LIBTOOL_SYS_MAX_CMD_LEN > + > + > +# _LT_AC_CHECK_DLFCN > +# ------------------ > +AC_DEFUN([_LT_AC_CHECK_DLFCN], > +[AC_CHECK_HEADERS(dlfcn.h)dnl > +])# _LT_AC_CHECK_DLFCN > + > + > +# _LT_AC_TRY_DLOPEN_SELF (ACTION-IF-TRUE, ACTION-IF-TRUE-W-USCORE, > +# ACTION-IF-FALSE, ACTION-IF-CROSS-COMPILING) > +# --------------------------------------------------------------------- > +AC_DEFUN([_LT_AC_TRY_DLOPEN_SELF], > +[AC_REQUIRE([_LT_AC_CHECK_DLFCN])dnl > +if test "$cross_compiling" = yes; then : > + [$4] > +else > + lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 > + lt_status=$lt_dlunknown > + cat > conftest.$ac_ext <<EOF > +[#line __oline__ "configure" > +#include "confdefs.h" > + > +#if HAVE_DLFCN_H > +#include <dlfcn.h> > +#endif > + > +#include <stdio.h> > + > +#ifdef RTLD_GLOBAL > +# define LT_DLGLOBAL RTLD_GLOBAL > +#else > +# ifdef DL_GLOBAL > +# define LT_DLGLOBAL DL_GLOBAL > +# else > +# define LT_DLGLOBAL 0 > +# endif > +#endif > + > +/* We may have to define LT_DLLAZY_OR_NOW in the command line if we > + find out it does not work in some platform.
*/ > +#ifndef LT_DLLAZY_OR_NOW > +# ifdef RTLD_LAZY > +# define LT_DLLAZY_OR_NOW RTLD_LAZY > +# else > +# ifdef DL_LAZY > +# define LT_DLLAZY_OR_NOW DL_LAZY > +# else > +# ifdef RTLD_NOW > +# define LT_DLLAZY_OR_NOW RTLD_NOW > +# else > +# ifdef DL_NOW > +# define LT_DLLAZY_OR_NOW DL_NOW > +# else > +# define LT_DLLAZY_OR_NOW 0 > +# endif > +# endif > +# endif > +# endif > +#endif > + > +#ifdef __cplusplus > +extern "C" void exit (int); > +#endif > + > +void fnord() { int i=42;} > +int main () > +{ > + void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); > + int status = $lt_dlunknown; > + > + if (self) > + { > + if (dlsym (self,"fnord")) status = $lt_dlno_uscore; > + else if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; > + /* dlclose (self); */ > + } > + else > + puts (dlerror ()); > + > + exit (status); > +}] > +EOF > + if AC_TRY_EVAL(ac_link) && test -s conftest${ac_exeext} 2>/dev/null; > then > + (./conftest; exit; ) >&AS_MESSAGE_LOG_FD 2>/dev/null > + lt_status=$? > + case x$lt_status in > + x$lt_dlno_uscore) $1 ;; > + x$lt_dlneed_uscore) $2 ;; > + x$lt_dlunknown|x*) $3 ;; > + esac > + else : > + # compilation failed > + $3 > + fi > +fi > +rm -fr conftest* > +])# _LT_AC_TRY_DLOPEN_SELF > + > + > +# AC_LIBTOOL_DLOPEN_SELF > +# ---------------------- > +AC_DEFUN([AC_LIBTOOL_DLOPEN_SELF], > +[AC_REQUIRE([_LT_AC_CHECK_DLFCN])dnl > +if test "x$enable_dlopen" != xyes; then > + enable_dlopen=unknown > + enable_dlopen_self=unknown > + enable_dlopen_self_static=unknown > +else > + lt_cv_dlopen=no > + lt_cv_dlopen_libs= > + > + case $host_os in > + beos*) > + lt_cv_dlopen="load_add_on" > + lt_cv_dlopen_libs= > + lt_cv_dlopen_self=yes > + ;; > + > + mingw* | pw32*) > + lt_cv_dlopen="LoadLibrary" > + lt_cv_dlopen_libs= > + ;; > + > + cygwin*) > + lt_cv_dlopen="dlopen" > + lt_cv_dlopen_libs= > + ;; > + > + darwin*) > + # if libdl is installed we need to link against it > + AC_CHECK_LIB([dl], [dlopen], > + [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl"],[ > + 
lt_cv_dlopen="dyld" > + lt_cv_dlopen_libs= > + lt_cv_dlopen_self=yes > + ]) > + ;; > + > + *) > + AC_CHECK_FUNC([shl_load], > + [lt_cv_dlopen="shl_load"], > + [AC_CHECK_LIB([dld], [shl_load], > + [lt_cv_dlopen="shl_load" lt_cv_dlopen_libs="-dld"], > + [AC_CHECK_FUNC([dlopen], > + [lt_cv_dlopen="dlopen"], > + [AC_CHECK_LIB([dl], [dlopen], > + [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-ldl"], > + [AC_CHECK_LIB([svld], [dlopen], > + [lt_cv_dlopen="dlopen" lt_cv_dlopen_libs="-lsvld"], > + [AC_CHECK_LIB([dld], [dld_link], > + [lt_cv_dlopen="dld_link" lt_cv_dlopen_libs="-dld"]) > + ]) > + ]) > + ]) > + ]) > + ]) > + ;; > + esac > + > + if test "x$lt_cv_dlopen" != xno; then > + enable_dlopen=yes > + else > + enable_dlopen=no > + fi > + > + case $lt_cv_dlopen in > + dlopen) > + save_CPPFLAGS="$CPPFLAGS" > + test "x$ac_cv_header_dlfcn_h" = xyes && CPPFLAGS="$CPPFLAGS > -DHAVE_DLFCN_H" > + > + save_LDFLAGS="$LDFLAGS" > + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS > $export_dynamic_flag_spec\" > + > + save_LIBS="$LIBS" > + LIBS="$lt_cv_dlopen_libs $LIBS" > + > + AC_CACHE_CHECK([whether a program can dlopen itself], > + lt_cv_dlopen_self, [dnl > + _LT_AC_TRY_DLOPEN_SELF( > + lt_cv_dlopen_self=yes, lt_cv_dlopen_self=yes, > + lt_cv_dlopen_self=no, lt_cv_dlopen_self=cross) > + ]) > + > + if test "x$lt_cv_dlopen_self" = xyes; then > + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS > $lt_prog_compiler_static\" > + AC_CACHE_CHECK([whether a statically linked program can dlopen > itself], > + lt_cv_dlopen_self_static, [dnl > + _LT_AC_TRY_DLOPEN_SELF( > + lt_cv_dlopen_self_static=yes, lt_cv_dlopen_self_static=yes, > + lt_cv_dlopen_self_static=no, > lt_cv_dlopen_self_static=cross) > + ]) > + fi > + > + CPPFLAGS="$save_CPPFLAGS" > + LDFLAGS="$save_LDFLAGS" > + LIBS="$save_LIBS" > + ;; > + esac > + > + case $lt_cv_dlopen_self in > + yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; > + *) enable_dlopen_self=unknown ;; > + esac > + > + case $lt_cv_dlopen_self_static in > + yes|no) 
enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; > + *) enable_dlopen_self_static=unknown ;; > + esac > +fi > +])# AC_LIBTOOL_DLOPEN_SELF > + > + > +# AC_LIBTOOL_PROG_CC_C_O([TAGNAME]) > +# --------------------------------- > +# Check to see if options -c and -o are simultaneously supported by > compiler > +AC_DEFUN([AC_LIBTOOL_PROG_CC_C_O], > +[AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl > +AC_CACHE_CHECK([if $compiler supports -c -o file.$ac_objext], > + [_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)], > + [_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=no > + $rm -r conftest 2>/dev/null > + mkdir conftest > + cd conftest > + mkdir out > + printf "$lt_simple_compile_test_code" > conftest.$ac_ext > + > + lt_compiler_flag="-o out/conftest2.$ac_objext" > + # Insert the option either (1) after the last *FLAGS variable, or > + # (2) before a word containing "conftest.", or (3) at the end. > + # Note that $ac_compile itself does not contain backslashes and > begins > + # with a dollar sign (not a hyphen), so the echo should work > correctly. > + lt_compile=`echo "$ac_compile" | $SED \ > + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ > + -e 's: [[^ ]]*conftest\.: $lt_compiler_flag&:; t' \ > + -e 's:$: $lt_compiler_flag:'` > + (eval echo "\"\$as_me:__oline__: $lt_compile\"" >&AS_MESSAGE_LOG_FD) > + (eval "$lt_compile" 2>out/conftest.err) > + ac_status=$? > + cat out/conftest.err >&AS_MESSAGE_LOG_FD > + echo "$as_me:__oline__: \$? = $ac_status" >&AS_MESSAGE_LOG_FD > + if (exit $ac_status) && test -s out/conftest2.$ac_objext > + then > + # The compiler can only warn and ignore the option if not > recognized > + # So say no if there are warnings > + $echo "X$_lt_compiler_boilerplate" | $Xsed -e '/^$/d' > > out/conftest.exp > + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 > + if test ! -s out/conftest.er2 || diff out/conftest.exp > out/conftest.er2 >/dev/null; then > + _LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=yes > + fi > + fi > + chmod u+w . 
2>&AS_MESSAGE_LOG_FD > + $rm conftest* > + # SGI C++ compiler will create directory out/ii_files/ for > + # template instantiation > + test -d out/ii_files && $rm out/ii_files/* && rmdir out/ii_files > + $rm out/* && rmdir out > + cd .. > + rmdir conftest > + $rm conftest* > +]) > +])# AC_LIBTOOL_PROG_CC_C_O > + > + > +# AC_LIBTOOL_SYS_HARD_LINK_LOCKS([TAGNAME]) > +# ----------------------------------------- > +# Check to see if we can do hard links to lock some files if needed > +AC_DEFUN([AC_LIBTOOL_SYS_HARD_LINK_LOCKS], > +[AC_REQUIRE([_LT_AC_LOCK])dnl > + > +hard_links="nottested" > +if test "$_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)" = no && test > "$need_locks" != no; then > + # do not overwrite the value of need_locks provided by the user > + AC_MSG_CHECKING([if we can lock with hard links]) > + hard_links=yes > + $rm conftest* > + ln conftest.a conftest.b 2>/dev/null && hard_links=no > + touch conftest.a > + ln conftest.a conftest.b 2>&5 || hard_links=no > + ln conftest.a conftest.b 2>/dev/null && hard_links=no > + AC_MSG_RESULT([$hard_links]) > + if test "$hard_links" = no; then > + AC_MSG_WARN([`$CC' does not support `-c -o', so `make -j' may be > unsafe]) > + need_locks=warn > + fi > +else > + need_locks=no > +fi > +])# AC_LIBTOOL_SYS_HARD_LINK_LOCKS > + > + > +# AC_LIBTOOL_OBJDIR > +# ----------------- > +AC_DEFUN([AC_LIBTOOL_OBJDIR], > +[AC_CACHE_CHECK([for objdir], [lt_cv_objdir], > +[rm -f .libs 2>/dev/null > +mkdir .libs 2>/dev/null > +if test -d .libs; then > + lt_cv_objdir=.libs > +else > + # MS-DOS does not allow filenames that begin with a dot. > + lt_cv_objdir=_libs > +fi > +rmdir .libs 2>/dev/null]) > +objdir=$lt_cv_objdir > +])# AC_LIBTOOL_OBJDIR > + > + > +# AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH([TAGNAME]) > +# ---------------------------------------------- > +# Check hardcoding attributes. 
> +AC_DEFUN([AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH], > +[AC_MSG_CHECKING([how to hardcode library paths into programs]) > +_LT_AC_TAGVAR(hardcode_action, $1)= > +if test -n "$_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)" || \ > + test -n "$_LT_AC_TAGVAR(runpath_var, $1)" || \ > + test "X$_LT_AC_TAGVAR(hardcode_automatic, $1)" = "Xyes" ; then > + > + # We can hardcode non-existant directories. > + if test "$_LT_AC_TAGVAR(hardcode_direct, $1)" != no && > + # If the only mechanism to avoid hardcoding is shlibpath_var, we > + # have to relink, otherwise we might link with an installed > library > + # when we should be linking with a yet-to-be-installed one > + ## test "$_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)" != no && > + test "$_LT_AC_TAGVAR(hardcode_minus_L, $1)" != no; then > + # Linking always hardcodes the temporary library directory. > + _LT_AC_TAGVAR(hardcode_action, $1)=relink > + else > + # We can link without hardcoding, and we can hardcode nonexisting > dirs. > + _LT_AC_TAGVAR(hardcode_action, $1)=immediate > + fi > +else > + # We cannot hardcode anything, or else we can only hardcode existing > + # directories. 
> + _LT_AC_TAGVAR(hardcode_action, $1)=unsupported > +fi > +AC_MSG_RESULT([$_LT_AC_TAGVAR(hardcode_action, $1)]) > + > +if test "$_LT_AC_TAGVAR(hardcode_action, $1)" = relink; then > + # Fast installation is not supported > + enable_fast_install=no > +elif test "$shlibpath_overrides_runpath" = yes || > + test "$enable_shared" = no; then > + # Fast installation is not necessary > + enable_fast_install=needless > +fi > +])# AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH > + > + > +# AC_LIBTOOL_SYS_LIB_STRIP > +# ------------------------ > +AC_DEFUN([AC_LIBTOOL_SYS_LIB_STRIP], > +[striplib= > +old_striplib= > +AC_MSG_CHECKING([whether stripping libraries is possible]) > +if test -n "$STRIP" && $STRIP -V 2>&1 | grep "GNU strip" >/dev/null; > then > + test -z "$old_striplib" && old_striplib="$STRIP --strip-debug" > + test -z "$striplib" && striplib="$STRIP --strip-unneeded" > + AC_MSG_RESULT([yes]) > +else > +# FIXME - insert some real tests, host_os isn't really good enough > + case $host_os in > + darwin*) > + if test -n "$STRIP" ; then > + striplib="$STRIP -x" > + AC_MSG_RESULT([yes]) > + else > + AC_MSG_RESULT([no]) > +fi > + ;; > + *) > + AC_MSG_RESULT([no]) > + ;; > + esac > +fi > +])# AC_LIBTOOL_SYS_LIB_STRIP > + > + > +# AC_LIBTOOL_SYS_DYNAMIC_LINKER > +# ----------------------------- > +# PORTME Fill in your ld.so characteristics > +AC_DEFUN([AC_LIBTOOL_SYS_DYNAMIC_LINKER], > +[AC_MSG_CHECKING([dynamic linker characteristics]) > +library_names_spec= > +libname_spec='lib$name' > +soname_spec= > +shrext_cmds=".so" > +postinstall_cmds= > +postuninstall_cmds= > +finish_cmds= > +finish_eval= > +shlibpath_var= > +shlibpath_overrides_runpath=unknown > +version_type=none > +dynamic_linker="$host_os ld.so" > +sys_lib_dlsearch_path_spec="/lib /usr/lib" > +if test "$GCC" = yes; then > + sys_lib_search_path_spec=`$CC -print-search-dirs | grep "^libraries:" > | $SED -e "s/^libraries://" -e "s,=/,/,g"` > + if echo "$sys_lib_search_path_spec" | grep ';' >/dev/null ; then > + # if the 
path contains ";" then we assume it to be the separator > + # otherwise default to the standard path separator (i.e. ":") - it > is > + # assumed that no part of a normal pathname contains ";" but that > should > + # okay in the real world where ";" in dirpaths is itself > problematic. > + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED > -e 's/;/ /g'` > + else > + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | $SED > -e "s/$PATH_SEPARATOR/ /g"` > + fi > +else > + sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" > +fi > +need_lib_prefix=unknown > +hardcode_into_libs=no > + > +# when you set need_version to no, make sure it does not cause > -set_version > +# flags to be left without arguments > +need_version=unknown > + > +case $host_os in > +aix3*) > + version_type=linux > + library_names_spec='${libname}${release}${shared_ext}$versuffix > $libname.a' > + shlibpath_var=LIBPATH > + > + # AIX 3 has no versioning support, so we append a major version to > the name. > + soname_spec='${libname}${release}${shared_ext}$major' > + ;; > + > +aix4* | aix5*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + hardcode_into_libs=yes > + if test "$host_cpu" = ia64; then > + # AIX 5 supports IA64 > + library_names_spec='${libname}${release}${shared_ext}$major > ${libname}${release}${shared_ext}$versuffix $libname${shared_ext}' > + shlibpath_var=LD_LIBRARY_PATH > + else > + # With GCC up to 2.95.x, collect2 would create an import file > + # for dependence libraries. The import file would start with > + # the line `#! .'. This would cause the generated library to > + # depend on `.', always an invalid library. This was fixed in > + # development snapshots of GCC prior to 3.0. 
> + case $host_os in > + aix4 | aix4.[[01]] | aix4.[[01]].*) > + if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ > >= 97)' > + echo ' yes ' > + echo '#endif'; } | ${CC} -E - | grep yes > /dev/null; then > + : > + else > + can_build_shared=no > + fi > + ;; > + esac > + # AIX (on Power*) has no versioning support, so currently we can > not hardcode correct > + # soname into executable. Probably we can add versioning support to > + # collect2, so additional links can be useful in future. > + if test "$aix_use_runtimelinking" = yes; then > + # If using run time linking (on AIX 4.2 or later) use > lib.so > + # instead of lib.a to let people know that these are not > + # typical AIX shared libraries. > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + else > + # We preserve .a as extension for shared libraries through AIX4.2 > + # and later when we are not doing run time linking. > + library_names_spec='${libname}${release}.a $libname.a' > + soname_spec='${libname}${release}${shared_ext}$major' > + fi > + shlibpath_var=LIBPATH > + fi > + ;; > + > +amigaos*) > + library_names_spec='$libname.ixlibrary $libname.a' > + # Create ${libname}_ixlibrary.a entries in /sys/libs. 
> + finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do > libname=`$echo "X$lib" | $Xsed -e > '\''s%^.*/\([[^/]]*\)\.ixlibrary$%\1%'\''`; test $rm > /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib > ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib > ${libname}_ixlibrary.a || exit 1; done' > + ;; > + > +beos*) > + library_names_spec='${libname}${shared_ext}' > + dynamic_linker="$host_os ld.so" > + shlibpath_var=LIBRARY_PATH > + ;; > + > +bsdi[[45]]*) > + version_type=linux > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' > + shlibpath_var=LD_LIBRARY_PATH > + sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib > /usr/contrib/lib /lib /usr/local/lib" > + sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" > + # the default ld.so.conf also contains /usr/contrib/lib and > + # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow > + # libtool to hard-code these into programs > + ;; > + > +cygwin* | mingw* | pw32*) > + version_type=windows > + shrext_cmds=".dll" > + need_version=no > + need_lib_prefix=no > + > + case $GCC,$host_os in > + yes,cygwin* | yes,mingw* | yes,pw32*) > + library_names_spec='$libname.dll.a' > + # DLL is installed to $(libdir)/../bin by postinstall_cmds > + postinstall_cmds='base_file=`basename \${file}`~ > + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i;echo > \$dlname'\''`~ > + dldir=$destdir/`dirname \$dlpath`~ > + test -d \$dldir || mkdir -p \$dldir~ > + $install_prog $dir/$dlname \$dldir/$dlname~ > + chmod a+x \$dldir/$dlname' > + postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo > \$dlname'\''`~ > + dlpath=$dir/\$dldll~ > + $rm \$dlpath' > + shlibpath_overrides_runpath=yes > + > + case $host_os in > + cygwin*) > + # Cygwin DLLs use 'cyg' prefix rather than 'lib' > + soname_spec='`echo ${libname} | sed -e 's/^lib/cyg/'``echo > ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}' > + sys_lib_search_path_spec="/usr/lib /lib/w32api /lib > /usr/local/lib" > + ;; > + mingw*) > + # MinGW DLLs use traditional 'lib' prefix > + soname_spec='${libname}`echo ${release} | $SED -e > 's/[[.]]/-/g'`${versuffix}${shared_ext}' > + sys_lib_search_path_spec=`$CC -print-search-dirs | grep > "^libraries:" | $SED -e "s/^libraries://" -e "s,=/,/,g"` > + if echo "$sys_lib_search_path_spec" | [grep ';[c-zC-Z]:/' > >/dev/null]; then > + # It is most probably a Windows format PATH printed by > + # mingw gcc, but we are running on Cygwin. Gcc prints its > search > + # path with ; separators, and with drive letters. We can handle > the > + # drive letters (cygwin fileutils understands them), so leave > them, > + # especially as we might pass files found there to a mingw > objdump, > + # which wouldn't understand a cygwinified path. Ahh. > + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | > $SED -e 's/;/ /g'` > + else > + sys_lib_search_path_spec=`echo "$sys_lib_search_path_spec" | > $SED -e "s/$PATH_SEPARATOR/ /g"` > + fi > + ;; > + pw32*) > + # pw32 DLLs use 'pw' prefix rather than 'lib' > + library_names_spec='`echo ${libname} | sed -e 's/^lib/pw/'``echo > ${release} | $SED -e 's/[[.]]/-/g'`${versuffix}${shared_ext}' > + ;; > + esac > + ;; > + > + *) > + library_names_spec='${libname}`echo ${release} | $SED -e > 's/[[.]]/-/g'`${versuffix}${shared_ext} $libname.lib' > + ;; > + esac > + dynamic_linker='Win32 ld.exe' > + # FIXME: first we should search . 
and the directory the executable is > in > + shlibpath_var=PATH > + ;; > + > +darwin* | rhapsody*) > + dynamic_linker="$host_os dyld" > + version_type=darwin > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${versuffix}$shared_ext > ${libname}${release}${major}$shared_ext ${libname}$shared_ext' > + soname_spec='${libname}${release}${major}$shared_ext' > + shlibpath_overrides_runpath=yes > + shlibpath_var=DYLD_LIBRARY_PATH > + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' > + # Apple's gcc prints 'gcc -print-search-dirs' doesn't operate the > same. > + if test "$GCC" = yes; then > + sys_lib_search_path_spec=`$CC -print-search-dirs | tr "\n" > "$PATH_SEPARATOR" | sed -e 's/libraries:/@libraries:/' | tr "@" "\n" | > grep "^libraries:" | sed -e "s/^libraries://" -e "s,=/,/,g" -e > "s,$PATH_SEPARATOR, ,g" -e "s,.*,& /lib /usr/lib /usr/local/lib,g"` > + else > + sys_lib_search_path_spec='/lib /usr/lib /usr/local/lib' > + fi > + sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' > + ;; > + > +dgux*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname$shared_ext' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + ;; > + > +freebsd1*) > + dynamic_linker=no > + ;; > + > +kfreebsd*-gnu) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=no > + hardcode_into_libs=yes > + dynamic_linker='GNU ld.so' > + ;; > + > +freebsd* | dragonfly*) > + # DragonFly does not have aout. When/if they implement a new > + # versioning mechanism, adjust this. 
> + if test -x /usr/bin/objformat; then > + objformat=`/usr/bin/objformat` > + else > + case $host_os in > + freebsd[[123]]*) objformat=aout ;; > + *) objformat=elf ;; > + esac > + fi > + version_type=freebsd-$objformat > + case $version_type in > + freebsd-elf*) > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext} $libname${shared_ext}' > + need_version=no > + need_lib_prefix=no > + ;; > + freebsd-*) > + library_names_spec='${libname}${release}${shared_ext}$versuffix > $libname${shared_ext}$versuffix' > + need_version=yes > + ;; > + esac > + shlibpath_var=LD_LIBRARY_PATH > + case $host_os in > + freebsd2*) > + shlibpath_overrides_runpath=yes > + ;; > + freebsd3.[[01]]* | freebsdelf3.[[01]]*) > + shlibpath_overrides_runpath=yes > + hardcode_into_libs=yes > + ;; > + freebsd3.[[2-9]]* | freebsdelf3.[[2-9]]* | \ > + freebsd4.[[0-5]] | freebsdelf4.[[0-5]] | freebsd4.1.1 | > freebsdelf4.1.1) > + shlibpath_overrides_runpath=no > + hardcode_into_libs=yes > + ;; > + freebsd*) # from 4.6 on > + shlibpath_overrides_runpath=yes > + hardcode_into_libs=yes > + ;; > + esac > + ;; > + > +gnu*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + hardcode_into_libs=yes > + ;; > + > +hpux9* | hpux10* | hpux11*) > + # Give a soname corresponding to the major version so that dld.sl > refuses to > + # link against other versions. > + version_type=sunos > + need_lib_prefix=no > + need_version=no > + case $host_cpu in > + ia64*) > + shrext_cmds='.so' > + hardcode_into_libs=yes > + dynamic_linker="$host_os dld.so" > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
> + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + if test "X$HPUX_IA64_MODE" = X32; then > + sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 > /usr/local/lib" > + else > + sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" > + fi > + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec > + ;; > + hppa*64*) > + shrext_cmds='.sl' > + hardcode_into_libs=yes > + dynamic_linker="$host_os dld.sl" > + shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH > + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" > + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec > + ;; > + *) > + shrext_cmds='.sl' > + dynamic_linker="$host_os dld.sl" > + shlibpath_var=SHLIB_PATH > + shlibpath_overrides_runpath=no # +s is required to enable > SHLIB_PATH > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + ;; > + esac > + # HP-UX runs *really* slowly unless shared libraries are mode 555. 
> + postinstall_cmds='chmod 555 $lib' > + ;; > + > +interix3*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=no > + hardcode_into_libs=yes > + ;; > + > +irix5* | irix6* | nonstopux*) > + case $host_os in > + nonstopux*) version_type=nonstopux ;; > + *) > + if test "$lt_cv_prog_gnu_ld" = yes; then > + version_type=linux > + else > + version_type=irix > + fi ;; > + esac > + need_lib_prefix=no > + need_version=no > + soname_spec='${libname}${release}${shared_ext}$major' > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major > ${libname}${release}${shared_ext} $libname${shared_ext}' > + case $host_os in > + irix5* | nonstopux*) > + libsuff= shlibsuff= > + ;; > + *) > + case $LD in # libtool.m4 will add one of these switches to LD > + *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") > + libsuff= shlibsuff= libmagic=32-bit;; > + *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") > + libsuff=32 shlibsuff=N32 libmagic=N32;; > + *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") > + libsuff=64 shlibsuff=64 libmagic=64-bit;; > + *) libsuff= shlibsuff= libmagic=never-match;; > + esac > + ;; > + esac > + shlibpath_var=LD_LIBRARY${shlibsuff}_PATH > + shlibpath_overrides_runpath=no > + sys_lib_search_path_spec="/usr/lib${libsuff} /lib${libsuff} > /usr/local/lib${libsuff}" > + sys_lib_dlsearch_path_spec="/usr/lib${libsuff} /lib${libsuff}" > + hardcode_into_libs=yes > + ;; > + > +# No shared lib support for Linux oldld, aout, or coff. > +linux*oldld* | linux*aout* | linux*coff*) > + dynamic_linker=no > + ;; > + > +# This must be Linux ELF. 
> +linux*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=no > + # This implies no fast_install, which is unacceptable. > + # Some rework will be needed to allow for fast_install > + # before this can be enabled. > + hardcode_into_libs=yes > + > + # find out which ABI we are using > + libsuff= > + case "$host_cpu" in > + x86_64*|s390x*|powerpc64*) > + echo '[#]line __oline__ "configure"' > conftest.$ac_ext > + if AC_TRY_EVAL(ac_compile); then > + case `/usr/bin/file conftest.$ac_objext` in > + *64-bit*) > + libsuff=64 > + sys_lib_search_path_spec="/lib${libsuff} /usr/lib${libsuff} > /usr/local/lib${libsuff}" > + ;; > + esac > + fi > + rm -rf conftest* > + ;; > + esac > + > + # Append ld.so.conf contents to the search path > + if test -f /etc/ld.so.conf; then > + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s", > \[$]2)); skip = 1; } { if (!skip) print \[$]0; skip = 0; }' < > /etc/ld.so.conf | $SED -e 's/#.*//;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= > ]* / /g;/^$/d' | tr '\n' ' '` > + sys_lib_dlsearch_path_spec="/lib${libsuff} /usr/lib${libsuff} > $lt_ld_extra" > + fi > + > + # We used to test for /lib/ld.so.1 and disable shared libraries on > + # powerpc, because MkLinux only supported shared libraries with the > + # GNU dynamic linker. Since this was broken with cross compilers, > + # most powerpc-linux boxes support dynamic linking these days and > + # people can always --disable-shared, the test was removed, and we > + # assume the GNU/Linux dynamic linker is in use. 
> + dynamic_linker='GNU/Linux ld.so' > + ;; > + > +knetbsd*-gnu) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=no > + hardcode_into_libs=yes > + dynamic_linker='GNU ld.so' > + ;; > + > +netbsd*) > + version_type=sunos > + need_lib_prefix=no > + need_version=no > + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${shared_ext}$versuffix' > + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' > + dynamic_linker='NetBSD (a.out) ld.so' > + else > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major ${libname}${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + dynamic_linker='NetBSD ld.elf_so' > + fi > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=yes > + hardcode_into_libs=yes > + ;; > + > +newsos6) > + version_type=linux > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=yes > + ;; > + > +nto-qnx*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=yes > + ;; > + > +openbsd*) > + version_type=sunos > + sys_lib_dlsearch_path_spec="/usr/lib" > + need_lib_prefix=no > + # Some older versions of OpenBSD (3.3 at least) *do* need versioned > libs. 
> + case $host_os in > + openbsd3.3 | openbsd3.3.*) need_version=yes ;; > + *) need_version=no ;; > + esac > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${shared_ext}$versuffix' > + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' > + shlibpath_var=LD_LIBRARY_PATH > + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test > "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then > + case $host_os in > + openbsd2.[[89]] | openbsd2.[[89]].*) > + shlibpath_overrides_runpath=no > + ;; > + *) > + shlibpath_overrides_runpath=yes > + ;; > + esac > + else > + shlibpath_overrides_runpath=yes > + fi > + ;; > + > +os2*) > + libname_spec='$name' > + shrext_cmds=".dll" > + need_lib_prefix=no > + library_names_spec='$libname${shared_ext} $libname.a' > + dynamic_linker='OS/2 ld.exe' > + shlibpath_var=LIBPATH > + ;; > + > +osf3* | osf4* | osf5*) > + version_type=osf > + need_lib_prefix=no > + need_version=no > + soname_spec='${libname}${release}${shared_ext}$major' > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + shlibpath_var=LD_LIBRARY_PATH > + sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc > /usr/lib /usr/local/lib /var/shlib" > + sys_lib_dlsearch_path_spec="$sys_lib_search_path_spec" > + ;; > + > +solaris*) > + version_type=linux > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=yes > + hardcode_into_libs=yes > + # ldd complains unless libraries are executable > + postinstall_cmds='chmod +x $lib' > + ;; > + > +sunos4*) > + version_type=sunos > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${shared_ext}$versuffix' > + finish_cmds='PATH="\$PATH:/usr/etc" 
ldconfig $libdir' > + shlibpath_var=LD_LIBRARY_PATH > + shlibpath_overrides_runpath=yes > + if test "$with_gnu_ld" = yes; then > + need_lib_prefix=no > + fi > + need_version=yes > + ;; > + > +sysv4 | sysv4.3*) > + version_type=linux > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + case $host_vendor in > + sni) > + shlibpath_overrides_runpath=no > + need_lib_prefix=no > + export_dynamic_flag_spec='${wl}-Blargedynsym' > + runpath_var=LD_RUN_PATH > + ;; > + siemens) > + need_lib_prefix=no > + ;; > + motorola) > + need_lib_prefix=no > + need_version=no > + shlibpath_overrides_runpath=no > + sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' > + ;; > + esac > + ;; > + > +sysv4*MP*) > + if test -d /usr/nec ;then > + version_type=linux > + library_names_spec='$libname${shared_ext}.$versuffix > $libname${shared_ext}.$major $libname${shared_ext}' > + soname_spec='$libname${shared_ext}.$major' > + shlibpath_var=LD_LIBRARY_PATH > + fi > + ;; > + > +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) > + version_type=freebsd-elf > + need_lib_prefix=no > + need_version=no > + library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext} $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + hardcode_into_libs=yes > + if test "$with_gnu_ld" = yes; then > + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib > /usr/lib /lib' > + shlibpath_overrides_runpath=no > + else > + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' > + shlibpath_overrides_runpath=yes > + case $host_os in > + sco3.2v5*) > + sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" > + ;; > + esac > + fi > + sys_lib_dlsearch_path_spec='/usr/lib' > + ;; > + > +uts4*) > + version_type=linux > + 
library_names_spec='${libname}${release}${shared_ext}$versuffix > ${libname}${release}${shared_ext}$major $libname${shared_ext}' > + soname_spec='${libname}${release}${shared_ext}$major' > + shlibpath_var=LD_LIBRARY_PATH > + ;; > + > +*) > + dynamic_linker=no > + ;; > +esac > +AC_MSG_RESULT([$dynamic_linker]) > +test "$dynamic_linker" = no && can_build_shared=no > + > +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" > +if test "$GCC" = yes; then > + variables_saved_for_relink="$variables_saved_for_relink > GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" > +fi > +])# AC_LIBTOOL_SYS_DYNAMIC_LINKER > + > + > +# _LT_AC_TAGCONFIG > +# ---------------- > +AC_DEFUN([_LT_AC_TAGCONFIG], > +[AC_ARG_WITH([tags], > + [AC_HELP_STRING([--with-tags@<:@=TAGS@:>@], > + [include additional configurations @<:@automatic@:>@])], > + [tagnames="$withval"]) > + > +if test -f "$ltmain" && test -n "$tagnames"; then > + if test ! -f "${ofile}"; then > + AC_MSG_WARN([output file `$ofile' does not exist]) > + fi > + > + if test -z "$LTCC"; then > + eval "`$SHELL ${ofile} --config | grep '^LTCC='`" > + if test -z "$LTCC"; then > + AC_MSG_WARN([output file `$ofile' does not look like a libtool > script]) > + else > + AC_MSG_WARN([using `LTCC=$LTCC', extracted from `$ofile']) > + fi > + fi > + if test -z "$LTCFLAGS"; then > + eval "`$SHELL ${ofile} --config | grep '^LTCFLAGS='`" > + fi > + > + # Extract list of available tagged configurations in $ofile. > + # Note that this assumes the entire list is on one line. 
> + available_tags=`grep "^available_tags=" "${ofile}" | $SED -e > 's/available_tags=\(.*$\)/\1/' -e 's/\"//g'` > + > + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," > + for tagname in $tagnames; do > + IFS="$lt_save_ifs" > + # Check whether tagname contains only valid characters > + case `$echo "X$tagname" | $Xsed -e > 's:[[-_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890,/] > ]::g'` in > + "") ;; > + *) AC_MSG_ERROR([invalid tag name: $tagname]) > + ;; > + esac > + > + if grep "^# ### BEGIN LIBTOOL TAG CONFIG: $tagname$" < "${ofile}" > > /dev/null > + then > + AC_MSG_ERROR([tag name \"$tagname\" already exists]) > + fi > + > + # Update the list of available tags. > + if test -n "$tagname"; then > + echo appending configuration tag \"$tagname\" to $ofile > + > + case $tagname in > + CXX) > + if test -n "$CXX" && ( test "X$CXX" != "Xno" && > + ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) || > + (test "X$CXX" != "Xg++"))) ; then > + AC_LIBTOOL_LANG_CXX_CONFIG > + else > + tagname="" > + fi > + ;; > + > + F77) > + if test -n "$F77" && test "X$F77" != "Xno"; then > + AC_LIBTOOL_LANG_F77_CONFIG > + else > + tagname="" > + fi > + ;; > + > + GCJ) > + if test -n "$GCJ" && test "X$GCJ" != "Xno"; then > + AC_LIBTOOL_LANG_GCJ_CONFIG > + else > + tagname="" > + fi > + ;; > + > + RC) > + AC_LIBTOOL_LANG_RC_CONFIG > + ;; > + > + *) > + AC_MSG_ERROR([Unsupported tag name: $tagname]) > + ;; > + esac > + > + # Append the new tag name to the list of available tags. > + if test -n "$tagname" ; then > + available_tags="$available_tags $tagname" > + fi > + fi > + done > + IFS="$lt_save_ifs" > + > + # Now substitute the updated list of available tags. 
> + if eval "sed -e > 's/^available_tags=.*\$/available_tags=\"$available_tags\"/' \"$ofile\" > > \"${ofile}T\""; then > + mv "${ofile}T" "$ofile" > + chmod +x "$ofile" > + else > + rm -f "${ofile}T" > + AC_MSG_ERROR([unable to update list of available tagged > configurations.]) > + fi > +fi > +])# _LT_AC_TAGCONFIG > + > + > +# AC_LIBTOOL_DLOPEN > +# ----------------- > +# enable checks for dlopen support > +AC_DEFUN([AC_LIBTOOL_DLOPEN], > + [AC_BEFORE([$0],[AC_LIBTOOL_SETUP]) > +])# AC_LIBTOOL_DLOPEN > + > + > +# AC_LIBTOOL_WIN32_DLL > +# -------------------- > +# declare package support for building win32 DLLs > +AC_DEFUN([AC_LIBTOOL_WIN32_DLL], > +[AC_BEFORE([$0], [AC_LIBTOOL_SETUP]) > +])# AC_LIBTOOL_WIN32_DLL > + > + > +# AC_ENABLE_SHARED([DEFAULT]) > +# --------------------------- > +# implement the --enable-shared flag > +# DEFAULT is either `yes' or `no'. If omitted, it defaults to `yes'. > +AC_DEFUN([AC_ENABLE_SHARED], > +[define([AC_ENABLE_SHARED_DEFAULT], ifelse($1, no, no, yes))dnl > +AC_ARG_ENABLE([shared], > + [AC_HELP_STRING([--enable-shared@<:@=PKGS@:>@], > + [build shared libraries > @<:@default=]AC_ENABLE_SHARED_DEFAULT[@:>@])], > + [p=${PACKAGE-default} > + case $enableval in > + yes) enable_shared=yes ;; > + no) enable_shared=no ;; > + *) > + enable_shared=no > + # Look at the argument we got. We use all the common list > separators. 
> + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," > + for pkg in $enableval; do > + IFS="$lt_save_ifs" > + if test "X$pkg" = "X$p"; then > + enable_shared=yes > + fi > + done > + IFS="$lt_save_ifs" > + ;; > + esac], > + [enable_shared=]AC_ENABLE_SHARED_DEFAULT) > +])# AC_ENABLE_SHARED > + > + > +# AC_DISABLE_SHARED > +# ----------------- > +# set the default shared flag to --disable-shared > +AC_DEFUN([AC_DISABLE_SHARED], > +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl > +AC_ENABLE_SHARED(no) > +])# AC_DISABLE_SHARED > + > + > +# AC_ENABLE_STATIC([DEFAULT]) > +# --------------------------- > +# implement the --enable-static flag > +# DEFAULT is either `yes' or `no'. If omitted, it defaults to `yes'. > +AC_DEFUN([AC_ENABLE_STATIC], > +[define([AC_ENABLE_STATIC_DEFAULT], ifelse($1, no, no, yes))dnl > +AC_ARG_ENABLE([static], > + [AC_HELP_STRING([--enable-static@<:@=PKGS@:>@], > + [build static libraries > @<:@default=]AC_ENABLE_STATIC_DEFAULT[@:>@])], > + [p=${PACKAGE-default} > + case $enableval in > + yes) enable_static=yes ;; > + no) enable_static=no ;; > + *) > + enable_static=no > + # Look at the argument we got. We use all the common list > separators. > + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," > + for pkg in $enableval; do > + IFS="$lt_save_ifs" > + if test "X$pkg" = "X$p"; then > + enable_static=yes > + fi > + done > + IFS="$lt_save_ifs" > + ;; > + esac], > + [enable_static=]AC_ENABLE_STATIC_DEFAULT) > +])# AC_ENABLE_STATIC > + > + > +# AC_DISABLE_STATIC > +# ----------------- > +# set the default static flag to --disable-static > +AC_DEFUN([AC_DISABLE_STATIC], > +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl > +AC_ENABLE_STATIC(no) > +])# AC_DISABLE_STATIC > + > + > +# AC_ENABLE_FAST_INSTALL([DEFAULT]) > +# --------------------------------- > +# implement the --enable-fast-install flag > +# DEFAULT is either `yes' or `no'. If omitted, it defaults to `yes'. 
> +AC_DEFUN([AC_ENABLE_FAST_INSTALL], > +[define([AC_ENABLE_FAST_INSTALL_DEFAULT], ifelse($1, no, no, yes))dnl > +AC_ARG_ENABLE([fast-install], > + [AC_HELP_STRING([--enable-fast-install@<:@=PKGS@:>@], > + [optimize for fast installation > @<:@default=]AC_ENABLE_FAST_INSTALL_DEFAULT[@:>@])], > + [p=${PACKAGE-default} > + case $enableval in > + yes) enable_fast_install=yes ;; > + no) enable_fast_install=no ;; > + *) > + enable_fast_install=no > + # Look at the argument we got. We use all the common list > separators. > + lt_save_ifs="$IFS"; IFS="${IFS}$PATH_SEPARATOR," > + for pkg in $enableval; do > + IFS="$lt_save_ifs" > + if test "X$pkg" = "X$p"; then > + enable_fast_install=yes > + fi > + done > + IFS="$lt_save_ifs" > + ;; > + esac], > + [enable_fast_install=]AC_ENABLE_FAST_INSTALL_DEFAULT) > +])# AC_ENABLE_FAST_INSTALL > + > + > +# AC_DISABLE_FAST_INSTALL > +# ----------------------- > +# set the default to --disable-fast-install > +AC_DEFUN([AC_DISABLE_FAST_INSTALL], > +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl > +AC_ENABLE_FAST_INSTALL(no) > +])# AC_DISABLE_FAST_INSTALL > + > + > +# AC_LIBTOOL_PICMODE([MODE]) > +# -------------------------- > +# implement the --with-pic flag > +# MODE is either `yes' or `no'. If omitted, it defaults to `both'. > +AC_DEFUN([AC_LIBTOOL_PICMODE], > +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl > +pic_mode=ifelse($#,1,$1,default) > +])# AC_LIBTOOL_PICMODE > + > + > +# AC_PROG_EGREP > +# ------------- > +# This is predefined starting with Autoconf 2.54, so this conditional > +# definition can be removed once we require Autoconf 2.54 or later. 
> +m4_ifndef([AC_PROG_EGREP], [AC_DEFUN([AC_PROG_EGREP], > +[AC_CACHE_CHECK([for egrep], [ac_cv_prog_egrep], > + [if echo a | (grep -E '(a|b)') >/dev/null 2>&1 > + then ac_cv_prog_egrep='grep -E' > + else ac_cv_prog_egrep='egrep' > + fi]) > + EGREP=$ac_cv_prog_egrep > + AC_SUBST([EGREP]) > +])]) > + > + > +# AC_PATH_TOOL_PREFIX > +# ------------------- > +# find a file program which can recognise shared library > +AC_DEFUN([AC_PATH_TOOL_PREFIX], > +[AC_REQUIRE([AC_PROG_EGREP])dnl > +AC_MSG_CHECKING([for $1]) > +AC_CACHE_VAL(lt_cv_path_MAGIC_CMD, > +[case $MAGIC_CMD in > +[[\\/*] | ?:[\\/]*]) > + lt_cv_path_MAGIC_CMD="$MAGIC_CMD" # Let the user override the test > with a path. > + ;; > +*) > + lt_save_MAGIC_CMD="$MAGIC_CMD" > + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR > +dnl $ac_dummy forces splitting on constant user-supplied paths. > +dnl POSIX.2 word splitting is done only on the output of word > expansions, > +dnl not every word. This closes a longstanding sh security hole. > + ac_dummy="ifelse([$2], , $PATH, [$2])" > + for ac_dir in $ac_dummy; do > + IFS="$lt_save_ifs" > + test -z "$ac_dir" && ac_dir=. > + if test -f $ac_dir/$1; then > + lt_cv_path_MAGIC_CMD="$ac_dir/$1" > + if test -n "$file_magic_test_file"; then > + case $deplibs_check_method in > + "file_magic "*) > + file_magic_regex=`expr "$deplibs_check_method" : "file_magic > \(.*\)"` > + MAGIC_CMD="$lt_cv_path_MAGIC_CMD" > + if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | > + $EGREP "$file_magic_regex" > /dev/null; then > + : > + else > + cat <<EOF 1>&2 > + > +*** Warning: the command libtool uses to detect shared libraries, > +*** $file_magic_cmd, produces output that libtool cannot recognize. > +*** The result is that libtool may fail to recognize shared libraries > +*** as such. This will affect the creation of libtool libraries that > +*** depend on shared libraries, but programs linked with such libtool > +*** libraries will work regardless of this problem. 
Nevertheless, you > +*** may want to report the problem to your system manager and/or to > +*** bug-libtool@gnu.org > + > +EOF > + fi ;; > + esac > + fi > + break > + fi > + done > + IFS="$lt_save_ifs" > + MAGIC_CMD="$lt_save_MAGIC_CMD" > + ;; > +esac]) > +MAGIC_CMD="$lt_cv_path_MAGIC_CMD" > +if test -n "$MAGIC_CMD"; then > + AC_MSG_RESULT($MAGIC_CMD) > +else > + AC_MSG_RESULT(no) > +fi > +])# AC_PATH_TOOL_PREFIX > + > + > +# AC_PATH_MAGIC > +# ------------- > +# find a file program which can recognise a shared library > +AC_DEFUN([AC_PATH_MAGIC], > +[AC_PATH_TOOL_PREFIX(${ac_tool_prefix}file, > /usr/bin$PATH_SEPARATOR$PATH) > +if test -z "$lt_cv_path_MAGIC_CMD"; then > + if test -n "$ac_tool_prefix"; then > + AC_PATH_TOOL_PREFIX(file, /usr/bin$PATH_SEPARATOR$PATH) > + else > + MAGIC_CMD=: > + fi > +fi > +])# AC_PATH_MAGIC > + > + > +# AC_PROG_LD > +# ---------- > +# find the pathname to the GNU or non-GNU linker > +AC_DEFUN([AC_PROG_LD], > +[AC_ARG_WITH([gnu-ld], > + [AC_HELP_STRING([--with-gnu-ld], > + [assume the C compiler uses GNU ld @<:@default=no@:>@])], > + [test "$withval" = no || with_gnu_ld=yes], > + [with_gnu_ld=no]) > +AC_REQUIRE([LT_AC_PROG_SED])dnl > +AC_REQUIRE([AC_PROG_CC])dnl > +AC_REQUIRE([AC_CANONICAL_HOST])dnl > +AC_REQUIRE([AC_CANONICAL_BUILD])dnl > +ac_prog=ld > +if test "$GCC" = yes; then > + # Check if gcc -print-prog-name=ld gives a path. > + AC_MSG_CHECKING([for ld used by $CC]) > + case $host in > + *-*-mingw*) > + # gcc leaves a trailing carriage return which upsets mingw > + ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; > + *) > + ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; > + esac > + case $ac_prog in > + # Accept absolute paths. 
> + [[\\/]]* | ?:[[\\/]]*) > + re_direlt='/[[^/]][[^/]]*/\.\./' > + # Canonicalize the pathname of ld > + ac_prog=`echo $ac_prog| $SED 's%\\\\%/%g'` > + while echo $ac_prog | grep "$re_direlt" > /dev/null 2>&1; do > + ac_prog=`echo $ac_prog| $SED "s%$re_direlt%/%"` > + done > + test -z "$LD" && LD="$ac_prog" > + ;; > + "") > + # If it fails, then pretend we aren't using GCC. > + ac_prog=ld > + ;; > + *) > + # If it is relative, then search for the first ld in PATH. > + with_gnu_ld=unknown > + ;; > + esac > +elif test "$with_gnu_ld" = yes; then > + AC_MSG_CHECKING([for GNU ld]) > +else > + AC_MSG_CHECKING([for non-GNU ld]) > +fi > +AC_CACHE_VAL(lt_cv_path_LD, > +[if test -z "$LD"; then > + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR > + for ac_dir in $PATH; do > + IFS="$lt_save_ifs" > + test -z "$ac_dir" && ac_dir=. > + if test -f "$ac_dir/$ac_prog" || test -f > "$ac_dir/$ac_prog$ac_exeext"; then > + lt_cv_path_LD="$ac_dir/$ac_prog" > + # Check to see if the program is GNU ld. I'd rather use > --version, > + # but apparently some variants of GNU ld only accept -v. > + # Break only if it was the GNU/non-GNU ld that we prefer. > + case `"$lt_cv_path_LD" -v 2>&1 </dev/null` in > + *GNU* | *'with BFD'*) > + test "$with_gnu_ld" != no && break > + ;; > + *) > + test "$with_gnu_ld" != yes && break > + ;; > + esac > + fi > + done > + IFS="$lt_save_ifs" > +else > + lt_cv_path_LD="$LD" # Let the user override the test with a path. > +fi]) > +LD="$lt_cv_path_LD" > +if test -n "$LD"; then > + AC_MSG_RESULT($LD) > +else > + AC_MSG_RESULT(no) > +fi > +test -z "$LD" && AC_MSG_ERROR([no acceptable ld found in \$PATH]) > +AC_PROG_LD_GNU > +])# AC_PROG_LD > + > + > +# AC_PROG_LD_GNU > +# -------------- > +AC_DEFUN([AC_PROG_LD_GNU], > +[AC_REQUIRE([AC_PROG_EGREP])dnl > +AC_CACHE_CHECK([if the linker ($LD) is GNU ld], lt_cv_prog_gnu_ld, > +[# I'd rather use --version here, but apparently some GNU lds only > accept -v. 
> +case `$LD -v 2>&1 </dev/null` in > +*GNU* | *'with BFD'*) > + lt_cv_prog_gnu_ld=yes > + ;; > +*) > + lt_cv_prog_gnu_ld=no > + ;; > +esac]) > +with_gnu_ld=$lt_cv_prog_gnu_ld > +])# AC_PROG_LD_GNU > + > + > +# AC_PROG_LD_RELOAD_FLAG > +# ---------------------- > +# find reload flag for linker > +# -- PORTME Some linkers may need a different reload flag. > +AC_DEFUN([AC_PROG_LD_RELOAD_FLAG], > +[AC_CACHE_CHECK([for $LD option to reload object files], > + lt_cv_ld_reload_flag, > + [lt_cv_ld_reload_flag='-r']) > +reload_flag=$lt_cv_ld_reload_flag > +case $reload_flag in > +"" | " "*) ;; > +*) reload_flag=" $reload_flag" ;; > +esac > +reload_cmds='$LD$reload_flag -o $output$reload_objs' > +case $host_os in > + darwin*) > + if test "$GCC" = yes; then > + reload_cmds='$LTCC $LTCFLAGS -nostdlib ${wl}-r -o > $output$reload_objs' > + else > + reload_cmds='$LD$reload_flag -o $output$reload_objs' > + fi > + ;; > +esac > +])# AC_PROG_LD_RELOAD_FLAG > + > + > +# AC_DEPLIBS_CHECK_METHOD > +# ----------------------- > +# how to check for library dependencies > +# -- PORTME fill in with the dynamic library characteristics > +AC_DEFUN([AC_DEPLIBS_CHECK_METHOD], > +[AC_CACHE_CHECK([how to recognise dependent libraries], > +lt_cv_deplibs_check_method, > +[lt_cv_file_magic_cmd='$MAGIC_CMD' > +lt_cv_file_magic_test_file= > +lt_cv_deplibs_check_method='unknown' > +# Need to set the preceding variable on all platforms that support > +# interlibrary dependencies. > +# 'none' -- dependencies not supported. > +# `unknown' -- same as none, but documents that we really don't know. > +# 'pass_all' -- all dependencies passed with no checks. > +# 'test_compile' -- check by making test program. > +# 'file_magic [[regex]]' -- check by looking for files in library path > +# which responds to the $file_magic_cmd with a given extended regex. > +# If you have `file' or equivalent on your system and you're not sure > +# whether `pass_all' will *always* work, you probably want this one. 
> + > +case $host_os in > +aix4* | aix5*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +beos*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +bsdi[[45]]*) > + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit > [[ML]]SB (shared object|dynamic lib)' > + lt_cv_file_magic_cmd='/usr/bin/file -L' > + lt_cv_file_magic_test_file=/shlib/libc.so > + ;; > + > +cygwin*) > + # func_win32_libid is a shell function defined in ltmain.sh > + lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL' > + lt_cv_file_magic_cmd='func_win32_libid' > + ;; > + > +mingw* | pw32*) > + # Base MSYS/MinGW do not provide the 'file' command needed by > + # func_win32_libid shell function, so use a weaker test based on > 'objdump'. > + lt_cv_deplibs_check_method='file_magic file format > pei*-i386(.*architecture: i386)?' > + lt_cv_file_magic_cmd='$OBJDUMP -f' > + ;; > + > +darwin* | rhapsody*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +freebsd* | kfreebsd*-gnu | dragonfly*) > + if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then > + case $host_cpu in > + i*86 ) > + # Not sure whether the presence of OpenBSD here was a mistake. > + # Let's accept both of them until this is cleared up. 
> + lt_cv_deplibs_check_method='file_magic > (FreeBSD|OpenBSD|DragonFly)/i[[3-9]]86 (compact )?demand paged shared > library' > + lt_cv_file_magic_cmd=/usr/bin/file > + lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*` > + ;; > + esac > + else > + lt_cv_deplibs_check_method=pass_all > + fi > + ;; > + > +gnu*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +hpux10.20* | hpux11*) > + lt_cv_file_magic_cmd=/usr/bin/file > + case $host_cpu in > + ia64*) > + lt_cv_deplibs_check_method='file_magic > (s[[0-9]][[0-9]][[0-9]]|ELF-[[0-9]][[0-9]]) shared object file - IA64' > + lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so > + ;; > + hppa*64*) > + [lt_cv_deplibs_check_method='file_magic > (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - PA-RISC > [0-9].[0-9]'] > + lt_cv_file_magic_test_file=/usr/lib/pa20_64/libc.sl > + ;; > + *) > + lt_cv_deplibs_check_method='file_magic > (s[[0-9]][[0-9]][[0-9]]|PA-RISC[[0-9]].[[0-9]]) shared library' > + lt_cv_file_magic_test_file=/usr/lib/libc.sl > + ;; > + esac > + ;; > + > +interix3*) > + # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a > here > + lt_cv_deplibs_check_method='match_pattern /lib[[^/]]+(\.so|\.a)$' > + ;; > + > +irix5* | irix6* | nonstopux*) > + case $LD in > + *-32|*"-32 ") libmagic=32-bit;; > + *-n32|*"-n32 ") libmagic=N32;; > + *-64|*"-64 ") libmagic=64-bit;; > + *) libmagic=never-match;; > + esac > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +# This must be Linux ELF. 
> +linux*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +netbsd*) > + if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then > + lt_cv_deplibs_check_method='match_pattern > /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$' > + else > + lt_cv_deplibs_check_method='match_pattern > /lib[[^/]]+(\.so|_pic\.a)$' > + fi > + ;; > + > +newos6*) > + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit > [[ML]]SB (executable|dynamic lib)' > + lt_cv_file_magic_cmd=/usr/bin/file > + lt_cv_file_magic_test_file=/usr/lib/libnls.so > + ;; > + > +nto-qnx*) > + lt_cv_deplibs_check_method=unknown > + ;; > + > +openbsd*) > + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test > "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then > + lt_cv_deplibs_check_method='match_pattern > /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|\.so|_pic\.a)$' > + else > + lt_cv_deplibs_check_method='match_pattern > /lib[[^/]]+(\.so\.[[0-9]]+\.[[0-9]]+|_pic\.a)$' > + fi > + ;; > + > +osf3* | osf4* | osf5*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +solaris*) > + lt_cv_deplibs_check_method=pass_all > + ;; > + > +sysv4 | sysv4.3*) > + case $host_vendor in > + motorola) > + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit > [[ML]]SB (shared object|dynamic lib) M[[0-9]][[0-9]]* Version [[0-9]]' > + lt_cv_file_magic_test_file=`echo /usr/lib/libc.so*` > + ;; > + ncr) > + lt_cv_deplibs_check_method=pass_all > + ;; > + sequent) > + lt_cv_file_magic_cmd='/bin/file' > + lt_cv_deplibs_check_method='file_magic ELF [[0-9]][[0-9]]*-bit > [[LM]]SB (shared object|dynamic lib )' > + ;; > + sni) > + lt_cv_file_magic_cmd='/bin/file' > + lt_cv_deplibs_check_method="file_magic ELF [[0-9]][[0-9]]*-bit > [[LM]]SB dynamic lib" > + lt_cv_file_magic_test_file=/lib/libc.so > + ;; > + siemens) > + lt_cv_deplibs_check_method=pass_all > + ;; > + pc) > + lt_cv_deplibs_check_method=pass_all > + ;; > + esac > + ;; > + > +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) > + 
lt_cv_deplibs_check_method=pass_all > + ;; > +esac > +]) > +file_magic_cmd=$lt_cv_file_magic_cmd > +deplibs_check_method=$lt_cv_deplibs_check_method > +test -z "$deplibs_check_method" && deplibs_check_method=unknown > +])# AC_DEPLIBS_CHECK_METHOD > + > + > +# AC_PROG_NM > +# ---------- > +# find the pathname to a BSD-compatible name lister > +AC_DEFUN([AC_PROG_NM], > +[AC_CACHE_CHECK([for BSD-compatible nm], lt_cv_path_NM, > +[if test -n "$NM"; then > + # Let the user override the test. > + lt_cv_path_NM="$NM" > +else > + lt_nm_to_check="${ac_tool_prefix}nm" > + if test -n "$ac_tool_prefix" && test "$build" = "$host"; then > + lt_nm_to_check="$lt_nm_to_check nm" > + fi > + for lt_tmp_nm in $lt_nm_to_check; do > + lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR > + for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do > + IFS="$lt_save_ifs" > + test -z "$ac_dir" && ac_dir=. > + tmp_nm="$ac_dir/$lt_tmp_nm" > + if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext" ; then > + # Check to see if the nm accepts a BSD-compat flag. 
> + # Adding the `sed 1q' prevents false positives on HP-UX, which > says: > + # nm: unknown option "B" ignored > + # Tru64's nm complains that /dev/null is an invalid object file > + case `"$tmp_nm" -B /dev/null 2>&1 | sed '1q'` in > + */dev/null* | *'Invalid file or object type'*) > + lt_cv_path_NM="$tmp_nm -B" > + break > + ;; > + *) > + case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in > + */dev/null*) > + lt_cv_path_NM="$tmp_nm -p" > + break > + ;; > + *) > + lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first > match, but > + continue # so that we can try to find one that supports BSD > flags > + ;; > + esac > + ;; > + esac > + fi > + done > + IFS="$lt_save_ifs" > + done > + test -z "$lt_cv_path_NM" && lt_cv_path_NM=nm > +fi]) > +NM="$lt_cv_path_NM" > +])# AC_PROG_NM > + > + > +# AC_CHECK_LIBM > +# ------------- > +# check for math library > +AC_DEFUN([AC_CHECK_LIBM], > +[AC_REQUIRE([AC_CANONICAL_HOST])dnl > +LIBM= > +case $host in > +*-*-beos* | *-*-cygwin* | *-*-pw32* | *-*-darwin*) > + # These system don't have libm, or don't need it > + ;; > +*-ncr-sysv4.3*) > + AC_CHECK_LIB(mw, _mwvalidcheckl, LIBM="-lmw") > + AC_CHECK_LIB(m, cos, LIBM="$LIBM -lm") > + ;; > +*) > + AC_CHECK_LIB(m, cos, LIBM="-lm") > + ;; > +esac > +])# AC_CHECK_LIBM > + > + > +# AC_LIBLTDL_CONVENIENCE([DIRECTORY]) > +# ----------------------------------- > +# sets LIBLTDL to the link flags for the libltdl convenience library > and > +# LTDLINCL to the include flags for the libltdl header and adds > +# --enable-ltdl-convenience to the configure arguments. Note that > +# AC_CONFIG_SUBDIRS is not called here. If DIRECTORY is not provided, > +# it is assumed to be `libltdl'. LIBLTDL will be prefixed with > +# '${top_builddir}/' and LTDLINCL will be prefixed with > '${top_srcdir}/' > +# (note the single quotes!). If your package is not flat and you're > not > +# using automake, define top_builddir and top_srcdir appropriately in > +# the Makefiles. 
> +AC_DEFUN([AC_LIBLTDL_CONVENIENCE], > +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl > + case $enable_ltdl_convenience in > + no) AC_MSG_ERROR([this package needs a convenience libltdl]) ;; > + "") enable_ltdl_convenience=yes > + ac_configure_args="$ac_configure_args --enable-ltdl-convenience" > ;; > + esac > + LIBLTDL='${top_builddir}/'ifelse($#,1,[$1],['libltdl'])/libltdlc.la > + LTDLINCL='-I${top_srcdir}/'ifelse($#,1,[$1],['libltdl']) > + # For backwards non-gettext consistent compatibility... > + INCLTDL="$LTDLINCL" > +])# AC_LIBLTDL_CONVENIENCE > + > + > +# AC_LIBLTDL_INSTALLABLE([DIRECTORY]) > +# ----------------------------------- > +# sets LIBLTDL to the link flags for the libltdl installable library > and > +# LTDLINCL to the include flags for the libltdl header and adds > +# --enable-ltdl-install to the configure arguments. Note that > +# AC_CONFIG_SUBDIRS is not called here. If DIRECTORY is not provided, > +# and an installed libltdl is not found, it is assumed to be `libltdl'. > +# LIBLTDL will be prefixed with '${top_builddir}/'# and LTDLINCL with > +# '${top_srcdir}/' (note the single quotes!). If your package is not > +# flat and you're not using automake, define top_builddir and > top_srcdir > +# appropriately in the Makefiles. > +# In the future, this macro may have to be called after > AC_PROG_LIBTOOL. 
> +AC_DEFUN([AC_LIBLTDL_INSTALLABLE], > +[AC_BEFORE([$0],[AC_LIBTOOL_SETUP])dnl > + AC_CHECK_LIB(ltdl, lt_dlinit, > + [test x"$enable_ltdl_install" != xyes && enable_ltdl_install=no], > + [if test x"$enable_ltdl_install" = xno; then > + AC_MSG_WARN([libltdl not installed, but installation disabled]) > + else > + enable_ltdl_install=yes > + fi > + ]) > + if test x"$enable_ltdl_install" = x"yes"; then > + ac_configure_args="$ac_configure_args --enable-ltdl-install" > + LIBLTDL='${top_builddir}/'ifelse($#,1,[$1],['libltdl'])/libltdl.la > + LTDLINCL='-I${top_srcdir}/'ifelse($#,1,[$1],['libltdl']) > + else > + ac_configure_args="$ac_configure_args --enable-ltdl-install=no" > + LIBLTDL="-lltdl" > + LTDLINCL= > + fi > + # For backwards non-gettext consistent compatibility... > + INCLTDL="$LTDLINCL" > +])# AC_LIBLTDL_INSTALLABLE > + > + > +# AC_LIBTOOL_CXX > +# -------------- > +# enable support for C++ libraries > +AC_DEFUN([AC_LIBTOOL_CXX], > +[AC_REQUIRE([_LT_AC_LANG_CXX]) > +])# AC_LIBTOOL_CXX > + > + > +# _LT_AC_LANG_CXX > +# --------------- > +AC_DEFUN([_LT_AC_LANG_CXX], > +[AC_REQUIRE([AC_PROG_CXX]) > +AC_REQUIRE([_LT_AC_PROG_CXXCPP]) > +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}CXX]) > +])# _LT_AC_LANG_CXX > + > +# _LT_AC_PROG_CXXCPP > +# ------------------ > +AC_DEFUN([_LT_AC_PROG_CXXCPP], > +[ > +AC_REQUIRE([AC_PROG_CXX]) > +if test -n "$CXX" && ( test "X$CXX" != "Xno" && > + ( (test "X$CXX" = "Xg++" && `g++ -v >/dev/null 2>&1` ) || > + (test "X$CXX" != "Xg++"))) ; then > + AC_PROG_CXXCPP > +fi > +])# _LT_AC_PROG_CXXCPP > + > +# AC_LIBTOOL_F77 > +# -------------- > +# enable support for Fortran 77 libraries > +AC_DEFUN([AC_LIBTOOL_F77], > +[AC_REQUIRE([_LT_AC_LANG_F77]) > +])# AC_LIBTOOL_F77 > + > + > +# _LT_AC_LANG_F77 > +# --------------- > +AC_DEFUN([_LT_AC_LANG_F77], > +[AC_REQUIRE([AC_PROG_F77]) > +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}F77]) > +])# _LT_AC_LANG_F77 > + > + > +# AC_LIBTOOL_GCJ > +# -------------- > +# enable support for 
GCJ libraries > +AC_DEFUN([AC_LIBTOOL_GCJ], > +[AC_REQUIRE([_LT_AC_LANG_GCJ]) > +])# AC_LIBTOOL_GCJ > + > + > +# _LT_AC_LANG_GCJ > +# --------------- > +AC_DEFUN([_LT_AC_LANG_GCJ], > +[AC_PROVIDE_IFELSE([AC_PROG_GCJ],[], > + [AC_PROVIDE_IFELSE([A][M_PROG_GCJ],[], > + [AC_PROVIDE_IFELSE([LT_AC_PROG_GCJ],[], > + [ifdef([AC_PROG_GCJ],[AC_REQUIRE([AC_PROG_GCJ])], > + [ifdef([A][M_PROG_GCJ],[AC_REQUIRE([A][M_PROG_GCJ])], > + [AC_REQUIRE([A][C_PROG_GCJ_OR_A][M_PROG_GCJ])])])])])]) > +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}GCJ]) > +])# _LT_AC_LANG_GCJ > + > + > +# AC_LIBTOOL_RC > +# ------------- > +# enable support for Windows resource files > +AC_DEFUN([AC_LIBTOOL_RC], > +[AC_REQUIRE([LT_AC_PROG_RC]) > +_LT_AC_SHELL_INIT([tagnames=${tagnames+${tagnames},}RC]) > +])# AC_LIBTOOL_RC > + > + > +# AC_LIBTOOL_LANG_C_CONFIG > +# ------------------------ > +# Ensure that the configuration vars for the C compiler are > +# suitably defined. Those variables are subsequently used by > +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. > +AC_DEFUN([AC_LIBTOOL_LANG_C_CONFIG], [_LT_AC_LANG_C_CONFIG]) > +AC_DEFUN([_LT_AC_LANG_C_CONFIG], > +[lt_save_CC="$CC" > +AC_LANG_PUSH(C) > + > +# Source file extension for C test sources. > +ac_ext=c > + > +# Object file extension for compiled C test sources. 
> +objext=o > +_LT_AC_TAGVAR(objext, $1)=$objext > + > +# Code to be used in simple compile tests > +lt_simple_compile_test_code="int some_variable = 0;\n" > + > +# Code to be used in simple link tests > +lt_simple_link_test_code='int main(){return(0);}\n' > + > +_LT_AC_SYS_COMPILER > + > +# save warnings/boilerplate of simple test code > +_LT_COMPILER_BOILERPLATE > +_LT_LINKER_BOILERPLATE > + > +AC_LIBTOOL_PROG_COMPILER_NO_RTTI($1) > +AC_LIBTOOL_PROG_COMPILER_PIC($1) > +AC_LIBTOOL_PROG_CC_C_O($1) > +AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1) > +AC_LIBTOOL_PROG_LD_SHLIBS($1) > +AC_LIBTOOL_SYS_DYNAMIC_LINKER($1) > +AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1) > +AC_LIBTOOL_SYS_LIB_STRIP > +AC_LIBTOOL_DLOPEN_SELF > + > +# Report which library types will actually be built > +AC_MSG_CHECKING([if libtool supports shared libraries]) > +AC_MSG_RESULT([$can_build_shared]) > + > +AC_MSG_CHECKING([whether to build shared libraries]) > +test "$can_build_shared" = "no" && enable_shared=no > + > +# On AIX, shared libraries and static libraries use the same namespace, > and > +# are all built from PIC. > +case $host_os in > +aix3*) > + test "$enable_shared" = yes && enable_static=no > + if test -n "$RANLIB"; then > + archive_cmds="$archive_cmds~\$RANLIB \$lib" > + postinstall_cmds='$RANLIB $lib' > + fi > + ;; > + > +aix4* | aix5*) > + if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; > then > + test "$enable_shared" = yes && enable_static=no > + fi > + ;; > +esac > +AC_MSG_RESULT([$enable_shared]) > + > +AC_MSG_CHECKING([whether to build static libraries]) > +# Make sure either enable_shared or enable_static is yes. > +test "$enable_shared" = yes || enable_static=yes > +AC_MSG_RESULT([$enable_static]) > + > +AC_LIBTOOL_CONFIG($1) > + > +AC_LANG_POP > +CC="$lt_save_CC" > +])# AC_LIBTOOL_LANG_C_CONFIG > + > + > +# AC_LIBTOOL_LANG_CXX_CONFIG > +# -------------------------- > +# Ensure that the configuration vars for the C compiler are > +# suitably defined. 
Those variables are subsequently used by > +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. > +AC_DEFUN([AC_LIBTOOL_LANG_CXX_CONFIG], [_LT_AC_LANG_CXX_CONFIG(CXX)]) > +AC_DEFUN([_LT_AC_LANG_CXX_CONFIG], > +[AC_LANG_PUSH(C++) > +AC_REQUIRE([AC_PROG_CXX]) > +AC_REQUIRE([_LT_AC_PROG_CXXCPP]) > + > +_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > +_LT_AC_TAGVAR(allow_undefined_flag, $1)= > +_LT_AC_TAGVAR(always_export_symbols, $1)=no > +_LT_AC_TAGVAR(archive_expsym_cmds, $1)= > +_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= > +_LT_AC_TAGVAR(hardcode_direct, $1)=no > +_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= > +_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)= > +_LT_AC_TAGVAR(hardcode_libdir_separator, $1)= > +_LT_AC_TAGVAR(hardcode_minus_L, $1)=no > +_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported > +_LT_AC_TAGVAR(hardcode_automatic, $1)=no > +_LT_AC_TAGVAR(module_cmds, $1)= > +_LT_AC_TAGVAR(module_expsym_cmds, $1)= > +_LT_AC_TAGVAR(link_all_deplibs, $1)=unknown > +_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds > +_LT_AC_TAGVAR(no_undefined_flag, $1)= > +_LT_AC_TAGVAR(whole_archive_flag_spec, $1)= > +_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no > + > +# Dependencies to place before and after the object being linked: > +_LT_AC_TAGVAR(predep_objects, $1)= > +_LT_AC_TAGVAR(postdep_objects, $1)= > +_LT_AC_TAGVAR(predeps, $1)= > +_LT_AC_TAGVAR(postdeps, $1)= > +_LT_AC_TAGVAR(compiler_lib_search_path, $1)= > + > +# Source file extension for C++ test sources. > +ac_ext=cpp > + > +# Object file extension for compiled C++ test sources. > +objext=o > +_LT_AC_TAGVAR(objext, $1)=$objext > + > +# Code to be used in simple compile tests > +lt_simple_compile_test_code="int some_variable = 0;\n" > + > +# Code to be used in simple link tests > +lt_simple_link_test_code='int main(int, char *[[]]) { return(0); }\n' > + > +# ltmain only uses $CC for tagged configurations so make sure $CC is > set. 
> +_LT_AC_SYS_COMPILER > + > +# save warnings/boilerplate of simple test code > +_LT_COMPILER_BOILERPLATE > +_LT_LINKER_BOILERPLATE > + > +# Allow CC to be a program name with arguments. > +lt_save_CC=$CC > +lt_save_LD=$LD > +lt_save_GCC=$GCC > +GCC=$GXX > +lt_save_with_gnu_ld=$with_gnu_ld > +lt_save_path_LD=$lt_cv_path_LD > +if test -n "${lt_cv_prog_gnu_ldcxx+set}"; then > + lt_cv_prog_gnu_ld=$lt_cv_prog_gnu_ldcxx > +else > + $as_unset lt_cv_prog_gnu_ld > +fi > +if test -n "${lt_cv_path_LDCXX+set}"; then > + lt_cv_path_LD=$lt_cv_path_LDCXX > +else > + $as_unset lt_cv_path_LD > +fi > +test -z "${LDCXX+set}" || LD=$LDCXX > +CC=${CXX-"c++"} > +compiler=$CC > +_LT_AC_TAGVAR(compiler, $1)=$CC > +_LT_CC_BASENAME([$compiler]) > + > +# We don't want -fno-exception wen compiling C++ code, so set the > +# no_builtin_flag separately > +if test "$GXX" = yes; then > + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=' -fno-builtin' > +else > + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)= > +fi > + > +if test "$GXX" = yes; then > + # Set up default GNU C++ configuration > + > + AC_PROG_LD > + > + # Check if GNU C++ uses GNU ld as the underlying linker, since the > + # archiving commands below assume that GNU ld is being used. > + if test "$with_gnu_ld" = yes; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared -nostdlib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o > $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' > + > + # If archive_cmds runs LD, not CC, wlarc should be empty > + # XXX I think wlarc can be eliminated in ltcf-cxx, but I need to > + # investigate it a little bit more. 
(MM) > + wlarc='${wl}' > + > + # ancient GNU ld didn't support --whole-archive et. al. > + if eval "`$CC -print-prog-name=ld` --help 2>&1" | \ > + grep 'no-whole-archive' > /dev/null; then > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' > + else > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= > + fi > + else > + with_gnu_ld=no > + wlarc= > + > + # A generic and very simple default shared library creation > + # command for GNU C++ for the case where it uses the native > + # linker, instead of GNU ld. If possible, this setting should > + # overridden to take advantage of the native linker features on > + # the platform it is being used on. > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o > $lib' > + fi > + > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. > + output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 > | grep "\-L"' > + > +else > + GXX=no > + with_gnu_ld=no > + wlarc= > +fi > + > +# PORTME: fill in a description of your system's C++ link > characteristics > +AC_MSG_CHECKING([whether the $compiler linker ($LD) supports shared > libraries]) > +_LT_AC_TAGVAR(ld_shlibs, $1)=yes > +case $host_os in > + aix3*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + aix4* | aix5*) > + if test "$host_cpu" = ia64; then > + # On IA64, the linker does run time linking by default, so we > don't > + # have to do anything special. > + aix_use_runtimelinking=no > + exp_sym_flag='-Bexport' > + no_entry_flag="" > + else > + aix_use_runtimelinking=no > + > + # Test if we are trying to use run time linking or normal > + # AIX style linking. If -brtl is somewhere in LDFLAGS, we > + # need to do runtime linking. 
> + case $host_os in aix4.[[23]]|aix4.[[23]].*|aix5*) > + for ld_flag in $LDFLAGS; do > + case $ld_flag in > + *-brtl*) > + aix_use_runtimelinking=yes > + break > + ;; > + esac > + done > + ;; > + esac > + > + exp_sym_flag='-bexport' > + no_entry_flag='-bnoentry' > + fi > + > + # When large executables or shared objects are built, AIX ld can > + # have problems creating the table of contents. If linking a > library > + # or program results in "error TOC overflow" add -mminimal-toc to > + # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not > + # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. > + > + _LT_AC_TAGVAR(archive_cmds, $1)='' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + > + if test "$GXX" = yes; then > + case $host_os in aix4.[[012]]|aix4.[[012]].*) > + # We only want to do this on AIX 4.2 and lower, the check > + # below for broken collect2 doesn't work under 4.3+ > + collect2name=`${CC} -print-prog-name=collect2` > + if test -f "$collect2name" && \ > + strings "$collect2name" | grep resolve_lib_name >/dev/null > + then > + # We have reworked collect2 > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + else > + # We have old collect2 > + _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported > + # It fails to find uninstalled libraries when the uninstalled > + # path is not listed in the libpath. Setting hardcode_minus_L > + # to unsupported forces relinking > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)= > + fi > + ;; > + esac > + shared_flag='-shared' > + if test "$aix_use_runtimelinking" = yes; then > + shared_flag="$shared_flag "'${wl}-G' > + fi > + else > + # not using gcc > + if test "$host_cpu" = ia64; then > + # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 > Release > + # chokes on -Wl,-G. 
The following line is correct: > + shared_flag='-G' > + else > + if test "$aix_use_runtimelinking" = yes; then > + shared_flag='${wl}-G' > + else > + shared_flag='${wl}-bM:SRE' > + fi > + fi > + fi > + > + # It seems that -bexpall does not export symbols beginning with > + # underscore (_), so it is better to generate a list of symbols to > export. > + _LT_AC_TAGVAR(always_export_symbols, $1)=yes > + if test "$aix_use_runtimelinking" = yes; then > + # Warning - without using the other runtime loading flags > (-brtl), > + # -berok will link without error, but may produce a broken > library. > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='-berok' > + # Determine the default libpath from the value encoded in an > empty executable. > + _LT_AC_SYS_LIBPATH_AIX > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-blibpath:$libdir:'"$aix_libpath" > + > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC"' -o > $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' > $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo > "${wl}${allow_undefined_flag}"; else :; fi` > '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" > + else > + if test "$host_cpu" = ia64; then > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R > $libdir:/usr/lib:/lib' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)="-z nodefs" > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o > $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' > $compiler_flags ${wl}${allow_undefined_flag} > '"\${wl}$exp_sym_flag:\$export_symbols" > + else > + # Determine the default libpath from the value encoded in an > empty executable. > + _LT_AC_SYS_LIBPATH_AIX > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-blibpath:$libdir:'"$aix_libpath" > + # Warning - without using the other run time loading flags, > + # -berok will link without error, but may produce a broken > library. 
> + _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-bernotok' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-berok' > + # Exported symbols can be pulled into shared objects from > archives > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='$convenience' > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes > + # This is similar to how AIX traditionally builds its shared > libraries. > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o > $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags > ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS > $output_objdir/$libname$release.a $output_objdir/$soname' > + fi > + fi > + ;; > + > + beos*) > + if $LD --help 2>&1 | grep ': supported targets:.* elf' > /dev/null; > then > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + # Joseph Beckenbach says some releases of gcc > + # support --undefined. This deserves some investigation. FIXME > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -nostart $libobjs $deplibs > $compiler_flags ${wl}-soname $wl$soname -o $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + > + chorus*) > + case $cc_basename in > + *) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + ;; > + > + cygwin* | mingw* | pw32*) > + # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) is actually > meaningless, > + # as there is no search path for DLLs. 
> + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + _LT_AC_TAGVAR(always_export_symbols, $1)=no > + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes > + > + if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o > $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker > --out-implib -Xlinker $lib' > + # If the export-symbols file already is a .def file (1st line > + # is EXPORTS), use it as is; otherwise, prepend... > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='if test "x`$SED 1q > $export_symbols`" = xEXPORTS; then > + cp $export_symbols $output_objdir/$soname.def; > + else > + echo EXPORTS > $output_objdir/$soname.def; > + cat $export_symbols >> $output_objdir/$soname.def; > + fi~ > + $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects > $libobjs $deplibs $postdep_objects $compiler_flags -o > $output_objdir/$soname ${wl}--enable-auto-image-base -Xlinker > --out-implib -Xlinker $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + darwin* | rhapsody*) > + case $host_os in > + rhapsody* | darwin1.[[012]]) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined > ${wl}suppress' > + ;; > + *) # Darwin 1.3 on > + if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then > + _LT_AC_TAGVAR(allow_undefined_flag, > $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' > + else > + case ${MACOSX_DEPLOYMENT_TARGET} in > + 10.[[012]]) > + _LT_AC_TAGVAR(allow_undefined_flag, > $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' > + ;; > + 10.*) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined > ${wl}dynamic_lookup' > + ;; > + esac > + fi > + ;; > + esac > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_automatic, $1)=yes > + 
_LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='' > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + > + if test "$GXX" = yes ; then > + lt_int_apple_cc_single_mod=no > + output_verbose_link_cmd='echo' > + if $CC -dumpspecs 2>&1 | $EGREP 'single_module' >/dev/null ; then > + lt_int_apple_cc_single_mod=yes > + fi > + if test "X$lt_int_apple_cc_single_mod" = Xyes ; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -dynamiclib -single_module > $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags > -install_name $rpath/$soname $verstring' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -r -keep_private_externs > -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib > $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags > -install_name $rpath/$soname $verstring' > + fi > + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags' > + # Don't fix this by using the ld -exported_symbols_list flag, > it doesn't exist in older darwin lds > + if test "X$lt_int_apple_cc_single_mod" = Xyes ; then > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib -single_module > $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags > -install_name $rpath/$soname $verstring~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + else > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC -r -keep_private_externs > -nostdlib -o ${lib}-master.o $libobjs~$CC -dynamiclib > $allow_undefined_flag -o $lib ${lib}-master.o $deplibs $compiler_flags > -install_name $rpath/$soname $verstring~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + fi > + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ 
]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + else > + case $cc_basename in > + xlc*) > + output_verbose_link_cmd='echo' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj > ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs > $compiler_flags ${wl}-install_name ${wl}`echo $rpath/$soname` > $verstring' > + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags' > + # Don't fix this by using the ld -exported_symbols_list flag, > it doesn't exist in older darwin lds > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj > ${wl}-single_module $allow_undefined_flag -o $lib $libobjs $deplibs > $compiler_flags ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit > -s $output_objdir/${libname}-symbols.expsym ${lib}' > + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + ;; > + *) > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + fi > + ;; > + > + dgux*) > + case $cc_basename in > + ec++*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + ghcx*) > + # Green Hills C++ Compiler > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + *) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + ;; > + freebsd[[12]]*) > + # C++ shared libraries reported to be fairly broken before switch > to ELF > + _LT_AC_TAGVAR(ld_shlibs, $1)=no 
> + ;; > + freebsd-elf*) > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + ;; > + freebsd* | kfreebsd*-gnu | dragonfly*) > + # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF > + # conventions > + _LT_AC_TAGVAR(ld_shlibs, $1)=yes > + ;; > + gnu*) > + ;; > + hpux9*) > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes # Not in the search PATH, > + # but as the default > + # location of the library. > + > + case $cc_basename in > + CC*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + aCC*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC > -b ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test > $output_objdir/$soname = $lib || mv $output_objdir/$soname $lib' > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. > + # > + # There doesn't appear to be a way to prevent this compiler from > + # explicitly linking system object files so we need to strip them > + # from the output so that they don't get included in the library > + # dependencies. 
> + output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v > conftest.$objext 2>&1) | grep "[[-]]L"`; list=""; for z in $templist; do > case $z in conftest.$objext) list="$list $z";; *.$objext);; *) > list="$list $z";;esac; done; echo $list' > + ;; > + *) > + if test "$GXX" = yes; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC > -shared -nostdlib -fPIC ${wl}+b ${wl}$install_libdir -o > $output_objdir/$soname $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags~test $output_objdir/$soname = $lib || > mv $output_objdir/$soname $lib' > + else > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + esac > + ;; > + hpux10*|hpux11*) > + if test $with_gnu_ld = no; then > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + case $host_cpu in > + hppa*64*|ia64*) > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='+b $libdir' > + ;; > + *) > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + ;; > + esac > + fi > + case $host_cpu in > + hppa*64*|ia64*) > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + *) > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes # Not in the search PATH, > + # but as the default > + # location of the library. 
> + ;; > + esac > + > + case $cc_basename in > + CC*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + aCC*) > + case $host_cpu in > + hppa*64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname > -o $lib $predep_objects $libobjs $deplibs $postdep_objects > $compiler_flags' > + ;; > + ia64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname > ${wl}+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname > ${wl}+b ${wl}$install_libdir -o $lib $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags' > + ;; > + esac > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. > + # > + # There doesn't appear to be a way to prevent this compiler from > + # explicitly linking system object files so we need to strip > them > + # from the output so that they don't get included in the library > + # dependencies. 
> + output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v > conftest.$objext 2>&1) | grep "\-L"`; list=""; for z in $templist; do > case $z in conftest.$objext) list="$list $z";; *.$objext);; *) > list="$list $z";;esac; done; echo $list' > + ;; > + *) > + if test "$GXX" = yes; then > + if test $with_gnu_ld = no; then > + case $host_cpu in > + hppa*64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > -fPIC ${wl}+h ${wl}$soname -o $lib $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags' > + ;; > + ia64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > -fPIC ${wl}+h ${wl}$soname ${wl}+nodefaultrpath -o $lib $predep_objects > $libobjs $deplibs $postdep_objects $compiler_flags' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > -fPIC ${wl}+h ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' > + ;; > + esac > + fi > + else > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + esac > + ;; > + interix3*) > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + # Hack: On Interix 3.x, we cannot compile PIC because of a broken > gcc. > + # Instead, shared libraries are loaded at an image base (0x10000000 > by > + # default) and relocated if they conflict, which is a slow very > memory > + # consuming and fragmenting process. To avoid this, we pick a > random, > + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at > link > + # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
> + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs > $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr > ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed "s,^,_," > $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag > $libobjs $deplibs $compiler_flags ${wl}-h,$soname > ${wl}--retain-symbols-file,$output_objdir/$soname.expsym > ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` > -o $lib' > + ;; > + irix5* | irix6*) > + case $cc_basename in > + CC*) > + # SGI C++ > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -all -multigot > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > -soname $soname `test -n "$verstring" && echo -set_version $verstring` > -update_registry ${output_objdir}/so_locations -o $lib' > + > + # Archives containing C++ object files must be created using > + # "CC -ar", where "CC" is the IRIX C++ compiler. This is > + # necessary to make sure instantiated templates are included > + # in the archive. 
> + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -ar -WR,-u -o $oldlib > $oldobjs' > + ;; > + *) > + if test "$GXX" = yes; then > + if test "$with_gnu_ld" = no; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname ${wl}$soname `test -n "$verstring" && echo > ${wl}-set_version ${wl}$verstring` ${wl}-update_registry > ${wl}${output_objdir}/so_locations -o $lib' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname ${wl}$soname `test -n "$verstring" && echo > ${wl}-set_version ${wl}$verstring` -o $lib' > + fi > + fi > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + ;; > + esac > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + ;; > + linux*) > + case $cc_basename in > + KCC*) > + # Kuck and Associates, Inc. (KAI) C++ Compiler > + > + # KCC will only create a shared library if the output file > + # ends with ".so" (or ".sl" for HP-UX), so rename the library > + # to its proper name (with version) after linking. 
> + _LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | > $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | > $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags --soname $soname -o \$templib; mv > \$templib $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='tempext=`echo > $shared_ext | $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; > templib=`echo $lib | $SED -e "s/\${tempext}\..*/.so/"`; $CC > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > --soname $soname -o \$templib ${wl}-retain-symbols-file,$export_symbols; > mv \$templib $lib' > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. > + # > + # There doesn't appear to be a way to prevent this compiler from > + # explicitly linking system object files so we need to strip > them > + # from the output so that they don't get included in the library > + # dependencies. > + output_verbose_link_cmd='templist=`$CC $CFLAGS -v > conftest.$objext -o libconftest$shared_ext 2>&1 | grep "ld"`; rm -f > libconftest$shared_ext; list=""; for z in $templist; do case $z in > conftest.$objext) list="$list $z";; *.$objext);; *) list="$list > $z";;esac; done; echo $list' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}--rpath,$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, > $1)='${wl}--export-dynamic' > + > + # Archives containing C++ object files must be created using > + # "CC -Bstatic", where "CC" is the KAI C++ compiler. > + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -Bstatic -o $oldlib > $oldobjs' > + ;; > + icpc*) > + # Intel C++ > + with_gnu_ld=yes > + # version 8.0 and above of icpc choke on multiply defined > symbols > + # if we add $predep_objects and $postdep_objects, however 7.1 > and > + # earlier do not add the objects themselves. 
> + case `$CC -V 2>&1` in > + *"Version 7."*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects > $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname > $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o > $lib' > + ;; > + *) # Version 8.0 or newer > + tmp_idyn= > + case $host_cpu in > + ia64*) tmp_idyn=' -i_dynamic';; > + esac > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared'"$tmp_idyn"' > $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC > -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags ${wl}-soname > $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib' > + ;; > + esac > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-rpath,$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, > $1)='${wl}--export-dynamic' > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)='${wl}--whole-archive$convenience ${wl}--no-whole-archive' > + ;; > + pgCC*) > + # Portland Group C++ compiler > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname ${wl}$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols > -o $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, > $1)='${wl}--export-dynamic' > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test -n > \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo > \"$new_convenience\"` ${wl}--no-whole-archive' > + ;; > + 
cxx*) > + # Compaq C++ > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects > $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname > $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-soname $wl$soname -o $lib ${wl}-retain-symbols-file > $wl$export_symbols' > + > + runpath_var=LD_RUN_PATH > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. > + # > + # There doesn't appear to be a way to prevent this compiler from > + # explicitly linking system object files so we need to strip > them > + # from the output so that they don't get included in the library > + # dependencies. > + output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v > conftest.$objext 2>&1 | grep "ld"`; templist=`echo $templist | $SED > "s/\(^.*ld.*\)\( .*ld .*$\)/\1/"`; list=""; for z in $templist; do case > $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list > $z";;esac; done; echo $list' > + ;; > + esac > + ;; > + lynxos*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + m88k*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + mvs*) > + case $cc_basename in > + cxx*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + *) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + ;; > + netbsd*) > + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib > $predep_objects $libobjs $deplibs $postdep_objects $linker_flags' > + wlarc= > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' 
> + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + fi > + # Workaround some broken pre-1.5 toolchains > + output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext > 2>&1 | grep conftest.$objext | $SED -e "s:-lgcc -lc -lgcc::"' > + ;; > + openbsd2*) > + # C++ shared libraries are fairly broken > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + openbsd*) > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o > $lib' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath,$libdir' > + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test > "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-retain-symbols-file,$export_symbols -o $lib' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' > + fi > + output_verbose_link_cmd='echo' > + ;; > + osf3*) > + case $cc_basename in > + KCC*) > + # Kuck and Associates, Inc. (KAI) C++ Compiler > + > + # KCC will only create a shared library if the output file > + # ends with ".so" (or ".sl" for HP-UX), so rename the library > + # to its proper name (with version) after linking. 
> + _LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | > $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | > $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags --soname $soname -o \$templib; mv > \$templib $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-rpath,$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Archives containing C++ object files must be created using > + # "CC -Bstatic", where "CC" is the KAI C++ compiler. > + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -Bstatic -o $oldlib > $oldobjs' > + > + ;; > + RCC*) > + # Rational C++ 2.4.1 > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + cxx*) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' > ${wl}-expect_unresolved ${wl}\*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC > -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags ${wl}-soname $soname `test -n > "$verstring" && echo ${wl}-set_version $verstring` -update_registry > ${output_objdir}/so_locations -o $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. > + # > + # There doesn't appear to be a way to prevent this compiler from > + # explicitly linking system object files so we need to strip > them > + # from the output so that they don't get included in the library > + # dependencies. 
> + output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v > conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo > $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in > $templist; do case $z in conftest.$objext) list="$list $z";; > *.$objext);; *) list="$list $z";;esac; done; echo $list' > + ;; > + *) > + if test "$GXX" = yes && test "$with_gnu_ld" = no; then > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' > ${wl}-expect_unresolved ${wl}\*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > ${allow_undefined_flag} $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname `test -n > "$verstring" && echo ${wl}-set_version ${wl}$verstring` > ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used > when > + # linking a shared library. > + output_verbose_link_cmd='$CC -shared $CFLAGS -v > conftest.$objext 2>&1 | grep "\-L"' > + > + else > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + esac > + ;; > + osf4* | osf5*) > + case $cc_basename in > + KCC*) > + # Kuck and Associates, Inc. (KAI) C++ Compiler > + > + # KCC will only create a shared library if the output file > + # ends with ".so" (or ".sl" for HP-UX), so rename the library > + # to its proper name (with version) after linking. 
> + _LT_AC_TAGVAR(archive_cmds, $1)='tempext=`echo $shared_ext | > $SED -e '\''s/\([[^()0-9A-Za-z{}]]\)/\\\\\1/g'\''`; templib=`echo $lib | > $SED -e "s/\${tempext}\..*/.so/"`; $CC $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags --soname $soname -o \$templib; mv > \$templib $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-rpath,$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Archives containing C++ object files must be created using > + # the KAI C++ compiler. > + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -o $oldlib $oldobjs' > + ;; > + RCC*) > + # Rational C++ 2.4.1 > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + cxx*) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC > -shared${allow_undefined_flag} $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags -msym -soname $soname `test -n > "$verstring" && echo -set_version $verstring` -update_registry > ${output_objdir}/so_locations -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='for i in `cat > $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> > $lib.exp; done~ > + echo "-hidden">> $lib.exp~ > + $CC -shared$allow_undefined_flag $predep_objects $libobjs > $deplibs $postdep_objects $compiler_flags -msym -soname $soname > -Wl,-input -Wl,$lib.exp `test -n "$verstring" && echo -set_version > $verstring` -update_registry ${output_objdir}/so_locations -o $lib~ > + $rm $lib.exp' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used when > + # linking a shared library. 
> + # > + # There doesn't appear to be a way to prevent this compiler from > + # explicitly linking system object files so we need to strip > them > + # from the output so that they don't get included in the library > + # dependencies. > + output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v > conftest.$objext 2>&1 | grep "ld" | grep -v "ld:"`; templist=`echo > $templist | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list=""; for z in > $templist; do case $z in conftest.$objext) list="$list $z";; > *.$objext);; *) list="$list $z";;esac; done; echo $list' > + ;; > + *) > + if test "$GXX" = yes && test "$with_gnu_ld" = no; then > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' > ${wl}-expect_unresolved ${wl}\*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > ${allow_undefined_flag} $predep_objects $libobjs $deplibs > $postdep_objects $compiler_flags ${wl}-msym ${wl}-soname ${wl}$soname > `test -n "$verstring" && echo ${wl}-set_version ${wl}$verstring` > ${wl}-update_registry ${wl}${output_objdir}/so_locations -o $lib' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + # Commands to make compiler produce verbose output that lists > + # what "hidden" libraries, object files and flags are used > when > + # linking a shared library. 
> + output_verbose_link_cmd='$CC -shared $CFLAGS -v > conftest.$objext 2>&1 | grep "\-L"' > + > + else > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + esac > + ;; > + psos*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + sunos4*) > + case $cc_basename in > + CC*) > + # Sun C++ 4.x > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + lcc*) > + # Lucid > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + *) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + ;; > + solaris*) > + case $cc_basename in > + CC*) > + # Sun C++ 4.2, 5.x and Centerline C++ > + _LT_AC_TAGVAR(archive_cmds_need_lc,$1)=yes > + _LT_AC_TAGVAR(no_undefined_flag, $1)=' -zdefs' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G${allow_undefined_flag} > -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects > $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo > "local: *; };" >> $lib.exp~ > + $CC -G${allow_undefined_flag} ${wl}-M ${wl}$lib.exp -h$soname > -o $lib $predep_objects $libobjs $deplibs $postdep_objects > $compiler_flags~$rm $lib.exp' > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + case $host_os in > + solaris2.[[0-5]] | solaris2.[[0-5]].*) ;; > + *) > + # The C++ compiler is used as linker so we must use $wl > + # flag to pass the commands to the underlying system > + # linker. We must also pass each convience library through > + # to the system linker between allextract/defaultextract. > + # The C++ compiler will combine linker options so we > + # cannot just pass the convience library names through > + # without $wl. > + # Supported since Solaris 2.6 (maybe 2.5.1?) 
> + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}-z > ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && > new_convenience=\"$new_convenience,$conv\"; done; $echo > \"$new_convenience\"` ${wl}-z ${wl}defaultextract' > + ;; > + esac > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + > + output_verbose_link_cmd='echo' > + > + # Archives containing C++ object files must be created using > + # "CC -xar", where "CC" is the Sun C++ compiler. This is > + # necessary to make sure instantiated templates are included > + # in the archive. > + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC -xar -o $oldlib > $oldobjs' > + ;; > + gcx*) > + # Green Hills C++ Compiler > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $predep_objects > $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-h $wl$soname -o > $lib' > + > + # The C++ compiler must be used to create the archive. > + _LT_AC_TAGVAR(old_archive_cmds, $1)='$CC $LDFLAGS -archive -o > $oldlib $oldobjs' > + ;; > + *) > + # GNU C++ compiler with Solaris linker > + if test "$GXX" = yes && test "$with_gnu_ld" = no; then > + _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-z ${wl}defs' > + if $CC --version | grep -v '^2\.7' > /dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib > $LDFLAGS $predep_objects $libobjs $deplibs $postdep_objects > $compiler_flags ${wl}-h $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo > "local: *; };" >> $lib.exp~ > + $CC -shared -nostdlib ${wl}-M $wl$lib.exp -o $lib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm > $lib.exp' > + > + # Commands to make compiler produce verbose output that > lists > + # what "hidden" libraries, object files and flags are used > when > + # linking a shared library. 
> + output_verbose_link_cmd="$CC -shared $CFLAGS -v > conftest.$objext 2>&1 | grep \"\-L\"" > + else > + # g++ 2.7 appears to require `-G' NOT `-shared' on this > + # platform. > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G -nostdlib $LDFLAGS > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags > ${wl}-h $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo > "local: *; };" >> $lib.exp~ > + $CC -G -nostdlib ${wl}-M $wl$lib.exp -o $lib > $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$rm > $lib.exp' > + > + # Commands to make compiler produce verbose output that > lists > + # what "hidden" libraries, object files and flags are used > when > + # linking a shared library. > + output_verbose_link_cmd="$CC -G $CFLAGS -v conftest.$objext > 2>&1 | grep \"\-L\"" > + fi > + > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R > $wl$libdir' > + fi > + ;; > + esac > + ;; > + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[[01]].[[10]]* | > unixware7* | sco3.2v5.0.[[024]]*) > + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + runpath_var='LD_RUN_PATH' > + > + case $cc_basename in > + CC*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,$soname -o $lib > $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G > ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs > $compiler_flags' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,$soname -o > $lib $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared > ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs > $compiler_flags' > + ;; > + esac > + ;; > + sysv5* | sco3.2v5* | sco5v6*) > + # Note: We can NOT use -z defs as we might desire, because we do > not > + # 
link with -lc, and that would cause any symbols used from libc to > + # always be unresolved, which means just about no library would > + # ever link correctly. If we're not using GNU ld we use -z text > + # though, which does catch some bad symbols but isn't as > heavy-handed > + # as -z defs. > + # For security reasons, it is highly recommended that you always > + # use absolute paths for naming shared libraries, and exclude the > + # DT_RUNPATH tag from executables and libraries. But doing so > + # requires that you compile everything twice, which is a pain. > + # So that behaviour is only enabled if SCOABSPATH is set to a > + # non-empty value in the environment. Most likely only useful for > + # creating official distributions of packages. > + # This is a hack until libtool officially supports absolute path > + # names for shared libraries. > + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-z,nodefs' > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z > "$SCOABSPATH" && echo ${wl}-R,$libdir`' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-Bexport' > + runpath_var='LD_RUN_PATH' > + > + case $cc_basename in > + CC*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G > ${wl}-Bexport:$export_symbols > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared > ${wl}-Bexport:$export_symbols > 
${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + ;; > + esac > + ;; > + tandem*) > + case $cc_basename in > + NCC*) > + # NonStop-UX NCC 3.20 > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + *) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + ;; > + vxworks*) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + *) > + # FIXME: insert proper C++ library support > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > +esac > +AC_MSG_RESULT([$_LT_AC_TAGVAR(ld_shlibs, $1)]) > +test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no && can_build_shared=no > + > +_LT_AC_TAGVAR(GCC, $1)="$GXX" > +_LT_AC_TAGVAR(LD, $1)="$LD" > + > +AC_LIBTOOL_POSTDEP_PREDEP($1) > +AC_LIBTOOL_PROG_COMPILER_PIC($1) > +AC_LIBTOOL_PROG_CC_C_O($1) > +AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1) > +AC_LIBTOOL_PROG_LD_SHLIBS($1) > +AC_LIBTOOL_SYS_DYNAMIC_LINKER($1) > +AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1) > + > +AC_LIBTOOL_CONFIG($1) > + > +AC_LANG_POP > +CC=$lt_save_CC > +LDCXX=$LD > +LD=$lt_save_LD > +GCC=$lt_save_GCC > +with_gnu_ldcxx=$with_gnu_ld > +with_gnu_ld=$lt_save_with_gnu_ld > +lt_cv_path_LDCXX=$lt_cv_path_LD > +lt_cv_path_LD=$lt_save_path_LD > +lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld > +lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld > +])# AC_LIBTOOL_LANG_CXX_CONFIG > + > +# AC_LIBTOOL_POSTDEP_PREDEP([TAGNAME]) > +# ------------------------------------ > +# Figure out "hidden" library dependencies from verbose > +# compiler output when linking a shared library. > +# Parse the compiler output and extract the necessary > +# objects, libraries and library flags. > +AC_DEFUN([AC_LIBTOOL_POSTDEP_PREDEP],[ > +dnl we can't use the lt_simple_compile_test_code here, > +dnl because it contains code intended for an executable, > +dnl not a library. 
It's possible we should let each > +dnl tag define a new lt_????_link_test_code variable, > +dnl but it's only used here... > +ifelse([$1],[],[cat > conftest.$ac_ext <<EOF > +int a; > +void foo (void) { a = 0; } > +EOF > +],[$1],[CXX],[cat > conftest.$ac_ext <<EOF > +class Foo > +{ > +public: > + Foo (void) { a = 0; } > +private: > + int a; > +}; > +EOF > +],[$1],[F77],[cat > conftest.$ac_ext <<EOF > + subroutine foo > + implicit none > + integer*4 a > + a=0 > + return > + end > +EOF > +],[$1],[GCJ],[cat > conftest.$ac_ext <<EOF > +public class foo { > + private int a; > + public void bar (void) { > + a = 0; > + } > +}; > +EOF > +]) > +dnl Parse the compiler output and extract the necessary > +dnl objects, libraries and library flags. > +if AC_TRY_EVAL(ac_compile); then > + # Parse the compiler output and extract the necessary > + # objects, libraries and library flags. > + > + # Sentinel used to keep track of whether or not we are before > + # the conftest object file. > + pre_test_object_deps_done=no > + > + # The `*' in the case matches for architectures that use `case' in > + # $output_verbose_cmd can trigger glob expansion during the loop > + # eval without this substitution. > + output_verbose_link_cmd=`$echo "X$output_verbose_link_cmd" | $Xsed -e > "$no_glob_subst"` > + > + for p in `eval $output_verbose_link_cmd`; do > + case $p in > + > + -L* | -R* | -l*) > + # Some compilers place space between "-{L,R}" and the path. > + # Remove the space. > + if test $p = "-L" \ > + || test $p = "-R"; then > + prev=$p > + continue > + else > + prev= > + fi > + > + if test "$pre_test_object_deps_done" = no; then > + case $p in > + -L* | -R*) > + # Internal compiler library paths should come after those > + # provided the user. The postdeps already come after the > + # user supplied libs so there is no need to process them.
> + if test -z "$_LT_AC_TAGVAR(compiler_lib_search_path, $1)"; > then > + _LT_AC_TAGVAR(compiler_lib_search_path, $1)="${prev}${p}" > + else > + _LT_AC_TAGVAR(compiler_lib_search_path, > $1)="${_LT_AC_TAGVAR(compiler_lib_search_path, $1)} ${prev}${p}" > + fi > + ;; > + # The "-l" case would never come before the object being > + # linked, so don't bother handling this case. > + esac > + else > + if test -z "$_LT_AC_TAGVAR(postdeps, $1)"; then > + _LT_AC_TAGVAR(postdeps, $1)="${prev}${p}" > + else > + _LT_AC_TAGVAR(postdeps, $1)="${_LT_AC_TAGVAR(postdeps, $1)} > ${prev}${p}" > + fi > + fi > + ;; > + > + *.$objext) > + # This assumes that the test object file only shows up > + # once in the compiler output. > + if test "$p" = "conftest.$objext"; then > + pre_test_object_deps_done=yes > + continue > + fi > + > + if test "$pre_test_object_deps_done" = no; then > + if test -z "$_LT_AC_TAGVAR(predep_objects, $1)"; then > + _LT_AC_TAGVAR(predep_objects, $1)="$p" > + else > + _LT_AC_TAGVAR(predep_objects, > $1)="$_LT_AC_TAGVAR(predep_objects, $1) $p" > + fi > + else > + if test -z "$_LT_AC_TAGVAR(postdep_objects, $1)"; then > + _LT_AC_TAGVAR(postdep_objects, $1)="$p" > + else > + _LT_AC_TAGVAR(postdep_objects, > $1)="$_LT_AC_TAGVAR(postdep_objects, $1) $p" > + fi > + fi > + ;; > + > + *) ;; # Ignore the rest. > + > + esac > + done > + > + # Clean up. > + rm -f a.out a.exe > +else > + echo "libtool.m4: error: problem compiling $1 test program" > +fi > + > +$rm -f confest.$objext > + > +# PORTME: override above test on systems where it is broken > +ifelse([$1],[CXX], > +[case $host_os in > +interix3*) > + # Interix 3.5 installs completely hosed .la files for C++, so rather > than > + # hack all around it, let's just trust "g++" to DTRT. 
> + _LT_AC_TAGVAR(predep_objects,$1)= > + _LT_AC_TAGVAR(postdep_objects,$1)= > + _LT_AC_TAGVAR(postdeps,$1)= > + ;; > + > +solaris*) > + case $cc_basename in > + CC*) > + # Adding this requires a known-good setup of shared libraries for > + # Sun compiler versions before 5.6, else PIC objects from an old > + # archive will be linked into the output, leading to subtle bugs. > + _LT_AC_TAGVAR(postdeps,$1)='-lCstd -lCrun' > + ;; > + esac > + ;; > +esac > +]) > + > +case " $_LT_AC_TAGVAR(postdeps, $1) " in > +*" -lc "*) _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no ;; > +esac > +])# AC_LIBTOOL_POSTDEP_PREDEP > + > +# AC_LIBTOOL_LANG_F77_CONFIG > +# -------------------------- > +# Ensure that the configuration vars for the C compiler are > +# suitably defined. Those variables are subsequently used by > +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. > +AC_DEFUN([AC_LIBTOOL_LANG_F77_CONFIG], [_LT_AC_LANG_F77_CONFIG(F77)]) > +AC_DEFUN([_LT_AC_LANG_F77_CONFIG], > +[AC_REQUIRE([AC_PROG_F77]) > +AC_LANG_PUSH(Fortran 77) > + > +_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > +_LT_AC_TAGVAR(allow_undefined_flag, $1)= > +_LT_AC_TAGVAR(always_export_symbols, $1)=no > +_LT_AC_TAGVAR(archive_expsym_cmds, $1)= > +_LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= > +_LT_AC_TAGVAR(hardcode_direct, $1)=no > +_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= > +_LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)= > +_LT_AC_TAGVAR(hardcode_libdir_separator, $1)= > +_LT_AC_TAGVAR(hardcode_minus_L, $1)=no > +_LT_AC_TAGVAR(hardcode_automatic, $1)=no > +_LT_AC_TAGVAR(module_cmds, $1)= > +_LT_AC_TAGVAR(module_expsym_cmds, $1)= > +_LT_AC_TAGVAR(link_all_deplibs, $1)=unknown > +_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds > +_LT_AC_TAGVAR(no_undefined_flag, $1)= > +_LT_AC_TAGVAR(whole_archive_flag_spec, $1)= > +_LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no > + > +# Source file extension for f77 test sources. 
> +ac_ext=f > + > +# Object file extension for compiled f77 test sources. > +objext=o > +_LT_AC_TAGVAR(objext, $1)=$objext > + > +# Code to be used in simple compile tests > +lt_simple_compile_test_code=" subroutine t\n return\n > end\n" > + > +# Code to be used in simple link tests > +lt_simple_link_test_code=" program t\n end\n" > + > +# ltmain only uses $CC for tagged configurations so make sure $CC is > set. > +_LT_AC_SYS_COMPILER > + > +# save warnings/boilerplate of simple test code > +_LT_COMPILER_BOILERPLATE > +_LT_LINKER_BOILERPLATE > + > +# Allow CC to be a program name with arguments. > +lt_save_CC="$CC" > +CC=${F77-"f77"} > +compiler=$CC > +_LT_AC_TAGVAR(compiler, $1)=$CC > +_LT_CC_BASENAME([$compiler]) > + > +AC_MSG_CHECKING([if libtool supports shared libraries]) > +AC_MSG_RESULT([$can_build_shared]) > + > +AC_MSG_CHECKING([whether to build shared libraries]) > +test "$can_build_shared" = "no" && enable_shared=no > + > +# On AIX, shared libraries and static libraries use the same namespace, > and > +# are all built from PIC. > +case $host_os in > +aix3*) > + test "$enable_shared" = yes && enable_static=no > + if test -n "$RANLIB"; then > + archive_cmds="$archive_cmds~\$RANLIB \$lib" > + postinstall_cmds='$RANLIB $lib' > + fi > + ;; > +aix4* | aix5*) > + if test "$host_cpu" != ia64 && test "$aix_use_runtimelinking" = no ; > then > + test "$enable_shared" = yes && enable_static=no > + fi > + ;; > +esac > +AC_MSG_RESULT([$enable_shared]) > + > +AC_MSG_CHECKING([whether to build static libraries]) > +# Make sure either enable_shared or enable_static is yes. 
> +test "$enable_shared" = yes || enable_static=yes > +AC_MSG_RESULT([$enable_static]) > + > +_LT_AC_TAGVAR(GCC, $1)="$G77" > +_LT_AC_TAGVAR(LD, $1)="$LD" > + > +AC_LIBTOOL_PROG_COMPILER_PIC($1) > +AC_LIBTOOL_PROG_CC_C_O($1) > +AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1) > +AC_LIBTOOL_PROG_LD_SHLIBS($1) > +AC_LIBTOOL_SYS_DYNAMIC_LINKER($1) > +AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1) > + > +AC_LIBTOOL_CONFIG($1) > + > +AC_LANG_POP > +CC="$lt_save_CC" > +])# AC_LIBTOOL_LANG_F77_CONFIG > + > + > +# AC_LIBTOOL_LANG_GCJ_CONFIG > +# -------------------------- > +# Ensure that the configuration vars for the C compiler are > +# suitably defined. Those variables are subsequently used by > +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. > +AC_DEFUN([AC_LIBTOOL_LANG_GCJ_CONFIG], [_LT_AC_LANG_GCJ_CONFIG(GCJ)]) > +AC_DEFUN([_LT_AC_LANG_GCJ_CONFIG], > +[AC_LANG_SAVE > + > +# Source file extension for Java test sources. > +ac_ext=java > + > +# Object file extension for compiled Java test sources. > +objext=o > +_LT_AC_TAGVAR(objext, $1)=$objext > + > +# Code to be used in simple compile tests > +lt_simple_compile_test_code="class foo {}\n" > + > +# Code to be used in simple link tests > +lt_simple_link_test_code='public class conftest { public static void > main(String[[]] argv) {}; }\n' > + > +# ltmain only uses $CC for tagged configurations so make sure $CC is > set. > +_LT_AC_SYS_COMPILER > + > +# save warnings/boilerplate of simple test code > +_LT_COMPILER_BOILERPLATE > +_LT_LINKER_BOILERPLATE > + > +# Allow CC to be a program name with arguments. > +lt_save_CC="$CC" > +CC=${GCJ-"gcj"} > +compiler=$CC > +_LT_AC_TAGVAR(compiler, $1)=$CC > +_LT_CC_BASENAME([$compiler]) > + > +# GCJ did not exist at the time GCC didn't implicitly link libc in. 
> +_LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + > +_LT_AC_TAGVAR(old_archive_cmds, $1)=$old_archive_cmds > + > +AC_LIBTOOL_PROG_COMPILER_NO_RTTI($1) > +AC_LIBTOOL_PROG_COMPILER_PIC($1) > +AC_LIBTOOL_PROG_CC_C_O($1) > +AC_LIBTOOL_SYS_HARD_LINK_LOCKS($1) > +AC_LIBTOOL_PROG_LD_SHLIBS($1) > +AC_LIBTOOL_SYS_DYNAMIC_LINKER($1) > +AC_LIBTOOL_PROG_LD_HARDCODE_LIBPATH($1) > + > +AC_LIBTOOL_CONFIG($1) > + > +AC_LANG_RESTORE > +CC="$lt_save_CC" > +])# AC_LIBTOOL_LANG_GCJ_CONFIG > + > + > +# AC_LIBTOOL_LANG_RC_CONFIG > +# ------------------------- > +# Ensure that the configuration vars for the Windows resource compiler > are > +# suitably defined. Those variables are subsequently used by > +# AC_LIBTOOL_CONFIG to write the compiler configuration to `libtool'. > +AC_DEFUN([AC_LIBTOOL_LANG_RC_CONFIG], [_LT_AC_LANG_RC_CONFIG(RC)]) > +AC_DEFUN([_LT_AC_LANG_RC_CONFIG], > +[AC_LANG_SAVE > + > +# Source file extension for RC test sources. > +ac_ext=rc > + > +# Object file extension for compiled RC test sources. > +objext=o > +_LT_AC_TAGVAR(objext, $1)=$objext > + > +# Code to be used in simple compile tests > +lt_simple_compile_test_code='sample MENU { MENUITEM "&Soup", 100, > CHECKED }\n' > + > +# Code to be used in simple link tests > +lt_simple_link_test_code="$lt_simple_compile_test_code" > + > +# ltmain only uses $CC for tagged configurations so make sure $CC is > set. > +_LT_AC_SYS_COMPILER > + > +# save warnings/boilerplate of simple test code > +_LT_COMPILER_BOILERPLATE > +_LT_LINKER_BOILERPLATE > + > +# Allow CC to be a program name with arguments. 
> +lt_save_CC="$CC" > +CC=${RC-"windres"} > +compiler=$CC > +_LT_AC_TAGVAR(compiler, $1)=$CC > +_LT_CC_BASENAME([$compiler]) > +_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1)=yes > + > +AC_LIBTOOL_CONFIG($1) > + > +AC_LANG_RESTORE > +CC="$lt_save_CC" > +])# AC_LIBTOOL_LANG_RC_CONFIG > + > + > +# AC_LIBTOOL_CONFIG([TAGNAME]) > +# ---------------------------- > +# If TAGNAME is not passed, then create an initial libtool script > +# with a default configuration from the untagged config vars. > Otherwise > +# add code to config.status for appending the configuration named by > +# TAGNAME from the matching tagged config vars. > +AC_DEFUN([AC_LIBTOOL_CONFIG], > +[# The else clause should only fire when bootstrapping the > +# libtool distribution, otherwise you forgot to ship ltmain.sh > +# with your package, and you will get complaints that there are > +# no rules to generate ltmain.sh. > +if test -f "$ltmain"; then > + # See if we are running on zsh, and set the options which allow our > commands through > + # without removal of \ escapes. > + if test -n "${ZSH_VERSION+set}" ; then > + setopt NO_GLOB_SUBST > + fi > + # Now quote all the things that may contain metacharacters while > being > + # careful not to overquote the AC_SUBSTed values. We take copies of > the > + # variables and quote the copies for generation of the libtool > script. 
> + for var in echo old_CC old_CFLAGS AR AR_FLAGS EGREP RANLIB LN_S LTCC > LTCFLAGS NM \ > + SED SHELL STRIP \ > + libname_spec library_names_spec soname_spec extract_expsyms_cmds \ > + old_striplib striplib file_magic_cmd finish_cmds finish_eval \ > + deplibs_check_method reload_flag reload_cmds need_locks \ > + lt_cv_sys_global_symbol_pipe lt_cv_sys_global_symbol_to_cdecl \ > + lt_cv_sys_global_symbol_to_c_name_address \ > + sys_lib_search_path_spec sys_lib_dlsearch_path_spec \ > + old_postinstall_cmds old_postuninstall_cmds \ > + _LT_AC_TAGVAR(compiler, $1) \ > + _LT_AC_TAGVAR(CC, $1) \ > + _LT_AC_TAGVAR(LD, $1) \ > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1) \ > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1) \ > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1) \ > + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1) \ > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1) \ > + _LT_AC_TAGVAR(thread_safe_flag_spec, $1) \ > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1) \ > + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1) \ > + _LT_AC_TAGVAR(old_archive_cmds, $1) \ > + _LT_AC_TAGVAR(old_archive_from_new_cmds, $1) \ > + _LT_AC_TAGVAR(predep_objects, $1) \ > + _LT_AC_TAGVAR(postdep_objects, $1) \ > + _LT_AC_TAGVAR(predeps, $1) \ > + _LT_AC_TAGVAR(postdeps, $1) \ > + _LT_AC_TAGVAR(compiler_lib_search_path, $1) \ > + _LT_AC_TAGVAR(archive_cmds, $1) \ > + _LT_AC_TAGVAR(archive_expsym_cmds, $1) \ > + _LT_AC_TAGVAR(postinstall_cmds, $1) \ > + _LT_AC_TAGVAR(postuninstall_cmds, $1) \ > + _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1) \ > + _LT_AC_TAGVAR(allow_undefined_flag, $1) \ > + _LT_AC_TAGVAR(no_undefined_flag, $1) \ > + _LT_AC_TAGVAR(export_symbols_cmds, $1) \ > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) \ > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1) \ > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1) \ > + _LT_AC_TAGVAR(hardcode_automatic, $1) \ > + _LT_AC_TAGVAR(module_cmds, $1) \ > + _LT_AC_TAGVAR(module_expsym_cmds, $1) \ > + 
_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1) \ > + _LT_AC_TAGVAR(exclude_expsyms, $1) \ > + _LT_AC_TAGVAR(include_expsyms, $1); do > + > + case $var in > + _LT_AC_TAGVAR(old_archive_cmds, $1) | \ > + _LT_AC_TAGVAR(old_archive_from_new_cmds, $1) | \ > + _LT_AC_TAGVAR(archive_cmds, $1) | \ > + _LT_AC_TAGVAR(archive_expsym_cmds, $1) | \ > + _LT_AC_TAGVAR(module_cmds, $1) | \ > + _LT_AC_TAGVAR(module_expsym_cmds, $1) | \ > + _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1) | \ > + _LT_AC_TAGVAR(export_symbols_cmds, $1) | \ > + extract_expsyms_cmds | reload_cmds | finish_cmds | \ > + postinstall_cmds | postuninstall_cmds | \ > + old_postinstall_cmds | old_postuninstall_cmds | \ > + sys_lib_search_path_spec | sys_lib_dlsearch_path_spec) > + # Double-quote double-evaled strings. > + eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e > \"\$double_quote_subst\" -e \"\$sed_quote_subst\" -e > \"\$delay_variable_subst\"\`\\\"" > + ;; > + *) > + eval "lt_$var=\\\"\`\$echo \"X\$$var\" | \$Xsed -e > \"\$sed_quote_subst\"\`\\\"" > + ;; > + esac > + done > + > + case $lt_echo in > + *'\[$]0 --fallback-echo"') > + lt_echo=`$echo "X$lt_echo" | $Xsed -e 's/\\\\\\\[$]0 > --fallback-echo"[$]/[$]0 --fallback-echo"/'` > + ;; > + esac > + > +ifelse([$1], [], > + [cfgfile="${ofile}T" > + trap "$rm \"$cfgfile\"; exit 1" 1 2 15 > + $rm -f "$cfgfile" > + AC_MSG_NOTICE([creating $ofile])], > + [cfgfile="$ofile"]) > + > + cat <<__EOF__ >> "$cfgfile" > +ifelse([$1], [], > +[#! $SHELL > + > +# `$echo "$cfgfile" | sed 's%^.*/%%'` - Provide generalized > library-building support services. > +# Generated automatically by $PROGRAM (GNU $PACKAGE $VERSION$TIMESTAMP) > +# NOTE: Changes made to this file will be lost: look at ltmain.sh. > +# > +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001 > +# Free Software Foundation, Inc. 
> +# > +# This file is part of GNU Libtool: > +# Originally by Gordon Matzigkeit, 1996 > +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 2 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, but > +# WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write to the Free Software > +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA > 02110-1301, USA. > +# > +# As a special exception to the GNU General Public License, if you > +# distribute this file as part of a program that contains a > +# configuration script generated by Autoconf, you may include it under > +# the same distribution terms that you use for the rest of that > program. > + > +# A sed program that does not truncate output. > +SED=$lt_SED > + > +# Sed that helps us avoid accidentally triggering echo(1) options like > -n. > +Xsed="$SED -e 1s/^X//" > + > +# The HP-UX ksh and POSIX shell print the target directory to stdout > +# if CDPATH is set. > +(unset CDPATH) >/dev/null 2>&1 && unset CDPATH > + > +# The names of the tagged configurations supported by this script. > +available_tags= > + > +# ### BEGIN LIBTOOL CONFIG], > +[# ### BEGIN LIBTOOL TAG CONFIG: $tagname]) > + > +# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | > sed 1q`: > + > +# Shell to use when invoking shell scripts. > +SHELL=$lt_SHELL > + > +# Whether or not to build shared libraries. > +build_libtool_libs=$enable_shared > + > +# Whether or not to build static libraries.
> +build_old_libs=$enable_static > + > +# Whether or not to add -lc for building shared libraries. > +build_libtool_need_lc=$_LT_AC_TAGVAR(archive_cmds_need_lc, $1) > + > +# Whether or not to disallow shared libs when runtime libs are static > +allow_libtool_libs_with_static_runtimes=$_LT_AC_TAGVAR(enable_shared_wi > th_static_runtimes, $1) > + > +# Whether or not to optimize for fast installation. > +fast_install=$enable_fast_install > + > +# The host system. > +host_alias=$host_alias > +host=$host > +host_os=$host_os > + > +# The build system. > +build_alias=$build_alias > +build=$build > +build_os=$build_os > + > +# An echo program that does not interpret backslashes. > +echo=$lt_echo > + > +# The archiver. > +AR=$lt_AR > +AR_FLAGS=$lt_AR_FLAGS > + > +# A C compiler. > +LTCC=$lt_LTCC > + > +# LTCC compiler flags. > +LTCFLAGS=$lt_LTCFLAGS > + > +# A language-specific compiler. > +CC=$lt_[]_LT_AC_TAGVAR(compiler, $1) > + > +# Is the compiler the GNU C compiler? > +with_gcc=$_LT_AC_TAGVAR(GCC, $1) > + > +# An ERE matcher. > +EGREP=$lt_EGREP > + > +# The linker used to build libraries. > +LD=$lt_[]_LT_AC_TAGVAR(LD, $1) > + > +# Whether we need hard or soft links. > +LN_S=$lt_LN_S > + > +# A BSD-compatible nm program. > +NM=$lt_NM > + > +# A symbol stripping program > +STRIP=$lt_STRIP > + > +# Used to examine libraries when file_magic_cmd begins "file" > +MAGIC_CMD=$MAGIC_CMD > + > +# Used on cygwin: DLL creation program. > +DLLTOOL="$DLLTOOL" > + > +# Used on cygwin: object dumper. > +OBJDUMP="$OBJDUMP" > + > +# Used on cygwin: assembler. > +AS="$AS" > + > +# The name of the directory that contains temporary libtool files. > +objdir=$objdir > + > +# How to create reloadable object files. > +reload_flag=$lt_reload_flag > +reload_cmds=$lt_reload_cmds > + > +# How to pass a linker flag through the compiler. > +wl=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) > + > +# Object file suffix (normally "o"). > +objext="$ac_objext" > + > +# Old archive suffix (normally "a"). 
> +libext="$libext" > + > +# Shared library suffix (normally ".so"). > +shrext_cmds='$shrext_cmds' > + > +# Executable file suffix (normally ""). > +exeext="$exeext" > + > +# Additional compiler flags for building library objects. > +pic_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) > +pic_mode=$pic_mode > + > +# What is the maximum length of a command? > +max_cmd_len=$lt_cv_sys_max_cmd_len > + > +# Does compiler simultaneously support -c and -o options? > +compiler_c_o=$lt_[]_LT_AC_TAGVAR(lt_cv_prog_compiler_c_o, $1) > + > +# Must we lock files when doing compilation? > +need_locks=$lt_need_locks > + > +# Do we need the lib prefix for modules? > +need_lib_prefix=$need_lib_prefix > + > +# Do we need a version for libraries? > +need_version=$need_version > + > +# Whether dlopen is supported. > +dlopen_support=$enable_dlopen > + > +# Whether dlopen of programs is supported. > +dlopen_self=$enable_dlopen_self > + > +# Whether dlopen of statically linked programs is supported. > +dlopen_self_static=$enable_dlopen_self_static > + > +# Compiler flag to prevent dynamic linking. > +link_static_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_static, $1) > + > +# Compiler flag to turn off builtin functions. > +no_builtin_flag=$lt_[]_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, > $1) > + > +# Compiler flag to allow reflexive dlopens. > +export_dynamic_flag_spec=$lt_[]_LT_AC_TAGVAR(export_dynamic_flag_spec, > $1) > + > +# Compiler flag to generate shared objects directly from archives. > +whole_archive_flag_spec=$lt_[]_LT_AC_TAGVAR(whole_archive_flag_spec, > $1) > + > +# Compiler flag to generate thread-safe objects. > +thread_safe_flag_spec=$lt_[]_LT_AC_TAGVAR(thread_safe_flag_spec, $1) > + > +# Library versioning type. > +version_type=$version_type > + > +# Format of library name prefix. > +libname_spec=$lt_libname_spec > + > +# List of archive names. First name is the real one, the rest are > links. > +# The last name is the one that the linker finds with -lNAME. 
> +library_names_spec=$lt_library_names_spec > + > +# The coded name of the library, if different from the real name. > +soname_spec=$lt_soname_spec > + > +# Commands used to build and install an old-style archive. > +RANLIB=$lt_RANLIB > +old_archive_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_cmds, $1) > +old_postinstall_cmds=$lt_old_postinstall_cmds > +old_postuninstall_cmds=$lt_old_postuninstall_cmds > + > +# Create an old-style archive from a shared archive. > +old_archive_from_new_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_from_new_cmds > , $1) > + > +# Create a temporary old-style archive to link instead of a shared > archive. > +old_archive_from_expsyms_cmds=$lt_[]_LT_AC_TAGVAR(old_archive_from_exps > yms_cmds, $1) > + > +# Commands used to build and install a shared archive. > +archive_cmds=$lt_[]_LT_AC_TAGVAR(archive_cmds, $1) > +archive_expsym_cmds=$lt_[]_LT_AC_TAGVAR(archive_expsym_cmds, $1) > +postinstall_cmds=$lt_postinstall_cmds > +postuninstall_cmds=$lt_postuninstall_cmds > + > +# Commands used to build a loadable module (assumed same as above if > empty) > +module_cmds=$lt_[]_LT_AC_TAGVAR(module_cmds, $1) > +module_expsym_cmds=$lt_[]_LT_AC_TAGVAR(module_expsym_cmds, $1) > + > +# Commands to strip libraries. > +old_striplib=$lt_old_striplib > +striplib=$lt_striplib > + > +# Dependencies to place before the objects being linked to create a > +# shared library. > +predep_objects=$lt_[]_LT_AC_TAGVAR(predep_objects, $1) > + > +# Dependencies to place after the objects being linked to create a > +# shared library. > +postdep_objects=$lt_[]_LT_AC_TAGVAR(postdep_objects, $1) > + > +# Dependencies to place before the objects being linked to create a > +# shared library. > +predeps=$lt_[]_LT_AC_TAGVAR(predeps, $1) > + > +# Dependencies to place after the objects being linked to create a > +# shared library. > +postdeps=$lt_[]_LT_AC_TAGVAR(postdeps, $1) > + > +# The library search path used internally by the compiler when linking > +# a shared library. 
> +compiler_lib_search_path=$lt_[]_LT_AC_TAGVAR(compiler_lib_search_path, > $1) > + > +# Method to check whether dependent libraries are shared objects. > +deplibs_check_method=$lt_deplibs_check_method > + > +# Command to use when deplibs_check_method == file_magic. > +file_magic_cmd=$lt_file_magic_cmd > + > +# Flag that allows shared libraries with undefined symbols to be built. > +allow_undefined_flag=$lt_[]_LT_AC_TAGVAR(allow_undefined_flag, $1) > + > +# Flag that forces no undefined symbols. > +no_undefined_flag=$lt_[]_LT_AC_TAGVAR(no_undefined_flag, $1) > + > +# Commands used to finish a libtool library installation in a > directory. > +finish_cmds=$lt_finish_cmds > + > +# Same as above, but a single script fragment to be evaled but not > shown. > +finish_eval=$lt_finish_eval > + > +# Take the output of nm and produce a listing of raw symbols and C > names. > +global_symbol_pipe=$lt_lt_cv_sys_global_symbol_pipe > + > +# Transform the output of nm in a proper C declaration > +global_symbol_to_cdecl=$lt_lt_cv_sys_global_symbol_to_cdecl > + > +# Transform the output of nm in a C name address pair > +global_symbol_to_c_name_address=$lt_lt_cv_sys_global_symbol_to_c_name_a > ddress > + > +# This is the shared library runtime path variable. > +runpath_var=$runpath_var > + > +# This is the shared library path variable. > +shlibpath_var=$shlibpath_var > + > +# Is shlibpath searched before the hard-coded library search path? > +shlibpath_overrides_runpath=$shlibpath_overrides_runpath > + > +# How to hardcode a shared library path into an executable. > +hardcode_action=$_LT_AC_TAGVAR(hardcode_action, $1) > + > +# Whether we should hardcode library paths into libraries. > +hardcode_into_libs=$hardcode_into_libs > + > +# Flag to hardcode \$libdir into a binary during linking. > +# This must work even if \$libdir does not exist. 
> +hardcode_libdir_flag_spec=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_flag_spec > , $1) > + > +# If ld is used when linking, flag to hardcode \$libdir into > +# a binary during linking. This must work even if \$libdir does > +# not exist. > +hardcode_libdir_flag_spec_ld=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_flag_s > pec_ld, $1) > + > +# Whether we need a single -rpath flag with a separated argument. > +hardcode_libdir_separator=$lt_[]_LT_AC_TAGVAR(hardcode_libdir_separator > , $1) > + > +# Set to yes if using DIR/libNAME${shared_ext} during linking hardcodes > DIR into the > +# resulting binary. > +hardcode_direct=$_LT_AC_TAGVAR(hardcode_direct, $1) > + > +# Set to yes if using the -LDIR flag during linking hardcodes DIR into > the > +# resulting binary. > +hardcode_minus_L=$_LT_AC_TAGVAR(hardcode_minus_L, $1) > + > +# Set to yes if using SHLIBPATH_VAR=DIR during linking hardcodes DIR > into > +# the resulting binary. > +hardcode_shlibpath_var=$_LT_AC_TAGVAR(hardcode_shlibpath_var, $1) > + > +# Set to yes if building a shared library automatically hardcodes DIR > into the library > +# and all subsequent libraries and executables linked against it. > +hardcode_automatic=$_LT_AC_TAGVAR(hardcode_automatic, $1) > + > +# Variables whose values should be saved in libtool wrapper scripts and > +# restored at relink time. > +variables_saved_for_relink="$variables_saved_for_relink" > + > +# Whether libtool must link a program against all its dependency > libraries. > +link_all_deplibs=$_LT_AC_TAGVAR(link_all_deplibs, $1) > + > +# Compile-time system search path for libraries > +sys_lib_search_path_spec=$lt_sys_lib_search_path_spec > + > +# Run-time system search path for libraries > +sys_lib_dlsearch_path_spec=$lt_sys_lib_dlsearch_path_spec > + > +# Fix the shell variable \$srcfile for the compiler. > +fix_srcfile_path="$_LT_AC_TAGVAR(fix_srcfile_path, $1)" > + > +# Set to yes if exported symbols are required. 
> +always_export_symbols=$_LT_AC_TAGVAR(always_export_symbols, $1) > + > +# The commands to list exported symbols. > +export_symbols_cmds=$lt_[]_LT_AC_TAGVAR(export_symbols_cmds, $1) > + > +# The commands to extract the exported symbol list from a shared > archive. > +extract_expsyms_cmds=$lt_extract_expsyms_cmds > + > +# Symbols that should not be listed in the preloaded symbols. > +exclude_expsyms=$lt_[]_LT_AC_TAGVAR(exclude_expsyms, $1) > + > +# Symbols that must always be exported. > +include_expsyms=$lt_[]_LT_AC_TAGVAR(include_expsyms, $1) > + > +ifelse([$1],[], > +[# ### END LIBTOOL CONFIG], > +[# ### END LIBTOOL TAG CONFIG: $tagname]) > + > +__EOF__ > + > +ifelse([$1],[], [ > + case $host_os in > + aix3*) > + cat <<\EOF >> "$cfgfile" > + > +# AIX sometimes has problems with the GCC collect2 program. For some > +# reason, if we set the COLLECT_NAMES environment variable, the > problems > +# vanish in a puff of smoke. > +if test "X${COLLECT_NAMES+set}" != Xset; then > + COLLECT_NAMES= > + export COLLECT_NAMES > +fi > +EOF > + ;; > + esac > + > + # We use sed instead of cat because bash on DJGPP gets confused if > + # if finds mixed CR/LF and LF-only lines. Since sed operates in > + # text mode, it properly converts lines to CR/LF. This bash problem > + # is reportedly fixed, but why not run on old versions too? > + sed '$q' "$ltmain" >> "$cfgfile" || (rm -f "$cfgfile"; exit 1) > + > + mv -f "$cfgfile" "$ofile" || \ > + (rm -f "$ofile" && cp "$cfgfile" "$ofile" && rm -f "$cfgfile") > + chmod +x "$ofile" > +]) > +else > + # If there is no Makefile yet, we rely on a make rule to execute > + # `config.status --recheck' to rerun these tests and create the > + # libtool script then. 
> + ltmain_in=`echo $ltmain | sed -e 's/\.sh$/.in/'` > + if test -f "$ltmain_in"; then > + test -f Makefile && make "$ltmain" > + fi > +fi > +])# AC_LIBTOOL_CONFIG > + > + > +# AC_LIBTOOL_PROG_COMPILER_NO_RTTI([TAGNAME]) > +# ------------------------------------------- > +AC_DEFUN([AC_LIBTOOL_PROG_COMPILER_NO_RTTI], > +[AC_REQUIRE([_LT_AC_SYS_COMPILER])dnl > + > +_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)= > + > +if test "$GCC" = yes; then > + _LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)=' -fno-builtin' > + > + AC_LIBTOOL_COMPILER_OPTION([if $compiler supports -fno-rtti > -fno-exceptions], > + lt_cv_prog_compiler_rtti_exceptions, > + [-fno-rtti -fno-exceptions], [], > + [_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, > $1)="$_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1) -fno-rtti > -fno-exceptions"]) > +fi > +])# AC_LIBTOOL_PROG_COMPILER_NO_RTTI > + > + > +# AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE > +# --------------------------------- > +AC_DEFUN([AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE], > +[AC_REQUIRE([AC_CANONICAL_HOST]) > +AC_REQUIRE([AC_PROG_NM]) > +AC_REQUIRE([AC_OBJEXT]) > +# Check for command to grab the raw symbol name followed by C symbol > from nm. > +AC_MSG_CHECKING([command to parse $NM output from $compiler object]) > +AC_CACHE_VAL([lt_cv_sys_global_symbol_pipe], > +[ > +# These are sane defaults that work on at least a few old systems. > +# [They come from Ultrix. What could be older than Ultrix?!! ;)] > + > +# Character class describing NM global symbol codes. > +symcode='[[BCDEGRST]]' > + > +# Regexp to match symbols that can be accessed directly from C. > +sympat='\([[_A-Za-z]][[_A-Za-z0-9]]*\)' > + > +# Transform an extracted symbol line into a proper C declaration > +lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^. 
.* \(.*\)$/extern int > \1;/p'" > + > +# Transform an extracted symbol line into symbol name and symbol > address > +lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ ]]*\) > $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode \([[^ ]]*\) \([[^ > ]]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" > + > +# Define system-specific variables. > +case $host_os in > +aix*) > + symcode='[[BCDT]]' > + ;; > +cygwin* | mingw* | pw32*) > + symcode='[[ABCDGISTW]]' > + ;; > +hpux*) # Its linker distinguishes data from code symbols > + if test "$host_cpu" = ia64; then > + symcode='[[ABCDEGRST]]' > + fi > + lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern > int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'" > + lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ > ]]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([[^ ]]*\) > \([[^ ]]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" > + ;; > +linux*) > + if test "$host_cpu" = ia64; then > + symcode='[[ABCDGIRSTW]]' > + lt_cv_sys_global_symbol_to_cdecl="sed -n -e 's/^T .* \(.*\)$/extern > int \1();/p' -e 's/^$symcode* .* \(.*\)$/extern char \1;/p'" > + lt_cv_sys_global_symbol_to_c_name_address="sed -n -e 's/^: \([[^ > ]]*\) $/ {\\\"\1\\\", (lt_ptr) 0},/p' -e 's/^$symcode* \([[^ ]]*\) > \([[^ ]]*\)$/ {\"\2\", (lt_ptr) \&\2},/p'" > + fi > + ;; > +irix* | nonstopux*) > + symcode='[[BCDEGRST]]' > + ;; > +osf*) > + symcode='[[BCDEGQRST]]' > + ;; > +solaris*) > + symcode='[[BDRT]]' > + ;; > +sco3.2v5*) > + symcode='[[DT]]' > + ;; > +sysv4.2uw2*) > + symcode='[[DT]]' > + ;; > +sysv5* | sco5v6* | unixware* | OpenUNIX*) > + symcode='[[ABDT]]' > + ;; > +sysv4) > + symcode='[[DFNSTU]]' > + ;; > +esac > + > +# Handle CRLF in mingw tool chain > +opt_cr= > +case $build_os in > +mingw*) > + opt_cr=`echo 'x\{0,1\}' | tr x '\015'` # option cr in regexp > + ;; > +esac > + > +# If we're using GNU nm, then use its standard symbol codes. 
> +case `$NM -V 2>&1` in > +*GNU* | *'with BFD'*) > + symcode='[[ABCDGIRSTW]]' ;; > +esac > + > +# Try without a prefix undercore, then with it. > +for ac_symprfx in "" "_"; do > + > + # Transform symcode, sympat, and symprfx into a raw symbol and a C > symbol. > + symxfrm="\\1 $ac_symprfx\\2 \\2" > + > + # Write the raw and C identifiers. > + lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[ > ]]\($symcode$symcode*\)[[ ]][[ > ]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'" > + > + # Check to see that the pipe works correctly. > + pipe_works=no > + > + rm -f conftest* > + cat > conftest.$ac_ext <<EOF > +#ifdef __cplusplus > +extern "C" { > +#endif > +char nm_test_var; > +void nm_test_func(){} > +#ifdef __cplusplus > +} > +#endif > +int main(){nm_test_var='a';nm_test_func();return(0);} > +EOF > + > + if AC_TRY_EVAL(ac_compile); then > + # Now try to grab the symbols. > + nlist=conftest.nm > + if AC_TRY_EVAL(NM conftest.$ac_objext \| > $lt_cv_sys_global_symbol_pipe \> $nlist) && test -s "$nlist"; then > + # Try sorting and uniquifying the output. > + if sort "$nlist" | uniq > "$nlist"T; then > + mv -f "$nlist"T "$nlist" > + else > + rm -f "$nlist"T > + fi > + > + # Make sure that we snagged all the symbols we need. > + if grep ' nm_test_var$' "$nlist" >/dev/null; then > + if grep ' nm_test_func$' "$nlist" >/dev/null; then > + cat <<EOF > conftest.$ac_ext > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +EOF > + # Now generate the symbol file. > + eval "$lt_cv_sys_global_symbol_to_cdecl"' < "$nlist" | grep -v > main >> conftest.$ac_ext' > + > + cat <<EOF >> conftest.$ac_ext > +#if defined (__STDC__) && __STDC__ > +# define lt_ptr_t void * > +#else > +# define lt_ptr_t char * > +# define const > +#endif > + > +/* The mapping between symbol names and symbols.
*/ > +const struct { > + const char *name; > + lt_ptr_t address; > +} > +lt_preloaded_symbols[[]] = > +{ > +EOF > + $SED "s/^$symcode$symcode* \(.*\) \(.*\)$/ {\"\2\", > (lt_ptr_t) \&\2},/" < "$nlist" | grep -v main >> conftest.$ac_ext > + cat <<\EOF >> conftest.$ac_ext > + {0, (lt_ptr_t) 0} > +}; > + > +#ifdef __cplusplus > +} > +#endif > +EOF > + # Now try linking the two files. > + mv conftest.$ac_objext conftstm.$ac_objext > + lt_save_LIBS="$LIBS" > + lt_save_CFLAGS="$CFLAGS" > + LIBS="conftstm.$ac_objext" > + > CFLAGS="$CFLAGS$_LT_AC_TAGVAR(lt_prog_compiler_no_builtin_flag, $1)" > + if AC_TRY_EVAL(ac_link) && test -s conftest${ac_exeext}; then > + pipe_works=yes > + fi > + LIBS="$lt_save_LIBS" > + CFLAGS="$lt_save_CFLAGS" > + else > + echo "cannot find nm_test_func in $nlist" >&AS_MESSAGE_LOG_FD > + fi > + else > + echo "cannot find nm_test_var in $nlist" >&AS_MESSAGE_LOG_FD > + fi > + else > + echo "cannot run $lt_cv_sys_global_symbol_pipe" > >&AS_MESSAGE_LOG_FD > + fi > + else > + echo "$progname: failed program was:" >&AS_MESSAGE_LOG_FD > + cat conftest.$ac_ext >&5 > + fi > + rm -f conftest* conftst* > + > + # Do not use the global_symbol_pipe unless it works. > + if test "$pipe_works" = yes; then > + break > + else > + lt_cv_sys_global_symbol_pipe= > + fi > +done > +]) > +if test -z "$lt_cv_sys_global_symbol_pipe"; then > + lt_cv_sys_global_symbol_to_cdecl= > +fi > +if test -z > "$lt_cv_sys_global_symbol_pipe$lt_cv_sys_global_symbol_to_cdecl"; then > + AC_MSG_RESULT(failed) > +else > + AC_MSG_RESULT(ok) > +fi > +]) # AC_LIBTOOL_SYS_GLOBAL_SYMBOL_PIPE > + > + > +# AC_LIBTOOL_PROG_COMPILER_PIC([TAGNAME]) > +# --------------------------------------- > +AC_DEFUN([AC_LIBTOOL_PROG_COMPILER_PIC], > +[_LT_AC_TAGVAR(lt_prog_compiler_wl, $1)= > +_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= > +_LT_AC_TAGVAR(lt_prog_compiler_static, $1)= > + > +AC_MSG_CHECKING([for $compiler option to produce PIC]) > + ifelse([$1],[CXX],[ > + # C++ specific cases for pic, static, wl, etc. 
> + if test "$GXX" = yes; then > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' > + > + case $host_os in > + aix*) > + # All AIX code is PIC. > + if test "$host_cpu" = ia64; then > + # AIX 5 now supports IA64 processor > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + fi > + ;; > + amigaos*) > + # FIXME: we need at least 68020 code to build shared libraries, > but > + # adding the `-m68020' flag to GCC prevents building anything > better, > + # like `-m68040'. > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-m68020 -resident32 > -malways-restore-a4' > + ;; > + beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | > osf5*) > + # PIC is the default for these OSes. > + ;; > + mingw* | os2* | pw32*) > + # This hack is so that the source file can tell whether it is > being > + # built for inclusion in a dll (and should export symbols for > example). > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT' > + ;; > + darwin* | rhapsody*) > + # PIC is the default on this platform > + # Common symbols not allowed in MH_DYLIB files > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fno-common' > + ;; > + *djgpp*) > + # DJGPP does not support shared libraries at all > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= > + ;; > + interix3*) > + # Interix 3.x gcc -fpic/-fPIC options generate broken code. > + # Instead, we relocate shared libraries at runtime. > + ;; > + sysv4*MP*) > + if test -d /usr/nec; then > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=-Kconform_pic > + fi > + ;; > + hpux*) > + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but > + # not for PA HP-UX. > + case $host_cpu in > + hppa*64*|ia64*) > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' > + ;; > + esac > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' > + ;; > + esac > + else > + case $host_os in > + aix4* | aix5*) > + # All AIX code is PIC. 
> + if test "$host_cpu" = ia64; then > + # AIX 5 now supports IA64 processor > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + else > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-bnso > -bI:/lib/syscalls.exp' > + fi > + ;; > + chorus*) > + case $cc_basename in > + cxch68*) > + # Green Hills C++ Compiler > + # _LT_AC_TAGVAR(lt_prog_compiler_static, > $1)="--no_auto_instantiation -u __main -u __premain -u _abort -r > $COOL_DIR/lib/libOrb.a $MVME_DIR/lib/CC/libC.a > $MVME_DIR/lib/classix/libcx.s.a" > + ;; > + esac > + ;; > + darwin*) > + # PIC is the default on this platform > + # Common symbols not allowed in MH_DYLIB files > + case $cc_basename in > + xlc*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-qnocommon' > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + ;; > + esac > + ;; > + dgux*) > + case $cc_basename in > + ec++*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + ;; > + ghcx*) > + # Green Hills C++ Compiler > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' > + ;; > + *) > + ;; > + esac > + ;; > + freebsd* | kfreebsd*-gnu | dragonfly*) > + # FreeBSD uses GNU C++ > + ;; > + hpux9* | hpux10* | hpux11*) > + case $cc_basename in > + CC*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a > ${wl}archive' > + if test "$host_cpu" != ia64; then > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z' > + fi > + ;; > + aCC*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a > ${wl}archive' > + case $host_cpu in > + hppa*64*|ia64*) > + # +Z the default > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z' > + ;; > + esac > + ;; > + *) > + ;; > + esac > + ;; > + interix*) > + # This is c89, which is MS Visual C++ (no shared libs) > + # Anyone wants to do a port? 
> + ;; > + irix5* | irix6* | nonstopux*) > + case $cc_basename in > + CC*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > + # CC pic flag -KPIC is the default. > + ;; > + *) > + ;; > + esac > + ;; > + linux*) > + case $cc_basename in > + KCC*) > + # KAI C++ Compiler > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='--backend -Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' > + ;; > + icpc* | ecpc*) > + # Intel C++ > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' > + ;; > + pgCC*) > + # Portland Group C++ compiler. > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fpic' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + cxx*) > + # Compaq C++ > + # Make sure the PIC flag is empty. It appears that all > Alpha > + # Linux and Compaq Tru64 Unix objects are PIC. > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > + ;; > + *) > + ;; > + esac > + ;; > + lynxos*) > + ;; > + m88k*) > + ;; > + mvs*) > + case $cc_basename in > + cxx*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-W c,exportall' > + ;; > + *) > + ;; > + esac > + ;; > + netbsd*) > + ;; > + osf3* | osf4* | osf5*) > + case $cc_basename in > + KCC*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='--backend -Wl,' > + ;; > + RCC*) > + # Rational C++ 2.4.1 > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' > + ;; > + cxx*) > + # Digital/Compaq C++ > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + # Make sure the PIC flag is empty. It appears that all > Alpha > + # Linux and Compaq Tru64 Unix objects are PIC. 
> + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > + ;; > + *) > + ;; > + esac > + ;; > + psos*) > + ;; > + solaris*) > + case $cc_basename in > + CC*) > + # Sun C++ 4.2, 5.x and Centerline C++ > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ' > + ;; > + gcx*) > + # Green Hills C++ Compiler > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-PIC' > + ;; > + *) > + ;; > + esac > + ;; > + sunos4*) > + case $cc_basename in > + CC*) > + # Sun C++ 4.x > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + lcc*) > + # Lucid > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' > + ;; > + *) > + ;; > + esac > + ;; > + tandem*) > + case $cc_basename in > + NCC*) > + # NonStop-UX NCC 3.20 > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + ;; > + *) > + ;; > + esac > + ;; > + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) > + case $cc_basename in > + CC*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + esac > + ;; > + vxworks*) > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no > + ;; > + esac > + fi > +], > +[ > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' > + > + case $host_os in > + aix*) > + # All AIX code is PIC. > + if test "$host_cpu" = ia64; then > + # AIX 5 now supports IA64 processor > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + fi > + ;; > + > + amigaos*) > + # FIXME: we need at least 68020 code to build shared libraries, > but > + # adding the `-m68020' flag to GCC prevents building anything > better, > + # like `-m68040'. 
> + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-m68020 -resident32 > -malways-restore-a4' > + ;; > + > + beos* | cygwin* | irix5* | irix6* | nonstopux* | osf3* | osf4* | > osf5*) > + # PIC is the default for these OSes. > + ;; > + > + mingw* | pw32* | os2*) > + # This hack is so that the source file can tell whether it is > being > + # built for inclusion in a dll (and should export symbols for > example). > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT' > + ;; > + > + darwin* | rhapsody*) > + # PIC is the default on this platform > + # Common symbols not allowed in MH_DYLIB files > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fno-common' > + ;; > + > + interix3*) > + # Interix 3.x gcc -fpic/-fPIC options generate broken code. > + # Instead, we relocate shared libraries at runtime. > + ;; > + > + msdosdjgpp*) > + # Just because we use GCC doesn't mean we suddenly get shared > libraries > + # on systems that don't support them. > + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no > + enable_shared=no > + ;; > + > + sysv4*MP*) > + if test -d /usr/nec; then > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=-Kconform_pic > + fi > + ;; > + > + hpux*) > + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but > + # not for PA HP-UX. > + case $host_cpu in > + hppa*64*|ia64*) > + # +Z the default > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' > + ;; > + esac > + ;; > + > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC' > + ;; > + esac > + else > + # PORTME Check for flag to pass linker flags through the system > compiler. 
> + case $host_os in > + aix*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + if test "$host_cpu" = ia64; then > + # AIX 5 now supports IA64 processor > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + else > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-bnso > -bI:/lib/syscalls.exp' > + fi > + ;; > + darwin*) > + # PIC is the default on this platform > + # Common symbols not allowed in MH_DYLIB files > + case $cc_basename in > + xlc*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-qnocommon' > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + ;; > + esac > + ;; > + > + mingw* | pw32* | os2*) > + # This hack is so that the source file can tell whether it is > being > + # built for inclusion in a dll (and should export symbols for > example). > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-DDLL_EXPORT' > + ;; > + > + hpux9* | hpux10* | hpux11*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but > + # not for PA HP-UX. > + case $host_cpu in > + hppa*64*|ia64*) > + # +Z the default > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='+Z' > + ;; > + esac > + # Is there a better lt_prog_compiler_static that works with the > bundled CC? > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='${wl}-a ${wl}archive' > + ;; > + > + irix5* | irix6* | nonstopux*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + # PIC (with -KPIC) is the default. 
> + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > + ;; > + > + newsos6) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + > + linux*) > + case $cc_basename in > + icc* | ecc*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-static' > + ;; > + pgcc* | pgf77* | pgf90* | pgf95*) > + # Portland Group compilers (*not* the Pentium gcc compiler, > + # which looks to be a dead project) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-fpic' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + ccc*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + # All Alpha code is PIC. > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > + ;; > + esac > + ;; > + > + osf3* | osf4* | osf5*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + # All OSF/1 code is PIC. 
> + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > + ;; > + > + solaris*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + case $cc_basename in > + f77* | f90* | f95*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ';; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,';; > + esac > + ;; > + > + sunos4*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld ' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-PIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + > + sysv4 | sysv4.2uw2* | sysv4.3*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + > + sysv4*MP*) > + if test -d /usr/nec ;then > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-Kconform_pic' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + fi > + ;; > + > + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + > + unicos*) > + _LT_AC_TAGVAR(lt_prog_compiler_wl, $1)='-Wl,' > + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no > + ;; > + > + uts4*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)='-pic' > + _LT_AC_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic' > + ;; > + > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no > + ;; > + esac > + fi > +]) > +AC_MSG_RESULT([$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)]) > + > +# > +# Check to make sure the PIC flag actually works. 
> +# > +if test -n "$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)"; then > + AC_LIBTOOL_COMPILER_OPTION([if $compiler PIC flag > $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) works], > + _LT_AC_TAGVAR(lt_prog_compiler_pic_works, $1), > + [$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)ifelse([$1],[],[ > -DPIC],[ifelse([$1],[CXX],[ -DPIC],[])])], [], > + [case $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) in > + "" | " "*) ;; > + *) _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)=" > $_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)" ;; > + esac], > + [_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= > + _LT_AC_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no]) > +fi > +case $host_os in > + # For platforms which do not support PIC, -DPIC is meaningless: > + *djgpp*) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, $1)= > + ;; > + *) > + _LT_AC_TAGVAR(lt_prog_compiler_pic, > $1)="$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1)ifelse([$1],[],[ > -DPIC],[ifelse([$1],[CXX],[ -DPIC],[])])" > + ;; > +esac > + > +# > +# Check to make sure the static flag actually works. > +# > +wl=$_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) eval > lt_tmp_static_flag=\"$_LT_AC_TAGVAR(lt_prog_compiler_static, $1)\" > +AC_LIBTOOL_LINKER_OPTION([if $compiler static flag $lt_tmp_static_flag > works], > + _LT_AC_TAGVAR(lt_prog_compiler_static_works, $1), > + $lt_tmp_static_flag, > + [], > + [_LT_AC_TAGVAR(lt_prog_compiler_static, $1)=]) > +]) > + > + > +# AC_LIBTOOL_PROG_LD_SHLIBS([TAGNAME]) > +# ------------------------------------ > +# See if the linker supports building shared libraries. > +AC_DEFUN([AC_LIBTOOL_PROG_LD_SHLIBS], > +[AC_MSG_CHECKING([whether the $compiler linker ($LD) supports shared > libraries]) > +ifelse([$1],[CXX],[ > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | > $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > > $export_symbols' > + case $host_os in > + aix4* | aix5*) > + # If we're using GNU nm, then we don't want the "-C" option. 
> + # -C means demangle to AIX nm, but means don't demangle with GNU nm > + if $NM -V 2>&1 | grep 'GNU' > /dev/null; then > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs > $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 > == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort > -u > $export_symbols' > + else > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs > $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 > == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort > -u > $export_symbols' > + fi > + ;; > + pw32*) > + _LT_AC_TAGVAR(export_symbols_cmds, $1)="$ltdll_cmds" > + ;; > + cygwin* | mingw*) > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | > $global_symbol_pipe | $SED -e '\''/^[[BCDGRS]] /s/.* \([[^ ]]*\)/\1 > DATA/;/^.* __nm__/s/^.* __nm__\([[^ ]]*\) [[^ ]]*/\1 DATA/;/^I > /d;/^[[AITW]] /s/.* //'\'' | sort | uniq > $export_symbols' > + ;; > + *) > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | > $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > > $export_symbols' > + ;; > + esac > +],[ > + runpath_var= > + _LT_AC_TAGVAR(allow_undefined_flag, $1)= > + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=no > + _LT_AC_TAGVAR(archive_cmds, $1)= > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)= > + _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)= > + _LT_AC_TAGVAR(old_archive_from_expsyms_cmds, $1)= > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= > + _LT_AC_TAGVAR(thread_safe_flag_spec, $1)= > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)= > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)= > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported > + _LT_AC_TAGVAR(link_all_deplibs, $1)=unknown > + 
_LT_AC_TAGVAR(hardcode_automatic, $1)=no > + _LT_AC_TAGVAR(module_cmds, $1)= > + _LT_AC_TAGVAR(module_expsym_cmds, $1)= > + _LT_AC_TAGVAR(always_export_symbols, $1)=no > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience | > $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > > $export_symbols' > + # include_expsyms should be a list of space-separated symbols to be > *always* > + # included in the symbol list > + _LT_AC_TAGVAR(include_expsyms, $1)= > + # exclude_expsyms can be an extended regexp of symbols to exclude > + # it will be wrapped by ` (' and `)$', so one must not match > beginning or > + # end of line. Example: `a|bc|.*d.*' will exclude the symbols `a' > and `bc', > + # as well as any symbol that contains `d'. > + _LT_AC_TAGVAR(exclude_expsyms, $1)="_GLOBAL_OFFSET_TABLE_" > + # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out > + # platforms (ab)use it in PIC code, but their linkers get confused if > + # the symbol is explicitly referenced. Since portable code cannot > + # rely on this symbol name, it's probably fine to never include it in > + # preloaded symbol tables. > + extract_expsyms_cmds= > + # Just being paranoid about ensuring that cc_basename is set. > + _LT_CC_BASENAME([$compiler]) > + case $host_os in > + cygwin* | mingw* | pw32*) > + # FIXME: the MSVC++ port hasn't been tested in a loooong time > + # When not using gcc, we currently assume that we are using > + # Microsoft Visual C++. > + if test "$GCC" != yes; then > + with_gnu_ld=no > + fi > + ;; > + interix*) > + # we just hope/assume this is gcc and not c89 (= MSVC++) > + with_gnu_ld=yes > + ;; > + openbsd*) > + with_gnu_ld=no > + ;; > + esac > + > + _LT_AC_TAGVAR(ld_shlibs, $1)=yes > + if test "$with_gnu_ld" = yes; then > + # If archive_cmds runs LD, not CC, wlarc should be empty > + wlarc='${wl}' > + > + # Set some defaults for GNU ld with shared library support. These > + # are reset later if shared libraries are not supported. 
Putting > them > + # here allows them to be overridden if necessary. > + runpath_var=LD_RUN_PATH > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}--rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic' > + # ancient GNU ld didn't support --whole-archive et. al. > + if $LD --help 2>&1 | grep 'no-whole-archive' > /dev/null; then > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)="$wlarc"'--whole-archive$convenience '"$wlarc"'--no-whole-archive' > + else > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= > + fi > + supports_anon_versioning=no > + case `$LD -v 2>/dev/null` in > + *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.10.*) ;; # catch versions < > 2.11 > + *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... > + *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake > 8.2 ... > + *\ 2.11.*) ;; # other 2.11 versions > + *) supports_anon_versioning=yes ;; > + esac > + > + # See if GNU ld supports shared libraries. > + case $host_os in > + aix3* | aix4* | aix5*) > + # On AIX/PPC, the GNU linker is very broken > + if test "$host_cpu" != ia64; then > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + cat <<EOF 1>&2 > + > +*** Warning: the GNU linker, at least up to release 2.9.1, is reported > +*** to be unable to reliably create shared libraries on AIX. > +*** Therefore, libtool is disabling shared libraries support. If you > +*** really care for shared libraries, you may want to modify your PATH > +*** so that a non-GNU linker is found, and then restart.
> + > +EOF > + fi > + ;; > + > + amigaos*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$rm > $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> > $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> > $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> > $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB > $lib~(cd $output_objdir && a2ixlibrary -32)' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + > + # Samuel A. Falvo II reports > + # that the semantics of dynamic libraries on AmigaOS, at least up > + # to version 4, is to share data among multiple programs linked > + # with the same dynamic library. Since this doesn't match the > + # behavior of shared libraries on other platforms, we can't use > + # them. > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + > + beos*) > + if $LD --help 2>&1 | grep ': supported targets:.* elf' > > /dev/null; then > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + # Joseph Beckenbach says some releases of gcc > + # support --undefined. This deserves some investigation. FIXME > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -nostart $libobjs $deplibs > $compiler_flags ${wl}-soname $wl$soname -o $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + > + cygwin* | mingw* | pw32*) > + # _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1) is actually > meaningless, > + # as there is no search path for DLLs. 
> + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + _LT_AC_TAGVAR(always_export_symbols, $1)=no > + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM $libobjs $convenience > | $global_symbol_pipe | $SED -e '\''/^[[BCDGRS]] /s/.* \([[^ ]]*\)/\1 > DATA/'\'' | $SED -e '\''/^[[AITW]] /s/.* //'\'' | sort | uniq > > $export_symbols' > + > + if $LD --help 2>&1 | grep 'auto-import' > /dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs > $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base > -Xlinker --out-implib -Xlinker $lib' > + # If the export-symbols file already is a .def file (1st line > + # is EXPORTS), use it as is; otherwise, prepend... > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='if test "x`$SED 1q > $export_symbols`" = xEXPORTS; then > + cp $export_symbols $output_objdir/$soname.def; > + else > + echo EXPORTS > $output_objdir/$soname.def; > + cat $export_symbols >> $output_objdir/$soname.def; > + fi~ > + $CC -shared $output_objdir/$soname.def $libobjs $deplibs > $compiler_flags -o $output_objdir/$soname ${wl}--enable-auto-image-base > -Xlinker --out-implib -Xlinker $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + > + interix3*) > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-rpath,$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + # Hack: On Interix 3.x, we cannot compile PIC because of a broken > gcc. > + # Instead, shared libraries are loaded at an image base > (0x10000000 by > + # default) and relocated if they conflict, which is a slow very > memory > + # consuming and fragmenting process. To avoid this, we pick a > random, > + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at > link > + # time. 
Moving up from 0x10000000 also allows more sbrk(2) > space. > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $libobjs > $deplibs $compiler_flags ${wl}-h,$soname ${wl}--image-base,`expr > ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed "s,^,_," > $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag > $libobjs $deplibs $compiler_flags ${wl}-h,$soname > ${wl}--retain-symbols-file,$output_objdir/$soname.expsym > ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` > -o $lib' > + ;; > + > + linux*) > + if $LD --help 2>&1 | grep ': supported targets:.* elf' > > /dev/null; then > + tmp_addflag= > + case $cc_basename,$host_cpu in > + pgcc*) # Portland Group C compiler > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test -n > \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo > \"$new_convenience\"` ${wl}--no-whole-archive' > + tmp_addflag=' $pic_flag' > + ;; > + pgf77* | pgf90* | pgf95*) # Portland Group f77 and f90 > compilers > + _LT_AC_TAGVAR(whole_archive_flag_spec, > $1)='${wl}--whole-archive`for conv in $convenience\"\"; do test -n > \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; $echo > \"$new_convenience\"` ${wl}--no-whole-archive' > + tmp_addflag=' $pic_flag -Mnomain' ;; > + ecc*,ia64* | icc*,ia64*) # Intel C compiler on > ia64 > + tmp_addflag=' -i_dynamic' ;; > + efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 > + tmp_addflag=' -i_dynamic -nofor_main' ;; > + ifc* | ifort*) # Intel Fortran compiler > + tmp_addflag=' -nofor_main' ;; > + esac > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared'"$tmp_addflag"' > $libobjs $deplibs $compiler_flags ${wl}-soname $wl$soname -o $lib' > + > + if test $supports_anon_versioning = yes; then > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > > $output_objdir/$libname.ver~ > + cat $export_symbols | sed -e 
"s/\(.*\)/\1;/" >> > $output_objdir/$libname.ver~ > + $echo "local: *; };" >> $output_objdir/$libname.ver~ > + $CC -shared'"$tmp_addflag"' $libobjs $deplibs $compiler_flags > ${wl}-soname $wl$soname ${wl}-version-script > ${wl}$output_objdir/$libname.ver -o $lib' > + fi > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + > + netbsd*) > + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable $libobjs > $deplibs $linker_flags -o $lib' > + wlarc= > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs > $compiler_flags ${wl}-soname $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs > $deplibs $compiler_flags ${wl}-soname $wl$soname > ${wl}-retain-symbols-file $wl$export_symbols -o $lib' > + fi > + ;; > + > + solaris*) > + if $LD -v 2>&1 | grep 'BFD 2\.8' > /dev/null; then > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + cat <<EOF 1>&2 > + > +*** Warning: The releases 2.8.* of the GNU linker cannot reliably > +*** create shared libraries on Solaris systems. Therefore, libtool > +*** is disabling shared libraries support. We urge you to upgrade GNU > +*** binutils to release 2.9.1 or newer. Another option is to modify > +*** your PATH or compiler configuration so that the native linker is > +*** used, and then restart. 
> + > +EOF > + elif $LD --help 2>&1 | grep ': supported targets:.* elf' > > /dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs > $compiler_flags ${wl}-soname $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs > $deplibs $compiler_flags ${wl}-soname $wl$soname > ${wl}-retain-symbols-file $wl$export_symbols -o $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + > + sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) > + case `$LD -v 2>&1` in > + *\ [[01]].* | *\ 2.[[0-9]].* | *\ 2.1[[0-5]].*) > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + cat <<_LT_EOF 1>&2 > + > +*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 can not > +*** reliably create shared libraries on SCO systems. Therefore, > libtool > +*** is disabling shared libraries support. We urge you to upgrade GNU > +*** binutils to release 2.16.91.0.3 or newer. Another option is to > modify > +*** your PATH or compiler configuration so that the native linker is > +*** used, and then restart. 
> + > +_LT_EOF > + ;; > + *) > + if $LD --help 2>&1 | grep ': supported targets:.* elf' > > /dev/null; then > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z > "$SCOABSPATH" && echo ${wl}-rpath,$libdir`' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs > $deplibs $compiler_flags > ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs > $deplibs $compiler_flags > ${wl}-soname,\${SCOABSPATH:+${install_libdir}/}$soname,-retain-symbols-file,$export_symbols -o $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + esac > + ;; > + > + sunos4*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -assert pure-text > -Bshareable -o $lib $libobjs $deplibs $linker_flags' > + wlarc= > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + *) > + if $LD --help 2>&1 | grep ': supported targets:.* elf' > > /dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs > $compiler_flags ${wl}-soname $wl$soname -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $libobjs > $deplibs $compiler_flags ${wl}-soname $wl$soname > ${wl}-retain-symbols-file $wl$export_symbols -o $lib' > + else > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + fi > + ;; > + esac > + > + if test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no; then > + runpath_var= > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)= > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)= > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)= > + fi > + else > + # PORTME fill in a description of your system's linker (not GNU ld) > + case $host_os in > + aix3*) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + _LT_AC_TAGVAR(always_export_symbols, $1)=yes > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$LD -o > $output_objdir/$soname $libobjs $deplibs $linker_flags > -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib > $output_objdir/$soname' > + # Note: this linker 
hardcodes the directories in LIBPATH if there > + # are no directories specified by -L. > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + if test "$GCC" = yes && test -z "$lt_prog_compiler_static"; then > + # Neither direct hardcoding nor static linking is supported with > a > + # broken collect2. > + _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported > + fi > + ;; > + > + aix4* | aix5*) > + if test "$host_cpu" = ia64; then > + # On IA64, the linker does run time linking by default, so we > don't > + # have to do anything special. > + aix_use_runtimelinking=no > + exp_sym_flag='-Bexport' > + no_entry_flag="" > + else > + # If we're using GNU nm, then we don't want the "-C" option. > + # -C means demangle to AIX nm, but means don't demangle with GNU > nm > + if $NM -V 2>&1 | grep 'GNU' > /dev/null; then > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -Bpg $libobjs > $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 > == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort > -u > $export_symbols' > + else > + _LT_AC_TAGVAR(export_symbols_cmds, $1)='$NM -BCpg $libobjs > $convenience | awk '\''{ if (((\[$]2 == "T") || (\[$]2 == "D") || (\[$]2 > == "B")) && ([substr](\[$]3,1,1) != ".")) { print \[$]3 } }'\'' | sort > -u > $export_symbols' > + fi > + aix_use_runtimelinking=no > + > + # Test if we are trying to use run time linking or normal > + # AIX style linking. If -brtl is somewhere in LDFLAGS, we > + # need to do runtime linking. > + case $host_os in aix4.[[23]]|aix4.[[23]].*|aix5*) > + for ld_flag in $LDFLAGS; do > + if (test $ld_flag = "-brtl" || test $ld_flag = "-Wl,-brtl"); > then > + aix_use_runtimelinking=yes > + break > + fi > + done > + ;; > + esac > + > + exp_sym_flag='-bexport' > + no_entry_flag='-bnoentry' > + fi > + > + # When large executables or shared objects are built, AIX ld can > + # have problems creating the table of contents. 
If linking a > library > + # or program results in "error TOC overflow" add -mminimal-toc to > + # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not > + # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. > + > + _LT_AC_TAGVAR(archive_cmds, $1)='' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + > + if test "$GCC" = yes; then > + case $host_os in aix4.[[012]]|aix4.[[012]].*) > + # We only want to do this on AIX 4.2 and lower, the check > + # below for broken collect2 doesn't work under 4.3+ > + collect2name=`${CC} -print-prog-name=collect2` > + if test -f "$collect2name" && \ > + strings "$collect2name" | grep resolve_lib_name >/dev/null > + then > + # We have reworked collect2 > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + else > + # We have old collect2 > + _LT_AC_TAGVAR(hardcode_direct, $1)=unsupported > + # It fails to find uninstalled libraries when the uninstalled > + # path is not listed in the libpath. Setting hardcode_minus_L > + # to unsupported forces relinking > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)= > + fi > + ;; > + esac > + shared_flag='-shared' > + if test "$aix_use_runtimelinking" = yes; then > + shared_flag="$shared_flag "'${wl}-G' > + fi > + else > + # not using gcc > + if test "$host_cpu" = ia64; then > + # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 > Release > + # chokes on -Wl,-G. The following line is correct: > + shared_flag='-G' > + else > + if test "$aix_use_runtimelinking" = yes; then > + shared_flag='${wl}-G' > + else > + shared_flag='${wl}-bM:SRE' > + fi > + fi > + fi > + > + # It seems that -bexpall does not export symbols beginning with > + # underscore (_), so it is better to generate a list of symbols > to export. 
> + _LT_AC_TAGVAR(always_export_symbols, $1)=yes > + if test "$aix_use_runtimelinking" = yes; then > + # Warning - without using the other runtime loading flags > (-brtl), > + # -berok will link without error, but may produce a broken > library. > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='-berok' > + # Determine the default libpath from the value encoded in an > empty executable. > + _LT_AC_SYS_LIBPATH_AIX > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-blibpath:$libdir:'"$aix_libpath" > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC"' -o > $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' > $compiler_flags `if test "x${allow_undefined_flag}" != "x"; then echo > "${wl}${allow_undefined_flag}"; else :; fi` > '"\${wl}$exp_sym_flag:\$export_symbols $shared_flag" > + else > + if test "$host_cpu" = ia64; then > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-R > $libdir:/usr/lib:/lib' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)="-z nodefs" > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o > $output_objdir/$soname $libobjs $deplibs '"\${wl}$no_entry_flag"' > $compiler_flags ${wl}${allow_undefined_flag} > '"\${wl}$exp_sym_flag:\$export_symbols" > + else > + # Determine the default libpath from the value encoded in an > empty executable. > + _LT_AC_SYS_LIBPATH_AIX > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-blibpath:$libdir:'"$aix_libpath" > + # Warning - without using the other run time loading flags, > + # -berok will link without error, but may produce a broken > library. > + _LT_AC_TAGVAR(no_undefined_flag, $1)=' ${wl}-bernotok' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' ${wl}-berok' > + # Exported symbols can be pulled into shared objects from > archives > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='$convenience' > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes > + # This is similar to how AIX traditionally builds its shared > libraries. 
> + _LT_AC_TAGVAR(archive_expsym_cmds, $1)="\$CC $shared_flag"' -o > $output_objdir/$soname $libobjs $deplibs ${wl}-bnoentry $compiler_flags > ${wl}-bE:$export_symbols${allow_undefined_flag}~$AR $AR_FLAGS > $output_objdir/$libname$release.a $output_objdir/$soname' > + fi > + fi > + ;; > + > + amigaos*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$rm > $output_objdir/a2ixlibrary.data~$echo "#define NAME $libname" > > $output_objdir/a2ixlibrary.data~$echo "#define LIBRARY_ID 1" >> > $output_objdir/a2ixlibrary.data~$echo "#define VERSION $major" >> > $output_objdir/a2ixlibrary.data~$echo "#define REVISION $revision" >> > $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB > $lib~(cd $output_objdir && a2ixlibrary -32)' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + # see comment about different semantics on the GNU ld section > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + > + bsdi[[45]]*) > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)=-rdynamic > + ;; > + > + cygwin* | mingw* | pw32*) > + # When not using gcc, we currently assume that we are using > + # Microsoft Visual C++. > + # hardcode_libdir_flag_spec is actually meaningless, as there is > + # no search path for DLLs. > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)=' ' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + # Tell ltmain to make .lib files, not .a files. > + libext=lib > + # Tell ltmain to make .dll files, not .so files. > + shrext_cmds=".dll" > + # FIXME: Setting linknames here is a bad hack. > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -o $lib $libobjs > $compiler_flags `echo "$deplibs" | $SED -e '\''s/ -lc$//'\''` -link > -dll~linknames=' > + # The linker will automatically build a .lib file if we build a > DLL. > + _LT_AC_TAGVAR(old_archive_From_new_cmds, $1)='true' > + # FIXME: Should let the user specify the lib program. 
> + _LT_AC_TAGVAR(old_archive_cmds, $1)='lib > /OUT:$oldlib$oldobjs$old_deplibs' > + _LT_AC_TAGVAR(fix_srcfile_path, $1)='`cygpath -w "$srcfile"`' > + _LT_AC_TAGVAR(enable_shared_with_static_runtimes, $1)=yes > + ;; > + > + darwin* | rhapsody*) > + case $host_os in > + rhapsody* | darwin1.[[012]]) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined > ${wl}suppress' > + ;; > + *) # Darwin 1.3 on > + if test -z ${MACOSX_DEPLOYMENT_TARGET} ; then > + _LT_AC_TAGVAR(allow_undefined_flag, > $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' > + else > + case ${MACOSX_DEPLOYMENT_TARGET} in > + 10.[[012]]) > + _LT_AC_TAGVAR(allow_undefined_flag, > $1)='${wl}-flat_namespace ${wl}-undefined ${wl}suppress' > + ;; > + 10.*) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-undefined > ${wl}dynamic_lookup' > + ;; > + esac > + fi > + ;; > + esac > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_automatic, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=unsupported > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='' > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + if test "$GCC" = yes ; then > + output_verbose_link_cmd='echo' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -dynamiclib > $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags > -install_name $rpath/$soname $verstring' > + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o $lib > -bundle $libobjs $deplibs$compiler_flags' > + # Don't fix this by using the ld -exported_symbols_list flag, it > doesn't exist in older darwin lds > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e "s,^[ > ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC -dynamiclib > $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags > -install_name $rpath/$soname $verstring~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e 
"s,#.*,," -e "s,^[ > ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + else > + case $cc_basename in > + xlc*) > + output_verbose_link_cmd='echo' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -qmkshrobj > $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags > ${wl}-install_name ${wl}`echo $rpath/$soname` $verstring' > + _LT_AC_TAGVAR(module_cmds, $1)='$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags' > + # Don't fix this by using the ld -exported_symbols_list flag, > it doesn't exist in older darwin lds > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC -qmkshrobj > $allow_undefined_flag -o $lib $libobjs $deplibs $compiler_flags > ${wl}-install_name ${wl}$rpath/$soname $verstring~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + _LT_AC_TAGVAR(module_expsym_cmds, $1)='sed -e "s,#.*,," -e > "s,^[ ]*,," -e "s,^\(..*\),_&," < $export_symbols > > $output_objdir/${libname}-symbols.expsym~$CC $allow_undefined_flag -o > $lib -bundle $libobjs $deplibs$compiler_flags~nmedit -s > $output_objdir/${libname}-symbols.expsym ${lib}' > + ;; > + *) > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + fi > + ;; > + > + dgux*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + freebsd1*) > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + > + # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ > constructor > + # support. 
Future versions do this automatically, but an explicit > c++rt0.o > + # does not break anything, and helps significantly (at the cost of > a little > + # extra space). > + freebsd2.2*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs > $deplibs $linker_flags /usr/lib/c++rt0.o' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + # Unfortunately, older versions of FreeBSD 2 do not have this > feature. > + freebsd2*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib $libobjs > $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + # FreeBSD 3 and greater uses gcc -shared to do shared libraries. > + freebsd* | kfreebsd*-gnu | dragonfly*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -o $lib $libobjs > $deplibs $compiler_flags' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + hpux9*) > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$CC > -shared -fPIC ${wl}+b ${wl}$install_libdir -o $output_objdir/$soname > $libobjs $deplibs $compiler_flags~test $output_objdir/$soname = $lib || > mv $output_objdir/$soname $lib' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$rm $output_objdir/$soname~$LD > -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs > $linker_flags~test $output_objdir/$soname = $lib || mv > $output_objdir/$soname $lib' > + fi > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + > + # hardcode_minus_L: Not really in the search PATH, > + # but as the default location of the library. 
> + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + ;; > + > + hpux10*) > + if test "$GCC" = yes -a "$with_gnu_ld" = no; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -fPIC ${wl}+h > ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs > $compiler_flags' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -b +h $soname +b > $install_libdir -o $lib $libobjs $deplibs $linker_flags' > + fi > + if test "$with_gnu_ld" = no; then > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + > + # hardcode_minus_L: Not really in the search PATH, > + # but as the default location of the library. > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + fi > + ;; > + > + hpux11*) > + if test "$GCC" = yes -a "$with_gnu_ld" = no; then > + case $host_cpu in > + hppa*64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}+h > ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' > + ;; > + ia64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}+h > ${wl}$soname ${wl}+nodefaultrpath -o $lib $libobjs $deplibs > $compiler_flags' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared -fPIC ${wl}+h > ${wl}$soname ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs > $compiler_flags' > + ;; > + esac > + else > + case $host_cpu in > + hppa*64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname > -o $lib $libobjs $deplibs $compiler_flags' > + ;; > + ia64*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname > ${wl}+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -b ${wl}+h ${wl}$soname > ${wl}+b ${wl}$install_libdir -o $lib $libobjs $deplibs $compiler_flags' > + ;; > + esac > + fi > + if test "$with_gnu_ld" = no; then > + 
_LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}+b > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + > + case $host_cpu in > + hppa*64*|ia64*) > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='+b $libdir' > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + *) > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + > + # hardcode_minus_L: Not really in the search PATH, > + # but as the default location of the library. > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + ;; > + esac > + fi > + ;; > + > + irix5* | irix6* | nonstopux*) > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $libobjs $deplibs > $compiler_flags ${wl}-soname ${wl}$soname `test -n "$verstring" && echo > ${wl}-set_version ${wl}$verstring` ${wl}-update_registry > ${wl}${output_objdir}/so_locations -o $lib' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared $libobjs $deplibs > $linker_flags -soname $soname `test -n "$verstring" && echo -set_version > $verstring` -update_registry ${output_objdir}/so_locations -o $lib' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec_ld, $1)='-rpath $libdir' > + fi > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + ;; > + > + netbsd*) > + if echo __ELF__ | $CC -E - | grep __ELF__ >/dev/null; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib > $libobjs $deplibs $linker_flags' # a.out > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -shared -o $lib $libobjs > $deplibs $linker_flags' # ELF > + fi > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + newsos6) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + 
_LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + openbsd*) > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`" || test > "$host_os-$host_cpu" = "openbsd2.8-powerpc"; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o $lib > $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared $pic_flag -o > $lib $libobjs $deplibs $compiler_flags > ${wl}-retain-symbols-file,$export_symbols' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-rpath,$libdir' > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-E' > + else > + case $host_os in > + openbsd[[01]].* | openbsd2.[[0-7]] | openbsd2.[[0-7]].*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -Bshareable -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' > + ;; > + *) > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag -o > $lib $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, > $1)='${wl}-rpath,$libdir' > + ;; > + esac > + fi > + ;; > + > + os2*) > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=unsupported > + _LT_AC_TAGVAR(archive_cmds, $1)='$echo "LIBRARY $libname > INITINSTANCE" > $output_objdir/$libname.def~$echo "DESCRIPTION > \"$libname\"" >> $output_objdir/$libname.def~$echo DATA >> > $output_objdir/$libname.def~$echo " SINGLE NONSHARED" >> > $output_objdir/$libname.def~$echo EXPORTS >> > $output_objdir/$libname.def~emxexp $libobjs >> > $output_objdir/$libname.def~$CC -Zdll -Zcrtdll -o $lib $libobjs $deplibs > $compiler_flags $output_objdir/$libname.def' > + _LT_AC_TAGVAR(old_archive_From_new_cmds, 
$1)='emximp -o > $output_objdir/$libname.a $output_objdir/$libname.def' > + ;; > + > + osf3*) > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' > ${wl}-expect_unresolved ${wl}\*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC > -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags > ${wl}-soname ${wl}$soname `test -n "$verstring" && echo > ${wl}-set_version ${wl}$verstring` ${wl}-update_registry > ${wl}${output_objdir}/so_locations -o $lib' > + else > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD > -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -soname > $soname `test -n "$verstring" && echo -set_version $verstring` > -update_registry ${output_objdir}/so_locations -o $lib' > + fi > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + ;; > + > + osf4* | osf5*) # as osf3* with the addition of -msym flag > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' > ${wl}-expect_unresolved ${wl}\*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC > -shared${allow_undefined_flag} $libobjs $deplibs $compiler_flags > ${wl}-msym ${wl}-soname ${wl}$soname `test -n "$verstring" && echo > ${wl}-set_version ${wl}$verstring` ${wl}-update_registry > ${wl}${output_objdir}/so_locations -o $lib' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath > ${wl}$libdir' > + else > + _LT_AC_TAGVAR(allow_undefined_flag, $1)=' -expect_unresolved \*' > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD > -shared${allow_undefined_flag} $libobjs $deplibs $linker_flags -msym > -soname $soname `test -n "$verstring" && echo -set_version $verstring` > -update_registry ${output_objdir}/so_locations -o $lib' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='for i in `cat > $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> > $lib.exp; done; echo "-hidden">> $lib.exp~ > + $LD 
-shared${allow_undefined_flag} -input $lib.exp $linker_flags > $libobjs $deplibs -soname $soname `test -n "$verstring" && echo > -set_version $verstring` -update_registry ${output_objdir}/so_locations > -o $lib~$rm $lib.exp' > + > + # Both c and cxx compiler support -rpath directly > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-rpath $libdir' > + fi > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=: > + ;; > + > + solaris*) > + _LT_AC_TAGVAR(no_undefined_flag, $1)=' -z text' > + if test "$GCC" = yes; then > + wlarc='${wl}' > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h > ${wl}$soname -o $lib $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo > "local: *; };" >> $lib.exp~ > + $CC -shared ${wl}-M ${wl}$lib.exp ${wl}-h ${wl}$soname -o $lib > $libobjs $deplibs $compiler_flags~$rm $lib.exp' > + else > + wlarc='' > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G${allow_undefined_flag} > -h $soname -o $lib $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$echo "{ global:" > > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~$echo > "local: *; };" >> $lib.exp~ > + $LD -G${allow_undefined_flag} -M $lib.exp -h $soname -o $lib > $libobjs $deplibs $linker_flags~$rm $lib.exp' > + fi > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-R$libdir' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + case $host_os in > + solaris2.[[0-5]] | solaris2.[[0-5]].*) ;; > + *) > + # The compiler driver will combine linker options so we > + # cannot just pass the convience library names through > + # without $wl, iff we do not link with $LD. > + # Luckily, gcc supports the same syntax we need for Sun Studio. > + # Supported since Solaris 2.6 (maybe 2.5.1?) 
> + case $wlarc in > + '') > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='-z > allextract$convenience -z defaultextract' ;; > + *) > + _LT_AC_TAGVAR(whole_archive_flag_spec, $1)='${wl}-z > ${wl}allextract`for conv in $convenience\"\"; do test -n \"$conv\" && > new_convenience=\"$new_convenience,$conv\"; done; $echo > \"$new_convenience\"` ${wl}-z ${wl}defaultextract' ;; > + esac ;; > + esac > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + ;; > + > + sunos4*) > + if test "x$host_vendor" = xsequent; then > + # Use $CC to link under sequent, because it throws in some extra > .o > + # files that make .init and .fini sections work. > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h $soname -o $lib > $libobjs $deplibs $compiler_flags' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -assert pure-text -Bstatic > -o $lib $libobjs $deplibs $linker_flags' > + fi > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes > + _LT_AC_TAGVAR(hardcode_minus_L, $1)=yes > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + sysv4) > + case $host_vendor in > + sni) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_direct, $1)=yes # is this really > true??? > + ;; > + siemens) > + ## LD is ld it makes a PLAMLIB > + ## CC just makes a GrossModule. 
> + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -o $lib $libobjs > $deplibs $linker_flags' > + _LT_AC_TAGVAR(reload_cmds, $1)='$CC -r -o $output$reload_objs' > + _LT_AC_TAGVAR(hardcode_direct, $1)=no > + ;; > + motorola) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_direct, $1)=no #Motorola manual says > yes, but my tests say they lie > + ;; > + esac > + runpath_var='LD_RUN_PATH' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + sysv4.3*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='-Bexport' > + ;; > + > + sysv4*MP*) > + if test -d /usr/nec; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + runpath_var=LD_RUN_PATH > + hardcode_runpath_var=yes > + _LT_AC_TAGVAR(ld_shlibs, $1)=yes > + fi > + ;; > + > + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[[01]].[[10]]* | > unixware7*) > + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + runpath_var='LD_RUN_PATH' > + > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared ${wl}-h,$soname -o > $lib $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared > ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs > $compiler_flags' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G ${wl}-h,$soname -o $lib > $libobjs $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G > ${wl}-Bexport:$export_symbols ${wl}-h,$soname -o $lib $libobjs $deplibs > $compiler_flags' > + fi > + ;; > + > + sysv5* | sco3.2v5* | sco5v6*) > + # Note: We can NOT use -z defs as we might desire, because we do > not > + # 
link with -lc, and that would cause any symbols used from libc > to > + # always be unresolved, which means just about no library would > + # ever link correctly. If we're not using GNU ld we use -z text > + # though, which does catch some bad symbols but isn't as > heavy-handed > + # as -z defs. > + _LT_AC_TAGVAR(no_undefined_flag, $1)='${wl}-z,text' > + _LT_AC_TAGVAR(allow_undefined_flag, $1)='${wl}-z,nodefs' > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='`test -z > "$SCOABSPATH" && echo ${wl}-R,$libdir`' > + _LT_AC_TAGVAR(hardcode_libdir_separator, $1)=':' > + _LT_AC_TAGVAR(link_all_deplibs, $1)=yes > + _LT_AC_TAGVAR(export_dynamic_flag_spec, $1)='${wl}-Bexport' > + runpath_var='LD_RUN_PATH' > + > + if test "$GCC" = yes; then > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -shared > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -shared > ${wl}-Bexport:$export_symbols > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + else > + _LT_AC_TAGVAR(archive_cmds, $1)='$CC -G > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + _LT_AC_TAGVAR(archive_expsym_cmds, $1)='$CC -G > ${wl}-Bexport:$export_symbols > ${wl}-h,\${SCOABSPATH:+${install_libdir}/}$soname -o $lib $libobjs > $deplibs $compiler_flags' > + fi > + ;; > + > + uts4*) > + _LT_AC_TAGVAR(archive_cmds, $1)='$LD -G -h $soname -o $lib > $libobjs $deplibs $linker_flags' > + _LT_AC_TAGVAR(hardcode_libdir_flag_spec, $1)='-L$libdir' > + _LT_AC_TAGVAR(hardcode_shlibpath_var, $1)=no > + ;; > + > + *) > + _LT_AC_TAGVAR(ld_shlibs, $1)=no > + ;; > + esac > + fi > +]) > +AC_MSG_RESULT([$_LT_AC_TAGVAR(ld_shlibs, $1)]) > +test "$_LT_AC_TAGVAR(ld_shlibs, $1)" = no && can_build_shared=no > + > +# > +# Do we need to explicitly link libc? 
> +# > +case "x$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)" in > +x|xyes) > + # Assume -lc should be added > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes > + > + if test "$enable_shared" = yes && test "$GCC" = yes; then > + case $_LT_AC_TAGVAR(archive_cmds, $1) in > + *'~'*) > + # FIXME: we may have to deal with multi-command sequences. > + ;; > + '$CC '*) > + # Test whether the compiler implicitly links with -lc since on > some > + # systems, -lgcc has to come before -lc. If gcc already passes > -lc > + # to ld, don't add -lc before -lgcc. > + AC_MSG_CHECKING([whether -lc should be explicitly linked in]) > + $rm conftest* > + printf "$lt_simple_compile_test_code" > conftest.$ac_ext > + > + if AC_TRY_EVAL(ac_compile) 2>conftest.err; then > + soname=conftest > + lib=conftest > + libobjs=conftest.$ac_objext > + deplibs= > + wl=$_LT_AC_TAGVAR(lt_prog_compiler_wl, $1) > + pic_flag=$_LT_AC_TAGVAR(lt_prog_compiler_pic, $1) > + compiler_flags=-v > + linker_flags=-v > + verstring= > + output_objdir=. > + libname=conftest > + > lt_save_allow_undefined_flag=$_LT_AC_TAGVAR(allow_undefined_flag, $1) > + _LT_AC_TAGVAR(allow_undefined_flag, $1)= > + if AC_TRY_EVAL(_LT_AC_TAGVAR(archive_cmds, $1) 2\>\&1 \| grep > \" -lc \" \>/dev/null 2\>\&1) > + then > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=no > + else > + _LT_AC_TAGVAR(archive_cmds_need_lc, $1)=yes > + fi > + _LT_AC_TAGVAR(allow_undefined_flag, > $1)=$lt_save_allow_undefined_flag > + else > + cat conftest.err 1>&5 > + fi > + $rm conftest* > + AC_MSG_RESULT([$_LT_AC_TAGVAR(archive_cmds_need_lc, $1)]) > + ;; > + esac > + fi > + ;; > +esac > +])# AC_LIBTOOL_PROG_LD_SHLIBS > + > + > +# _LT_AC_FILE_LTDLL_C > +# ------------------- > +# Be careful that the start marker always follows a newline. 
> +AC_DEFUN([_LT_AC_FILE_LTDLL_C], [ > +# /* ltdll.c starts here */ > +# #define WIN32_LEAN_AND_MEAN > +# #include > +# #undef WIN32_LEAN_AND_MEAN > +# #include > +# > +# #ifndef __CYGWIN__ > +# # ifdef __CYGWIN32__ > +# # define __CYGWIN__ __CYGWIN32__ > +# # endif > +# #endif > +# > +# #ifdef __cplusplus > +# extern "C" { > +# #endif > +# BOOL APIENTRY DllMain (HINSTANCE hInst, DWORD reason, LPVOID > reserved); > +# #ifdef __cplusplus > +# } > +# #endif > +# > +# #ifdef __CYGWIN__ > +# #include > +# DECLARE_CYGWIN_DLL( DllMain ); > +# #endif > +# HINSTANCE __hDllInstance_base; > +# > +# BOOL APIENTRY > +# DllMain (HINSTANCE hInst, DWORD reason, LPVOID reserved) > +# { > +# __hDllInstance_base = hInst; > +# return TRUE; > +# } > +# /* ltdll.c ends here */ > +])# _LT_AC_FILE_LTDLL_C > + > + > +# _LT_AC_TAGVAR(VARNAME, [TAGNAME]) > +# --------------------------------- > +AC_DEFUN([_LT_AC_TAGVAR], [ifelse([$2], [], [$1], [$1_$2])]) > + > + > +# old names > +AC_DEFUN([AM_PROG_LIBTOOL], [AC_PROG_LIBTOOL]) > +AC_DEFUN([AM_ENABLE_SHARED], [AC_ENABLE_SHARED($@)]) > +AC_DEFUN([AM_ENABLE_STATIC], [AC_ENABLE_STATIC($@)]) > +AC_DEFUN([AM_DISABLE_SHARED], [AC_DISABLE_SHARED($@)]) > +AC_DEFUN([AM_DISABLE_STATIC], [AC_DISABLE_STATIC($@)]) > +AC_DEFUN([AM_PROG_LD], [AC_PROG_LD]) > +AC_DEFUN([AM_PROG_NM], [AC_PROG_NM]) > + > +# This is just to silence aclocal about the macro not being used > +ifelse([AC_DISABLE_FAST_INSTALL]) > + > +AC_DEFUN([LT_AC_PROG_GCJ], > +[AC_CHECK_TOOL(GCJ, gcj, no) > + test "x${GCJFLAGS+set}" = xset || GCJFLAGS="-g -O2" > + AC_SUBST(GCJFLAGS) > +]) > + > +AC_DEFUN([LT_AC_PROG_RC], > +[AC_CHECK_TOOL(RC, windres, no) > +]) > + > +# NOTE: This macro has been submitted for inclusion into # > +# GNU Autoconf as AC_PROG_SED. When it is available in # > +# a released version of Autoconf we should remove this # > +# macro and use it instead. 
# > +# LT_AC_PROG_SED > +# -------------- > +# Check for a fully-functional sed program, that truncates > +# as few characters as possible. Prefer GNU sed if found. > +AC_DEFUN([LT_AC_PROG_SED], > +[AC_MSG_CHECKING([for a sed that does not truncate output]) > +AC_CACHE_VAL(lt_cv_path_SED, > +[# Loop through the user's path and test for sed and gsed. > +# Then use that list of sed's as ones to test for truncation. > +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR > +for as_dir in $PATH > +do > + IFS=$as_save_IFS > + test -z "$as_dir" && as_dir=. > + for lt_ac_prog in sed gsed; do > + for ac_exec_ext in '' $ac_executable_extensions; do > + if $as_executable_p "$as_dir/$lt_ac_prog$ac_exec_ext"; then > + lt_ac_sed_list="$lt_ac_sed_list > $as_dir/$lt_ac_prog$ac_exec_ext" > + fi > + done > + done > +done > +lt_ac_max=0 > +lt_ac_count=0 > +# Add /usr/xpg4/bin/sed as it is typically found on Solaris > +# along with /bin/sed that truncates output. > +for lt_ac_sed in $lt_ac_sed_list /usr/xpg4/bin/sed; do > + test ! -f $lt_ac_sed && continue > + cat /dev/null > conftest.in > + lt_ac_count=0 > + echo $ECHO_N "0123456789$ECHO_C" >conftest.in > + # Check for GNU sed and select it if it is found. > + if "$lt_ac_sed" --version 2>&1 < /dev/null | grep 'GNU' > /dev/null; > then > + lt_cv_path_SED=$lt_ac_sed > + break > + fi > + while true; do > + cat conftest.in conftest.in >conftest.tmp > + mv conftest.tmp conftest.in > + cp conftest.in conftest.nl > + echo >>conftest.nl > + $lt_ac_sed -e 's/a$//' < conftest.nl >conftest.out || break > + cmp -s conftest.out conftest.nl || break > + # 10000 chars as input seems more than enough > + test $lt_ac_count -gt 10 && break > + lt_ac_count=`expr $lt_ac_count + 1` > + if test $lt_ac_count -gt $lt_ac_max; then > + lt_ac_max=$lt_ac_count > + lt_cv_path_SED=$lt_ac_sed > + fi > + done > +done > +]) > +SED=$lt_cv_path_SED > +AC_MSG_RESULT([$SED]) > +]) > + > +# Copyright (C) 2002, 2003, 2005 Free Software Foundation, Inc. 
> +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# AM_AUTOMAKE_VERSION(VERSION) > +# ---------------------------- > +# Automake X.Y traces this macro to ensure aclocal.m4 has been > +# generated from the m4 files accompanying Automake X.Y. > +AC_DEFUN([AM_AUTOMAKE_VERSION], [am__api_version="1.9"]) > + > +# AM_SET_CURRENT_AUTOMAKE_VERSION > +# ------------------------------- > +# Call AM_AUTOMAKE_VERSION so it can be traced. > +# This function is AC_REQUIREd by AC_INIT_AUTOMAKE. > +AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION], > + [AM_AUTOMAKE_VERSION([1.9.6])]) > + > +# AM_AUX_DIR_EXPAND -*- > Autoconf -*- > + > +# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# For projects using AC_CONFIG_AUX_DIR([foo]), Autoconf sets > +# $ac_aux_dir to `$srcdir/foo'. In other projects, it is set to > +# `$srcdir', `$srcdir/..', or `$srcdir/../..'. > +# > +# Of course, Automake must honor this variable whenever it calls a > +# tool from the auxiliary directory. The problem is that $srcdir (and > +# therefore $ac_aux_dir as well) can be either absolute or relative, > +# depending on how configure is run. This is pretty annoying, since > +# it makes $ac_aux_dir quite unusable in subdirectories: in the top > +# source directory, any form will work fine, but in subdirectories a > +# relative path needs to be adjusted first. 
> +# > +# $ac_aux_dir/missing > +# fails when called from a subdirectory if $ac_aux_dir is relative > +# $top_srcdir/$ac_aux_dir/missing > +# fails if $ac_aux_dir is absolute, > +# fails when called from a subdirectory in a VPATH build with > +# a relative $ac_aux_dir > +# > +# The reason of the latter failure is that $top_srcdir and $ac_aux_dir > +# are both prefixed by $srcdir. In an in-source build this is usually > +# harmless because $srcdir is `.', but things will broke when you > +# start a VPATH build or use an absolute $srcdir. > +# > +# So we could use something similar to $top_srcdir/$ac_aux_dir/missing, > +# iff we strip the leading $srcdir from $ac_aux_dir. That would be: > +# am_aux_dir='\$(top_srcdir)/'`expr "$ac_aux_dir" : > "$srcdir//*\(.*\)"` > +# and then we would define $MISSING as > +# MISSING="\${SHELL} $am_aux_dir/missing" > +# This will work as long as MISSING is not called from configure, > because > +# unfortunately $(top_srcdir) has no meaning in configure. > +# However there are other variables, like CC, which are often used in > +# configure, and could therefore not use this "fixed" $ac_aux_dir. > +# > +# Another solution, used here, is to always expand $ac_aux_dir to an > +# absolute PATH. The drawback is that using absolute paths prevent a > +# configured tree to be moved without reconfiguration. > + > +AC_DEFUN([AM_AUX_DIR_EXPAND], > +[dnl Rely on autoconf to set up CDPATH properly. > +AC_PREREQ([2.50])dnl > +# expand $ac_aux_dir to an absolute path > +am_aux_dir=`cd $ac_aux_dir && pwd` > +]) > + > +# AM_CONDITIONAL -*- > Autoconf -*- > + > +# Copyright (C) 1997, 2000, 2001, 2003, 2004, 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. 
> + > +# serial 7 > + > +# AM_CONDITIONAL(NAME, SHELL-CONDITION) > +# ------------------------------------- > +# Define a conditional. > +AC_DEFUN([AM_CONDITIONAL], > +[AC_PREREQ(2.52)dnl > + ifelse([$1], [TRUE], [AC_FATAL([$0: invalid condition: $1])], > + [$1], [FALSE], [AC_FATAL([$0: invalid condition: $1])])dnl > +AC_SUBST([$1_TRUE]) > +AC_SUBST([$1_FALSE]) > +if $2; then > + $1_TRUE= > + $1_FALSE='#' > +else > + $1_TRUE='#' > + $1_FALSE= > +fi > +AC_CONFIG_COMMANDS_PRE( > +[if test -z "${$1_TRUE}" && test -z "${$1_FALSE}"; then > + AC_MSG_ERROR([[conditional "$1" was never defined. > +Usually this means the macro was only invoked conditionally.]]) > +fi])]) > + > + > +# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 8 > + > +# There are a few dirty hacks below to avoid letting `AC_PROG_CC' be > +# written in clear, in which case automake, when reading aclocal.m4, > +# will think it sees a *use*, and therefore will trigger all it's > +# C support machinery. Also note that it means that autoscan, seeing > +# CC etc. in the Makefile, will ask for an AC_PROG_CC use... > + > + > +# _AM_DEPENDENCIES(NAME) > +# ---------------------- > +# See how the compiler implements dependency checking. > +# NAME is "CC", "CXX", "GCJ", or "OBJC". > +# We try a few techniques and use that to set a single cache variable. > +# > +# We don't AC_REQUIRE the corresponding AC_PROG_CC since the latter was > +# modified to invoke _AM_DEPENDENCIES(CC); we would have a circular > +# dependency, and given that the user is not expected to run this > macro, > +# just rely on AC_PROG_CC. 
> +AC_DEFUN([_AM_DEPENDENCIES], > +[AC_REQUIRE([AM_SET_DEPDIR])dnl > +AC_REQUIRE([AM_OUTPUT_DEPENDENCY_COMMANDS])dnl > +AC_REQUIRE([AM_MAKE_INCLUDE])dnl > +AC_REQUIRE([AM_DEP_TRACK])dnl > + > +ifelse([$1], CC, [depcc="$CC" am_compiler_list=], > + [$1], CXX, [depcc="$CXX" am_compiler_list=], > + [$1], OBJC, [depcc="$OBJC" am_compiler_list='gcc3 gcc'], > + [$1], GCJ, [depcc="$GCJ" am_compiler_list='gcc3 gcc'], > + [depcc="$$1" am_compiler_list=]) > + > +AC_CACHE_CHECK([dependency style of $depcc], > + [am_cv_$1_dependencies_compiler_type], > +[if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then > + # We make a subdir and do the tests there. Otherwise we can end up > + # making bogus files that we don't know about and never remove. For > + # instance it was reported that on HP-UX the gcc test will end up > + # making a dummy file named `D' -- because `-MD' means `put the > output > + # in D'. > + mkdir conftest.dir > + # Copy depcomp to subdir because otherwise we won't find it if we're > + # using a relative directory. > + cp "$am_depcomp" conftest.dir > + cd conftest.dir > + # We will build objects and dependencies in a subdirectory because > + # it helps to detect inapplicable dependency modes. For instance > + # both Tru64's cc and ICC support -MD to output dependencies as a > + # side effect of compilation, but ICC will put the dependencies in > + # the current directory while Tru64 will put them in the object > + # directory. > + mkdir sub > + > + am_cv_$1_dependencies_compiler_type=none > + if test "$am_compiler_list" = ""; then > + am_compiler_list=`sed -n ['s/^#*\([a-zA-Z0-9]*\))$/\1/p'] < > ./depcomp` > + fi > + for depmode in $am_compiler_list; do > + # Setup a source with many dependencies, because some compilers > + # like to wrap large dependency lists on column 80 (with \), and > + # we should not choose a depcomp mode which is confused by this. 
> + # > + # We need to recreate these files for each test, as the compiler > may > + # overwrite some of them when testing with obscure command lines. > + # This happens at least with the AIX C compiler. > + : > sub/conftest.c > + for i in 1 2 3 4 5 6; do > + echo '#include "conftst'$i'.h"' >> sub/conftest.c > + # Using `: > sub/conftst$i.h' creates only sub/conftst1.h with > + # Solaris 8's {/usr,}/bin/sh. > + touch sub/conftst$i.h > + done > + echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > > confmf > + > + case $depmode in > + nosideeffect) > + # after this tag, mechanisms are not by side-effect, so they'll > + # only be used when explicitly requested > + if test "x$enable_dependency_tracking" = xyes; then > + continue > + else > + break > + fi > + ;; > + none) break ;; > + esac > + # We check with `-c' and `-o' for the sake of the "dashmstdout" > + # mode. It turns out that the SunPro C++ compiler does not > properly > + # handle `-M -o', and we need to detect this. > + if depmode=$depmode \ > + source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \ > + depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ > + $SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} > sub/conftest.c \ > + >/dev/null 2>conftest.err && > + grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && > + grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 > && > + ${MAKE-make} -s -f confmf > /dev/null 2>&1; then > + # icc doesn't choke on unknown options, it will just issue > warnings > + # or remarks (even with -Werror). So we grep stderr for any > message > + # that says an option was ignored or not supported. 
> + # When given -MP, icc 7.0 and 7.1 complain thusly: > + # icc: Command line warning: ignoring option '-M'; no argument > required > + # The diagnosis changed in icc 8.0: > + # icc: Command line remark: option '-MP' not supported > + if (grep 'ignoring option' conftest.err || > + grep 'not supported' conftest.err) >/dev/null 2>&1; then :; > else > + am_cv_$1_dependencies_compiler_type=$depmode > + break > + fi > + fi > + done > + > + cd .. > + rm -rf conftest.dir > +else > + am_cv_$1_dependencies_compiler_type=none > +fi > +]) > +AC_SUBST([$1DEPMODE], [depmode=$am_cv_$1_dependencies_compiler_type]) > +AM_CONDITIONAL([am__fastdep$1], [ > + test "x$enable_dependency_tracking" != xno \ > + && test "$am_cv_$1_dependencies_compiler_type" = gcc3]) > +]) > + > + > +# AM_SET_DEPDIR > +# ------------- > +# Choose a directory name for dependency files. > +# This macro is AC_REQUIREd in _AM_DEPENDENCIES > +AC_DEFUN([AM_SET_DEPDIR], > +[AC_REQUIRE([AM_SET_LEADING_DOT])dnl > +AC_SUBST([DEPDIR], ["${am__leading_dot}deps"])dnl > +]) > + > + > +# AM_DEP_TRACK > +# ------------ > +AC_DEFUN([AM_DEP_TRACK], > +[AC_ARG_ENABLE(dependency-tracking, > +[ --disable-dependency-tracking speeds up one-time build > + --enable-dependency-tracking do not reject slow dependency > extractors]) > +if test "x$enable_dependency_tracking" != xno; then > + am_depcomp="$ac_aux_dir/depcomp" > + AMDEPBACKSLASH='\' > +fi > +AM_CONDITIONAL([AMDEP], [test "x$enable_dependency_tracking" != xno]) > +AC_SUBST([AMDEPBACKSLASH]) > +]) > + > +# Generate code to set up dependency tracking. -*- > Autoconf -*- > + > +# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. 
> + > +#serial 3 > + > +# _AM_OUTPUT_DEPENDENCY_COMMANDS > +# ------------------------------ > +AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS], > +[for mf in $CONFIG_FILES; do > + # Strip MF so we end up with the name of the file. > + mf=`echo "$mf" | sed -e 's/:.*$//'` > + # Check whether this is an Automake generated Makefile or not. > + # We used to match only the files named `Makefile.in', but > + # some people rename them; so instead we look at the file content. > + # Grep'ing the first line is not enough: some people post-process > + # each Makefile.in and add a new line on top of each file to say so. > + # So let's grep whole file. > + if grep '^#.*generated by automake' $mf > /dev/null 2>&1; then > + dirpart=`AS_DIRNAME("$mf")` > + else > + continue > + fi > + # Extract the definition of DEPDIR, am__include, and am__quote > + # from the Makefile without running `make'. > + DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"` > + test -z "$DEPDIR" && continue > + am__include=`sed -n 's/^am__include = //p' < "$mf"` > + test -z "$am__include" && continue > + am__quote=`sed -n 's/^am__quote = //p' < "$mf"` > + # When using ansi2knr, U may be empty or an underscore; expand it > + U=`sed -n 's/^U = //p' < "$mf"` > + # Find all dependency output files, they are included files with > + # $(DEPDIR) in their names. We invoke sed twice because it is the > + # simplest approach to changing $(DEPDIR) to its actual value in the > + # expansion. > + for file in `sed -n " > + s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' > <"$mf" | \ > + sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g' -e 's/\$U/'"$U"'/g'`; do > + # Make sure the directory exists. 
> + test -f "$dirpart/$file" && continue > + fdir=`AS_DIRNAME(["$file"])` > + AS_MKDIR_P([$dirpart/$fdir]) > + # echo "creating $dirpart/$file" > + echo '# dummy' > "$dirpart/$file" > + done > +done > +])# _AM_OUTPUT_DEPENDENCY_COMMANDS > + > + > +# AM_OUTPUT_DEPENDENCY_COMMANDS > +# ----------------------------- > +# This macro should only be invoked once -- use via AC_REQUIRE. > +# > +# This code is only required when automatic dependency tracking > +# is enabled. FIXME. This creates each `.P' file that we will > +# need in order to bootstrap the dependency handling code. > +AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS], > +[AC_CONFIG_COMMANDS([depfiles], > + [test x"$AMDEP_TRUE" != x"" || _AM_OUTPUT_DEPENDENCY_COMMANDS], > + [AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"]) > +]) > + > +# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 8 > + > +# AM_CONFIG_HEADER is obsolete. It has been replaced by > AC_CONFIG_HEADERS. > +AU_DEFUN([AM_CONFIG_HEADER], [AC_CONFIG_HEADERS($@)]) > + > +# Do all the work for Automake. -*- > Autoconf -*- > + > +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, > 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 12 > + > +# This macro actually does too much. Some checks are only needed if > +# your package does certain things. But this isn't really a big deal. 
> + > +# AM_INIT_AUTOMAKE(PACKAGE, VERSION, [NO-DEFINE]) > +# AM_INIT_AUTOMAKE([OPTIONS]) > +# ----------------------------------------------- > +# The call with PACKAGE and VERSION arguments is the old style > +# call (pre autoconf-2.50), which is being phased out. PACKAGE > +# and VERSION should now be passed to AC_INIT and removed from > +# the call to AM_INIT_AUTOMAKE. > +# We support both call styles for the transition. After > +# the next Automake release, Autoconf can make the AC_INIT > +# arguments mandatory, and then we can depend on a new Autoconf > +# release and drop the old call support. > +AC_DEFUN([AM_INIT_AUTOMAKE], > +[AC_PREREQ([2.58])dnl > +dnl Autoconf wants to disallow AM_ names. We explicitly allow > +dnl the ones we care about. > +m4_pattern_allow([^AM_[A-Z]+FLAGS$])dnl > +AC_REQUIRE([AM_SET_CURRENT_AUTOMAKE_VERSION])dnl > +AC_REQUIRE([AC_PROG_INSTALL])dnl > +# test to see if srcdir already configured > +if test "`cd $srcdir && pwd`" != "`pwd`" && > + test -f $srcdir/config.status; then > + AC_MSG_ERROR([source directory already configured; run "make > distclean" there first]) > +fi > + > +# test whether we have cygpath > +if test -z "$CYGPATH_W"; then > + if (cygpath --version) >/dev/null 2>/dev/null; then > + CYGPATH_W='cygpath -w' > + else > + CYGPATH_W=echo > + fi > +fi > +AC_SUBST([CYGPATH_W]) > + > +# Define the identity of the package. > +dnl Distinguish between old-style and new-style calls. > +m4_ifval([$2], > +[m4_ifval([$3], [_AM_SET_OPTION([no-define])])dnl > + AC_SUBST([PACKAGE], [$1])dnl > + AC_SUBST([VERSION], [$2])], > +[_AM_SET_OPTIONS([$1])dnl > + AC_SUBST([PACKAGE], ['AC_PACKAGE_TARNAME'])dnl > + AC_SUBST([VERSION], ['AC_PACKAGE_VERSION'])])dnl > + > +_AM_IF_OPTION([no-define],, > +[AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE", [Name of package]) > + AC_DEFINE_UNQUOTED(VERSION, "$VERSION", [Version number of > package])])dnl > + > +# Some tools Automake needs. 
> +AC_REQUIRE([AM_SANITY_CHECK])dnl > +AC_REQUIRE([AC_ARG_PROGRAM])dnl > +AM_MISSING_PROG(ACLOCAL, aclocal-${am__api_version}) > +AM_MISSING_PROG(AUTOCONF, autoconf) > +AM_MISSING_PROG(AUTOMAKE, automake-${am__api_version}) > +AM_MISSING_PROG(AUTOHEADER, autoheader) > +AM_MISSING_PROG(MAKEINFO, makeinfo) > +AM_PROG_INSTALL_SH > +AM_PROG_INSTALL_STRIP > +AC_REQUIRE([AM_PROG_MKDIR_P])dnl > +# We need awk for the "check" target. The system "awk" is bad on > +# some platforms. > +AC_REQUIRE([AC_PROG_AWK])dnl > +AC_REQUIRE([AC_PROG_MAKE_SET])dnl > +AC_REQUIRE([AM_SET_LEADING_DOT])dnl > +_AM_IF_OPTION([tar-ustar], [_AM_PROG_TAR([ustar])], > + [_AM_IF_OPTION([tar-pax], [_AM_PROG_TAR([pax])], > + [_AM_PROG_TAR([v7])])]) > +_AM_IF_OPTION([no-dependencies],, > +[AC_PROVIDE_IFELSE([AC_PROG_CC], > + [_AM_DEPENDENCIES(CC)], > + [define([AC_PROG_CC], > + > defn([AC_PROG_CC])[_AM_DEPENDENCIES(CC)])])dnl > +AC_PROVIDE_IFELSE([AC_PROG_CXX], > + [_AM_DEPENDENCIES(CXX)], > + [define([AC_PROG_CXX], > + > defn([AC_PROG_CXX])[_AM_DEPENDENCIES(CXX)])])dnl > +]) > +]) > + > + > +# When config.status generates a header, we must update the stamp-h > file. > +# This file resides in the same directory as the config header > +# that is generated. The stamp files are numbered to have different > names. > + > +# Autoconf calls _AC_AM_CONFIG_HEADER_HOOK (when defined) in the > +# loop where config.status creates the headers, so we can generate > +# our stamp files there. > +AC_DEFUN([_AC_AM_CONFIG_HEADER_HOOK], > +[# Compute $1's index in $config_headers. > +_am_stamp_count=1 > +for _am_header in $config_headers :; do > + case $_am_header in > + $1 | $1:* ) > + break ;; > + * ) > + _am_stamp_count=`expr $_am_stamp_count + 1` ;; > + esac > +done > +echo "timestamp for $1" >`AS_DIRNAME([$1])`/stamp-h[]$_am_stamp_count]) > + > +# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc. 
> +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# AM_PROG_INSTALL_SH > +# ------------------ > +# Define $install_sh. > +AC_DEFUN([AM_PROG_INSTALL_SH], > +[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl > +install_sh=${install_sh-"$am_aux_dir/install-sh"} > +AC_SUBST(install_sh)]) > + > +# Copyright (C) 2003, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 2 > + > +# Check whether the underlying file-system supports filenames > +# with a leading dot. For instance MS-DOS doesn't. > +AC_DEFUN([AM_SET_LEADING_DOT], > +[rm -rf .tst 2>/dev/null > +mkdir .tst 2>/dev/null > +if test -d .tst; then > + am__leading_dot=. > +else > + am__leading_dot=_ > +fi > +rmdir .tst 2>/dev/null > +AC_SUBST([am__leading_dot])]) > + > +# Check to see how 'make' treats includes. -*- Autoconf > -*- > + > +# Copyright (C) 2001, 2002, 2003, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 3 > + > +# AM_MAKE_INCLUDE() > +# ----------------- > +# Check to see how make treats includes. > +AC_DEFUN([AM_MAKE_INCLUDE], > +[am_make=${MAKE-make} > +cat > confinc << 'END' > +am__doit: > + @echo done > +.PHONY: am__doit > +END > +# If we don't find an include directive, just comment out the code. > +AC_MSG_CHECKING([for style of include used by $am_make]) > +am__include="#" > +am__quote= > +_am_result=none > +# First try GNU make style include. 
> +echo "include confinc" > confmf > +# We grep out `Entering directory' and `Leaving directory' > +# messages which can occur if `w' ends up in MAKEFLAGS. > +# In particular we don't look at `^make:' because GNU make might > +# be invoked under some other name (usually "gmake"), in which > +# case it prints its new name instead of `make'. > +if test "`$am_make -s -f confmf 2> /dev/null | grep -v 'ing > directory'`" = "done"; then > + am__include=include > + am__quote= > + _am_result=GNU > +fi > +# Now try BSD make style include. > +if test "$am__include" = "#"; then > + echo '.include "confinc"' > confmf > + if test "`$am_make -s -f confmf 2> /dev/null`" = "done"; then > + am__include=.include > + am__quote="\"" > + _am_result=BSD > + fi > +fi > +AC_SUBST([am__include]) > +AC_SUBST([am__quote]) > +AC_MSG_RESULT([$_am_result]) > +rm -f confinc confmf > +]) > + > +# Fake the existence of programs that GNU maintainers use. -*- > Autoconf -*- > + > +# Copyright (C) 1997, 1999, 2000, 2001, 2003, 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 4 > + > +# AM_MISSING_PROG(NAME, PROGRAM) > +# ------------------------------ > +AC_DEFUN([AM_MISSING_PROG], > +[AC_REQUIRE([AM_MISSING_HAS_RUN]) > +$1=${$1-"${am_missing_run}$2"} > +AC_SUBST($1)]) > + > + > +# AM_MISSING_HAS_RUN > +# ------------------ > +# Define MISSING if not defined so far and test if it supports --run. > +# If it does, set am_missing_run to use it, otherwise, to nothing. 
> +AC_DEFUN([AM_MISSING_HAS_RUN], > +[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl > +test x"${MISSING+set}" = xset || MISSING="\${SHELL} > $am_aux_dir/missing" > +# Use eval to expand $SHELL > +if eval "$MISSING --run true"; then > + am_missing_run="$MISSING --run " > +else > + am_missing_run= > + AC_MSG_WARN([`missing' script is too old or missing]) > +fi > +]) > + > +# Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# AM_PROG_MKDIR_P > +# --------------- > +# Check whether `mkdir -p' is supported, fallback to mkinstalldirs > otherwise. > +# > +# Automake 1.8 used `mkdir -m 0755 -p --' to ensure that directories > +# created by `make install' are always world readable, even if the > +# installer happens to have an overly restrictive umask (e.g. 077). > +# This was a mistake. There are at least two reasons why we must not > +# use `-m 0755': > +# - it causes special bits like SGID to be ignored, > +# - it may be too restrictive (some setups expect 775 directories). > +# > +# Do not use -m 0755 and let people choose whatever they expect by > +# setting umask. > +# > +# We cannot accept any implementation of `mkdir' that recognizes `-p'. > +# Some implementations (such as Solaris 8's) are not thread-safe: if a > +# parallel make tries to run `mkdir -p a/b' and `mkdir -p a/c' > +# concurrently, both version can detect that a/ is missing, but only > +# one can create it and the other will error out. Consequently we > +# restrict ourselves to GNU make (using the --version option ensures > +# this.) > +AC_DEFUN([AM_PROG_MKDIR_P], > +[if mkdir -p --version . >/dev/null 2>&1 && test ! -d ./--version; then > + # We used to keeping the `.' as first argument, in order to > + # allow $(mkdir_p) to be used without argument. 
As in > + # $(mkdir_p) $(somedir) > + # where $(somedir) is conditionally defined. However this is wrong > + # for two reasons: > + # 1. if the package is installed by a user who cannot write `.' > + # make install will fail, > + # 2. the above comment should most certainly read > + # $(mkdir_p) $(DESTDIR)$(somedir) > + # so it does not work when $(somedir) is undefined and > + # $(DESTDIR) is not. > + # To support the latter case, we have to write > + # test -z "$(somedir)" || $(mkdir_p) $(DESTDIR)$(somedir), > + # so the `.' trick is pointless. > + mkdir_p='mkdir -p --' > +else > + # On NextStep and OpenStep, the `mkdir' command does not > + # recognize any option. It will interpret all options as > + # directories to create, and then abort because `.' already > + # exists. > + for d in ./-p ./--version; > + do > + test -d $d && rmdir $d > + done > + # $(mkinstalldirs) is defined by Automake if mkinstalldirs exists. > + if test -f "$ac_aux_dir/mkinstalldirs"; then > + mkdir_p='$(mkinstalldirs)' > + else > + mkdir_p='$(install_sh) -d' > + fi > +fi > +AC_SUBST([mkdir_p])]) > + > +# Helper functions for option handling. -*- > Autoconf -*- > + > +# Copyright (C) 2001, 2002, 2003, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 3 > + > +# _AM_MANGLE_OPTION(NAME) > +# ----------------------- > +AC_DEFUN([_AM_MANGLE_OPTION], > +[[_AM_OPTION_]m4_bpatsubst($1, [[^a-zA-Z0-9_]], [_])]) > + > +# _AM_SET_OPTION(NAME) > +# ------------------------------ > +# Set option NAME. Presently that only means defining a flag for this > option. > +AC_DEFUN([_AM_SET_OPTION], > +[m4_define(_AM_MANGLE_OPTION([$1]), 1)]) > + > +# _AM_SET_OPTIONS(OPTIONS) > +# ---------------------------------- > +# OPTIONS is a space-separated list of Automake options. 
> +AC_DEFUN([_AM_SET_OPTIONS], > +[AC_FOREACH([_AM_Option], [$1], [_AM_SET_OPTION(_AM_Option)])]) > + > +# _AM_IF_OPTION(OPTION, IF-SET, [IF-NOT-SET]) > +# ------------------------------------------- > +# Execute IF-SET if OPTION is set, IF-NOT-SET otherwise. > +AC_DEFUN([_AM_IF_OPTION], > +[m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])]) > + > +# Check to make sure that the build environment is sane. -*- > Autoconf -*- > + > +# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005 > +# Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# serial 4 > + > +# AM_SANITY_CHECK > +# --------------- > +AC_DEFUN([AM_SANITY_CHECK], > +[AC_MSG_CHECKING([whether build environment is sane]) > +# Just in case > +sleep 1 > +echo timestamp > conftest.file > +# Do `set' in a subshell so we don't clobber the current shell's > +# arguments. Must try -L first in case configure is actually a > +# symlink; some systems play weird games with the mod time of symlinks > +# (eg FreeBSD returns the mod time of the symlink's containing > +# directory). > +if ( > + set X `ls -Lt $srcdir/configure conftest.file 2> /dev/null` > + if test "$[*]" = "X"; then > + # -L didn't work. > + set X `ls -t $srcdir/configure conftest.file` > + fi > + rm -f conftest.file > + if test "$[*]" != "X $srcdir/configure conftest.file" \ > + && test "$[*]" != "X conftest.file $srcdir/configure"; then > + > + # If neither matched, then we have a broken ls. This can happen > + # if, for instance, CONFIG_SHELL is bash and it inherits a > + # broken ls alias from the environment. This has actually > + # happened. Such a system could not be considered "sane". > + AC_MSG_ERROR([ls -t appears to fail. Make sure there is not a > broken > +alias in your environment]) > + fi > + > + test "$[2]" = conftest.file > + ) > +then > + # Ok. 
> + : > +else > + AC_MSG_ERROR([newly created file is older than distributed files! > +Check your system clock]) > +fi > +AC_MSG_RESULT(yes)]) > + > +# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. > + > +# AM_PROG_INSTALL_STRIP > +# --------------------- > +# One issue with vendor `install' (even GNU) is that you can't > +# specify the program used to strip binaries. This is especially > +# annoying in cross-compiling environments, where the build's strip > +# is unlikely to handle the host's binaries. > +# Fortunately install-sh will honor a STRIPPROG variable, so we > +# always use install-sh in `make install-strip', and initialize > +# STRIPPROG with the value of the STRIP variable (set by the user). > +AC_DEFUN([AM_PROG_INSTALL_STRIP], > +[AC_REQUIRE([AM_PROG_INSTALL_SH])dnl > +# Installed binaries are usually stripped using `strip' when the user > +# run `make install-strip'. However `strip' might not be the right > +# tool to use in cross-compilation environments, therefore Automake > +# will honor the `STRIP' environment variable to overrule this program. > +dnl Don't test for $cross_compiling = yes, because it might be `maybe'. > +if test "$cross_compiling" != no; then > + AC_CHECK_TOOL([STRIP], [strip], :) > +fi > +INSTALL_STRIP_PROGRAM="\${SHELL} \$(install_sh) -c -s" > +AC_SUBST([INSTALL_STRIP_PROGRAM])]) > + > +# Check how to create a tarball. -*- > Autoconf -*- > + > +# Copyright (C) 2004, 2005 Free Software Foundation, Inc. > +# > +# This file is free software; the Free Software Foundation > +# gives unlimited permission to copy and/or distribute it, > +# with or without modifications, as long as this notice is preserved. 
> + > +# serial 2 > + > +# _AM_PROG_TAR(FORMAT) > +# -------------------- > +# Check how to create a tarball in format FORMAT. > +# FORMAT should be one of `v7', `ustar', or `pax'. > +# > +# Substitute a variable $(am__tar) that is a command > +# writing to stdout a FORMAT-tarball containing the directory > +# $tardir. > +# tardir=directory && $(am__tar) > result.tar > +# > +# Substitute a variable $(am__untar) that extract such > +# a tarball read from stdin. > +# $(am__untar) < result.tar > +AC_DEFUN([_AM_PROG_TAR], > +[# Always define AMTAR for backward compatibility. > +AM_MISSING_PROG([AMTAR], [tar]) > +m4_if([$1], [v7], > + [am__tar='${AMTAR} chof - "$$tardir"'; am__untar='${AMTAR} xf -'], > + [m4_case([$1], [ustar],, [pax],, > + [m4_fatal([Unknown tar format])]) > +AC_MSG_CHECKING([how to create a $1 tar archive]) > +# Loop over all known methods to create a tar archive until one works. > +_am_tools='gnutar m4_if([$1], [ustar], [plaintar]) pax cpio none' > +_am_tools=${am_cv_prog_tar_$1-$_am_tools} > +# Do not fold the above two line into one, because Tru64 sh and > +# Solaris sh will not grok spaces in the rhs of `-'. > +for _am_tool in $_am_tools > +do > + case $_am_tool in > + gnutar) > + for _am_tar in tar gnutar gtar; > + do > + AM_RUN_LOG([$_am_tar --version]) && break > + done > + am__tar="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - > "'"$$tardir"' > + am__tar_="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf > - "'"$tardir"' > + am__untar="$_am_tar -xf -" > + ;; > + plaintar) > + # Must skip GNU tar: if it does not support --format= it doesn't > create > + # ustar tarball either. 
> + (tar --version) >/dev/null 2>&1 && continue > + am__tar='tar chf - "$$tardir"' > + am__tar_='tar chf - "$tardir"' > + am__untar='tar xf -' > + ;; > + pax) > + am__tar='pax -L -x $1 -w "$$tardir"' > + am__tar_='pax -L -x $1 -w "$tardir"' > + am__untar='pax -r' > + ;; > + cpio) > + am__tar='find "$$tardir" -print | cpio -o -H $1 -L' > + am__tar_='find "$tardir" -print | cpio -o -H $1 -L' > + am__untar='cpio -i -H $1 -d' > + ;; > + none) > + am__tar=false > + am__tar_=false > + am__untar=false > + ;; > + esac > + > + # If the value was cached, stop now. We just wanted to have am__tar > + # and am__untar set. > + test -n "${am_cv_prog_tar_$1}" && break > + > + # tar/untar a dummy directory, and stop if the command works > + rm -rf conftest.dir > + mkdir conftest.dir > + echo GrepMe > conftest.dir/file > + AM_RUN_LOG([tardir=conftest.dir && eval $am__tar_ >conftest.tar]) > + rm -rf conftest.dir > + if test -s conftest.tar; then > + AM_RUN_LOG([$am__untar <conftest.tar]) > + grep GrepMe conftest.dir/file >/dev/null 2>&1 && break > + fi > +done > +rm -rf conftest.dir > + > +AC_CACHE_VAL([am_cv_prog_tar_$1], [am_cv_prog_tar_$1=$_am_tool]) > +AC_MSG_RESULT([$am_cv_prog_tar_$1])]) > +AC_SUBST([am__tar]) > +AC_SUBST([am__untar]) > +]) # _AM_PROG_TAR > + > diff -ruNp old/src/userspace/libnes/autogen.sh > new/src/userspace/libnes/autogen.sh > --- old/src/userspace/libnes/autogen.sh 1969-12-31 18:00:00.000000000 > -0600 > +++ new/src/userspace/libnes/autogen.sh 2006-10-23 14:41:36.000000000 > -0500 > @@ -0,0 +1,8 @@ > +#!
/bin/sh > + > +set -x > +aclocal -I config > +libtoolize --force --copy > +autoheader > +automake --foreign --add-missing --copy > +autoconf > diff -ruNp old/src/userspace/libnes/configure.in > new/src/userspace/libnes/configure.in > --- old/src/userspace/libnes/configure.in 1969-12-31 > 18:00:00.000000000 -0600 > +++ new/src/userspace/libnes/configure.in 2006-10-25 > 11:11:16.000000000 -0500 > @@ -0,0 +1,39 @@ > +dnl Process this file with autoconf to produce a configure script. > + > +AC_PREREQ(2.57) > +AC_INIT(libnes, 0.1, openib-general at openib.org) > +AC_CONFIG_SRCDIR([src/nes_umain.h]) > +AC_CONFIG_AUX_DIR(config) > +AM_CONFIG_HEADER(config.h) > +AM_INIT_AUTOMAKE(libnes, 0.1) > +AM_PROG_LIBTOOL > + > +dnl Checks for programs > +AC_PROG_CC > + > +dnl Checks for libraries > + > +dnl Checks for header files. > +AC_CHECK_HEADERS(sysfs/libsysfs.h) > +AC_CHECK_HEADER(infiniband/driver.h, [], > + AC_MSG_ERROR([<infiniband/driver.h> not found. Is libibverbs > installed?])) > +AC_HEADER_STDC > + > +dnl Checks for typedefs, structures, and compiler characteristics.
> +AC_C_CONST > +AC_CHECK_SIZEOF(long) > + > +dnl Checks for library functions > +AC_CHECK_FUNCS(ibv_read_sysfs_file) > + > +AC_CACHE_CHECK(whether ld accepts --version-script, > ac_cv_version_script, > + if test -n "`$LD --help < /dev/null 2>/dev/null | grep > version-script`"; then > + ac_cv_version_script=yes > + else > + ac_cv_version_script=no > + fi) > + > +AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = > "yes") > + > +AC_CONFIG_FILES([Makefile libnes.spec]) > +AC_OUTPUT > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From jlentini at netapp.com Fri Oct 27 07:39:15 2006 From: jlentini at netapp.com (James Lentini) Date: Fri, 27 Oct 2006 10:39:15 -0400 (EDT) Subject: [openib-general] [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2> References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EAF@venom2> Message-ID: On Thu, 26 Oct 2006, Glenn Grundstrom wrote: > diff -ruNp old/drivers/infiniband/hw/nes/Makefile > new/drivers/infiniband/hw/nes/Makefile > --- old/drivers/infiniband/hw/nes/Makefile 1969-12-31 > 18:00:00.000000000 -0600 > +++ new/drivers/infiniband/hw/nes/Makefile 2006-10-25 > 11:10:26.000000000 -0500 > @@ -0,0 +1,27 @@ > +EXTRA_CFLAGS += -Idrivers/infiniband/include > -Idrivers/infiniband/hw/nes/nes_tcpip/include > + > +ifdef CONFIG_INFINIBAND_NES_DEBUG > +EXTRA_CFLAGS += -DNES_DEBUG > +endif The NES_DEBUG flag is unnecessary. You can check for CONFIG_INFINIBAND_NES_DEBUG in the code. See CONFIG_INFINIBAND_MTHCA_DEBUG for an example. 
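A minimal userspace sketch of the suggestion above, loosely in the spirit of how mthca compiles its debug macro in or out. The names nes_dbg and nes_double are hypothetical and not taken from the posted patch; in a real kernel build CONFIG_INFINIBAND_NES_DEBUG would come from Kconfig, while here it is assumed to be passed on the compiler command line (or left undefined).

```c
#include <stdio.h>

/*
 * Test the Kconfig symbol directly instead of translating it into a
 * second -DNES_DEBUG flag in the Makefile.
 * Assumption: build with -DCONFIG_INFINIBAND_NES_DEBUG to enable output.
 */
#ifdef CONFIG_INFINIBAND_NES_DEBUG
#define nes_dbg(...) fprintf(stderr, "nes: " __VA_ARGS__)
#else
#define nes_dbg(...) do { } while (0)
#endif

/* Hypothetical caller: the debug statement compiles away when disabled. */
static int nes_double(int value)
{
	nes_dbg("doubling %d\n", value);
	return value * 2;
}
```

With this shape the EXTRA_CFLAGS += -DNES_DEBUG block in the Makefile can simply go away, since the code keys off the config symbol itself.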
> + > +ifneq ($(KERNELRELEASE),) > + obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o > + > + iw_nes-objs := \ > + nes.o \ > + nes_hw.o \ > + nes_nic.o \ > + nes_cm.o \ > + nes_utils.o \ > + nes_verbs.o > +else > + KERNELDIR ?= /usr/src/linux > + PWD := $(shell pwd) > + > +default: > + $(MAKE) -C $(KERNELDIR) M=$(PWD) modules > + > +clean: > + $(MAKE) -C $(KERNELDIR) M=$(PWD) clean > + > +endif In tree drivers don't provide support for out-of-tree builds. See drivers/infiniband/hw/mthca/Makefile for an example of how to simplify this. From swise at opengridcomputing.com Fri Oct 27 07:43:18 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 09:43:18 -0500 Subject: [openib-general] problem with 2.6.19? In-Reply-To: References: <1161901218.4280.55.camel@stevo-desktop> Message-ID: <1161960198.14333.16.camel@stevo-desktop> On Thu, 2006-10-26 at 15:26 -0700, Roland Dreier wrote: > Steve> The adapter seems to be dma'ing into the wrong memory. The > Steve> patch below backs the usage of dma_map_single() back to > Steve> using __pa() for converting kernel virtual addresses (from > Steve> kmalloc) into bus addresses, and things work ok. > > Hmm. It might be interesting to hack the driver to print the result > of both dma_map_single() and __pa() and see if they're different. > Here's a dump. They are different. The __pa()'s are what I expect. I dunno if the dma_addr's returned from dma_map_single() look good or not (they certainly don't work :) c2_alloc_mqsp_chunk mqsp_chunk va ffff810148b1b000 dma_addr 3a56000 __pa 148b1b000 c2_alloc_mqsp addr ffff810148b1b01a dma_addr 3a5601a c2_alloc_mqsp addr ffff810148b1b01c dma_addr 3a5601c c2_alloc_mqsp addr ffff810148b1b01e dma_addr 3a5601e c2_alloc_mqsp addr ffff810148b1b020 dma_addr 3a56020 c2_rnic_init rep_vq va ffff810147e78000 dma 3a57000 __pa 147e78000 c2_rnic_init aeq va ffff810147d48000 dma 3a5f000 __pa 147d48000 > Are you running on a 32-bit (i386) or 64-bit (x86_64) kernel? How > much RAM do you have? 
64b/X86_64. 4GB RAM. The CPUs are Dempsey class XEONs - Dual CPU, Dual core. So with HT on, Linux sees 8 CPUs. > Is the kernel using swiotlb? If so then you > need to make sure your DMA_{TO,FROM} directions and dma_unmap calls > are right, since otherwise the DMAed data won't be copied to/from the > bounce buffer at the right time. All these mappings are for the device to DMA into the host memory, and I'm using DMA_FROM_DEVICE in my calls to dma_map_single(). How do I know if the kernel is using swiotlb? > Another thing to do if you're patient would be to use git-bisect and > figure out exactly which patch made amso1100 stop working. > I added these calls as part of the review for submission into the kernel, and I originally tested them on dual CPU opteron systems with 1GB of memory. But maybe they weren't using the IOMMU? Dunno. From jlentini at netapp.com Fri Oct 27 07:42:39 2006 From: jlentini at netapp.com (James Lentini) Date: Fri, 27 Oct 2006 10:42:39 -0400 (EDT) Subject: [openib-general] uDAPL in OFED 1.1 In-Reply-To: References: Message-ID: You can obtain the sources for dapltest from the OFA svn repository. Let me know if you need help building them. Unfortunately, it's unlikely that dapltest will shed any light on your problem. Are the OFA userspace verbs installed and working properly on your new system? On Thu, 26 Oct 2006, Scott Weitzenkamp (sweitzen) wrote: > AFAIK dapltest was never part of OFED 1.0, at least it never got built > in the RPMs. > > Scott > > > ________________________________ > > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of SoBeBike > Sent: Thursday, October 26, 2006 3:58 PM > To: openib-general at openib.org > Subject: [openib-general] uDAPL in OFED 1.1 > > > Just installed OFED 1.1 on SLES 10 (2.6.16.21-0.8-smp) x86_64. I > have existing uDAPL code which runs fine on OFED 1.0 (SLES 9). It does > not work on OFED 1.1 SLES 10.
dat_evd_create fails with > DAT_INSUFFICIENT_RESOURCES even if I only create a single EVD. In order > to have a common test case, I attempted to run dapltest, but it does not > appear to be part of the OFED 1.1 package. > > Was dapltest removed from the OFED package? If so, why? Is there > some other common test in the OFED package that I should run to validate > basic uDAPL functionality? > > thanks. > > From jlentini at netapp.com Fri Oct 27 07:50:54 2006 From: jlentini at netapp.com (James Lentini) Date: Fri, 27 Oct 2006 10:50:54 -0400 (EDT) Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17 In-Reply-To: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95> References: <20061015210330.XSMZ22191.rrcs-fep-12.hrndva.rr.com@telerio44fea95> Message-ID: Bill, In addition to the OFA registration, do we need to register for SC06 to attend the summit? james On Sun, 15 Oct 2006, Bill Boas wrote: > To all in the OpenFabrics Community > > > > We will be holding our first Developer Summit in the Tampa Convention Center > courtesy of SC06 starting at 1.30PM in Room 17 on Thursday November 16, > 2006. On Friday November 17, we will start in Room 13 at 8.00 AM and > continue till 5.00PM. We have had to schedule into these time slots because > no other usable space is available at any other times during the week of > SC06! > > > > OpenFabrics will cater food and beverages for afternoon break and supper on > Thursday, breakfast, lunch and two breaks on Friday. We will set up a > registration site at Acteva to collect $$ to cover our out of pocket > expenses - I'll email out the URL for that site in the next day or two. > > > > Please review attached Strawman purposes, suggested attendees and agenda. > Any changes or comments, please email them to the community for all to > comment on please.
> > > > The Summit has several dimensions and themes throughout our work there: > > 1) - consistency and robustness of the Linux and Windows software stacks for > Release 2.0 of OpenFabrics; > > 2) - feature selection, development resources and timelines for Release 2.0; > > 3) - activities, features and processes of the Enterprise Working Group on > OFED 1.x until Release 2.0 is ready for hand-off to the EWG; > > 4) - enhancing the resources of the EWG to be ready for 2.0 so that it > may subsequently be distributed as OFED 2.0 and adopted by the > OpenFabrics vendor and customer communities for production use. > > > > This is far too much work for just a day and a half! PLEASE START NOW > exchanging ideas for additional features, contact peer engineers from > companies and customers to discuss work sizing, development resources, > identify volunteer developers for items so that when we meet on the 16th > we're not starting from a blank sheet! > > > > Sujal Das, Johann George, Matt Leininger, Pramod Srivatsa, Hal Rosenstock, > Tom Tucker and Bob Woodruff are leading the pre-meeting, STRAWMAN collation > of requirements, feature prioritization, developer assignments, sizing and > processes so that we have the list largely complete prior to the meeting and > people know who has already volunteered for items from the list. > > > > Bill Boas > > VP, Business Development | System Fabric Works > > bboas at systemfabricworks.com | 510-375-8840 > > > > From sashak at voltaire.com Fri Oct 27 07:59:09 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 27 Oct 2006 16:59:09 +0200 Subject: [openib-general] [openfabrics-ewg] new server up and running In-Reply-To: <20061027080002.GA31235@mellanox.co.il> References: <20061026185532.GE11425@sashak.voltaire.com> <20061027080002.GA31235@mellanox.co.il> Message-ID: <20061027145909.GC8467@sashak.voltaire.com> On 10:00 Fri 27 Oct , Michael S. Tsirkin wrote: > Quoting r.
Sasha Khapyorsky : > > > I even think we should replace --base-path with --user-path=scm. > > > > > > As it is, we have e.g. > > > /pub/scm/management.git -> /home/sashak/management.git > > > which is just confusing. > > > > It is accessible just as git://staging.openfabrics.org/management , > > what is confusing here? > > It's not clear from the name whose tree it is. > ~sashak/management/git would be better, I think you mean "don't put your personal stuff" there, right? If so, I agree. And this management.git tree which you can see under /pub/scm is not my personal tree, but a git mirror of the "official" src/userspace/management SVN tree. (For technical reasons I prefer to do the conversion on the non-privileged ~sashak account). > - let everyone keep his > tree under his home directory. Yes for personal trees. But I think we also will want to have an "official" tree for each sub-project and then we will place it into the central /pub/scm (or will create a symbolic link to the maintainer's tree). Makes sense? Sasha From jsquyres at cisco.com Fri Oct 27 08:05:17 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Fri, 27 Oct 2006 11:05:17 -0400 Subject: [openib-general] [mvapich] Announcing the release of MVAPICH2 0.9.6 with on-demand connection management, multi-core optimized shared memory communication and memory hook support In-Reply-To: <8A563782-679E-411E-80C2-5A31D40C40AE@cisco.com> References: <200610230353.k9N3r4de015233@xi.cse.ohio-state.edu> <8A563782-679E-411E-80C2-5A31D40C40AE@cisco.com> Message-ID: Any response from the OSU crew? Can someone provide a reason why MVAPICH is still in OpenIB's Subversion repository? Please see my original mail, below, for more detailed questions. Thanks. On Oct 23, 2006, at 7:36 AM, Jeff Squyres wrote: > On Oct 22, 2006, at 11:53 PM, Dhabaleswar Panda wrote: > >> A stripped down version of this release is also available at the >> OpenIB SVN. > > I see this statement in every MVAPICH release notice and it > continues to puzzle me.
> > I understand that there was a use for an alternate distribution > source before MVAPICH became open source. But now that the MVAPICH > code bases are freely available from OSU via multiple mechanisms > (anonymous SVN, tarball download, etc.), why is a "stripped down > version" maintained in the OpenIB SVN? > > 1. What, exactly, is the difference between the MVAPICH available > from OSU and the "stripped down version" in the OpenIB SVN? > > 2. Why would someone choose to download the "stripped down version" > from the OpenIB SVN? Have any real users/customers done so? > > 3. What is the point of maintaining yet more flavors of MVAPICH -- > aren't there enough already (multiple versions from OSU, more > versions available from each IB vendor)? > > DK -- can you please explain? Thanks. > > -- > Jeff Squyres > Server Virtualization Business Unit > Cisco Systems > > -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From tom at opengridcomputing.com Fri Oct 27 08:12:31 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 27 Oct 2006 10:12:31 -0500 Subject: [openib-general] [PATCH 3/9] NetEffect 10Gb RNIC Driver: openfabrics connection manager c file In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EB8@venom2> References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EB8@venom2> Message-ID: <1161961951.2748.22.camel@trinity.ogc.int> [...snip...] > +extern void set_interface( > + UINT32 ip_addr, These should probably be the standard linux types u32, or uint32 > + UINT32 mask, > + UINT32 bcastaddr, > + UINT32 type > + ); [...snip...] > + struct NES_sockaddr_in inet_addr; > + struct sockaddr_in kinet_addr; Is there some reason why you need your own sockaddr and sockaddr_in structures? [...snip...] 
> + > +/** > + * nes_disconnect > + * > + * @param cm_id > + * @param abrupt > + * > + * @return int > + */ > +int nes_disconnect(struct iw_cm_id *cm_id, int abrupt) > +{ > + struct ib_qp_attr attr; > + struct ib_qp *ibqp; > + struct nes_qp *nesqp; > + struct nes_dev *nesdev = to_nesdev(cm_id->device); > + int err = 0; > + u8 u8temp; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + /* If the qp was already destroyed, then there's no QP */ > + if (cm_id->provider_data == 0) > + return 0; > + > + nesqp = (struct nes_qp *)cm_id->provider_data; > + ibqp = &nesqp->ibqp; > + > + /* Disassociate the QP from this cm_id */ > + cm_id->provider_data = 0; > + cm_id->rem_ref(cm_id); > + nesqp->cm_id = 0; > + > + stack_ops_p->decelerate_socket(nesqp->socket, > + (struct nes_uploaded_qp_context *) > + nesqp->nesqp_context); > + > + if (nesqp->active_conn) { > + u8temp = 1 << (ntohs(cm_id->local_addr.sin_port)&7); > + nesdev->apbv_table[ntohs(cm_id->local_addr.sin_port)>>3] &= > ~(u8temp); > + } else { > + dev_put(nesdev->netdev); > + /* Need to free the Last Streaming Mode Message */ > + pci_free_consistent(nesdev->pcidev, > + > nesqp->private_data_len+sizeof(*nesqp->ietf_frame), > + nesqp->ietf_frame, > + nesqp->ietf_frame_pbase); This is mailer perversion. You need to turn off wrapping in your mailer. It makes it hard to review the patch never mind apply it. > + } > + > + if (nesqp->ksock) sock_release(nesqp->ksock); > + stack_ops_p->sock_ops_p->close( nesqp->socket ); > + nesqp->ksock = 0; > + nesqp->socket = 0; > + if (nesqp->wq) { > + destroy_workqueue(nesqp->wq); This will deadlock if this function is called from a workqueue thread and CONFIG_HOTPLUG_CPU is enabled. 
> + nesqp->wq = NULL; > + } > + > + memset(&attr, 0, sizeof(struct ib_qp_attr)); > + if (abrupt) > + attr.qp_state = IB_QPS_ERR; > + else > + attr.qp_state = IB_QPS_SQD; > + > + return err; > +} > + > + > +/** > + * nes_accept > + * > + * @param cm_id > + * @param conn_param > + * > + * @return int > + */ > +int nes_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param > *conn_param) > +{ > + struct nes_qp *nesqp; > + struct nes_dev *nesdev; > + struct nes_adapter *nesadapter; > + struct ib_qp *ibqp; > + struct nes_hw_qp_wqe *wqe; > + struct nes_v4_quad nes_quad; > + struct ib_qp_attr attr; > + struct iw_cm_event cm_event; > + > + dprintk("%s:%s:%u: data len = %u\n", > + __FILE__, __FUNCTION__, __LINE__, > conn_param->private_data_len); > + > + ibqp = nes_get_qp(cm_id->device, conn_param->qpn); > + if (!ibqp) > + return -EINVAL; > + nesqp = to_nesqp(ibqp); > + nesdev = to_nesdev(nesqp->ibqp.device); > + nesadapter = nesdev->nesadapter; > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + nesqp->ietf_frame = pci_alloc_consistent(nesdev->pcidev, > + > sizeof(*nesqp->ietf_frame)+conn_param->private_data_len, > + &nesqp->ietf_frame_pbase); > + if (!nesqp->ietf_frame) { > + dprintk(KERN_ERR PFX "%s: Unable to allocate memory for private > data\n", __FUNCTION__); > + return -ENOMEM; > + } > + dprintk(PFX "%s: PCI consistent memory for " > + "private data located @ %p (pa = 0x%08lX.) 
size = %u.\n", > + __FUNCTION__, nesqp->ietf_frame, (unsigned > long)nesqp->ietf_frame_pbase, > + conn_param->private_data_len+sizeof(*nesqp->ietf_frame)); > + nesqp->private_data_len = conn_param->private_data_len; > + > + strcpy(&nesqp->ietf_frame->key[0], IEFT_MPA_KEY_REP); > + memcpy(&nesqp->ietf_frame->private_data, conn_param->private_data, > conn_param->private_data_len); > + nesqp->ietf_frame->private_data_size = > cpu_to_be16(conn_param->private_data_len); > + nesqp->ietf_frame->rev = mpa_version; > + nesqp->ietf_frame->flags = IETF_MPA_FLAGS_CRC; > + > + wqe = &nesqp->hwqp.sq_vbase[0]; > + *((struct nes_qp > **)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; > + *((u64 *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) |= > NES_SW_CONTEXT_ALIGN>>1; > + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = > cpu_to_le32(NES_IWARP_SQ_WQE_STREAMING); > + wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = > cpu_to_le32(conn_param->private_data_len+sizeof(*nesqp->ietf_frame)); > + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX] = > cpu_to_le32((u32)nesqp->ietf_frame_pbase); > + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX] = > cpu_to_le32((u32)((u64)nesqp->ietf_frame_pbase>>32)); > + wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX] = > cpu_to_le32(conn_param->private_data_len+sizeof(*nesqp->ietf_frame)); These are way over 80 columns wide. > + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX] = 0; > + > + nesqp->nesqp_context->ird_ord_sizes |= > NES_QPCONTEXT_ORDIRD_LSMM_PRESENT | NES_QPCONTEXT_ORDIRD_WRPDU; > + nesqp->skip_lsmm = 1; > + > + /* Cache the cm_id in the qp */ > + nesqp->cm_id = cm_id; This should all be reformatted with standard 8 character wide tabs. I think these were formatted with ts=4 > + nesqp->socket = (u32)cm_id->provider_data; [...snip...] 
From swise at opengridcomputing.com Fri Oct 27 08:27:04 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 10:27:04 -0500 Subject: [openib-general] [PATCH 8/9] NetEffect 10Gb RNIC Driver: openfabrics verbs interface c file In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EC3@venom2> References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9EC3@venom2> Message-ID: <1161962824.14333.33.camel@stevo-desktop> General comments: The patches are all messed up due to your mailer wrapping. It makes it hard to review. There are lots of comments saying "Catch the error cases". You'll need to address these. Formatting: read the linux kernel coding guidelines. More below... ... > +static struct ib_ucontext *nes_alloc_ucontext(struct ib_device *ibdev, > + > struct ib_udata *udata) { > + struct nes_dev *nesdev = to_nesdev(ibdev); > + struct nes_alloc_ucontext_resp uresp; > + struct nes_ucontext *nes_ucontext; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + memset(&uresp, 0, sizeof uresp); > + > + uresp.max_qps = nesdev->nesadapter->max_qp; > + uresp.max_pds = nesdev->nesadapter->max_pd; > + uresp.wq_size = nesdev->nesadapter->max_qp_wr*2; > + > + nes_ucontext = kmalloc(sizeof *nes_ucontext, GFP_KERNEL); > + if (!nes_ucontext) > + return ERR_PTR(-ENOMEM); > + > + memset(nes_ucontext, 0, sizeof(*nes_ucontext)); > + kzalloc() will kmalloc and initialize the memory to zeros. > + nes_ucontext->nesdev = nesdev; > + /* TODO: much better ways to manage this area */ > + /* TODO: cqs should be user buffers */ > + nes_ucontext->mmap_wq_offset = ((uresp.max_pds * > 4096)+PAGE_SIZE-1)/PAGE_SIZE; > + nes_ucontext->mmap_cq_offset = nes_ucontext->mmap_wq_offset + > + > ((sizeof(struct nes_hw_qp_wqe) * uresp.max_qps * > 2)+PAGE_SIZE-1)/PAGE_SIZE; > + I think you can use PAGE_ALIGN() here... 
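To illustrate the PAGE_ALIGN() remark, here is a small userspace sketch comparing the open-coded round-up used for mmap_wq_offset with the PAGE_ALIGN() form. The macros below are local stand-ins that assume a 4 KiB page (the kernel supplies its own definitions), and the two function names are hypothetical.

```c
/* Userspace stand-ins for the kernel macros; a 4 KiB page is assumed. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & PAGE_MASK)

/* Open-coded form from the patch: bytes rounded up to whole pages. */
static unsigned long pages_open_coded(unsigned long bytes)
{
	return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* The same page count written with PAGE_ALIGN() and a shift. */
static unsigned long pages_page_align(unsigned long bytes)
{
	return PAGE_ALIGN(bytes) >> PAGE_SHIFT;
}
```

Both forms give the same page count for any byte length that does not overflow when PAGE_SIZE - 1 is added; the second just states the intent more directly.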
> + if (ib_copy_to_udata(udata, &uresp, sizeof uresp)) { > + kfree(nes_ucontext); > + return ERR_PTR(-EFAULT); > + } > + > + INIT_LIST_HEAD(&nes_ucontext->cq_reg_mem_list); > + return &nes_ucontext->ibucontext; > +} > + > + > +/** > + * nes_dealloc_ucontext > + * > + * @param context > + * > + * @return int > + */ > +static int nes_dealloc_ucontext(struct ib_ucontext *context) > +{ > +// struct nes_dev *nesdev = to_nesdev(context->device); > + struct nes_ucontext *nes_ucontext = to_nesucontext(context); > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + kfree(nes_ucontext); > + return 0; > +} > + > + > +/** > + * nes_mmap > + * > + * @param context > + * @param vma > + * > + * @return int > + */ > +static int nes_mmap(struct ib_ucontext *context, struct vm_area_struct > *vma) > +{ > + unsigned long index; > + struct nes_dev *nesdev = to_nesdev(context->device); > +// struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_ucontext *nes_ucontext; > + struct nes_qp *nesqp; > + > + nes_ucontext = to_nesucontext(context); > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + > + if (vma->vm_pgoff >= nes_ucontext->mmap_wq_offset) { > + index = (vma->vm_pgoff - nes_ucontext->mmap_wq_offset) * > PAGE_SIZE; > + index /= ((sizeof(struct nes_hw_qp_wqe) * > nesdev->nesadapter->max_qp_wr * 2)+PAGE_SIZE-1)&(~(PAGE_SIZE-1)); Is there a way to do this without division? 
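One possible answer to the division question, sketched in userspace C. It assumes the driver were changed so that each QP's mapping stride is rounded up to a power-of-two number of pages; the posted patch does not do this, and it trades some mmap offset space for the cheaper lookup. The helper names pages_to_order and wq_index are hypothetical.

```c
/*
 * Smallest order such that (1UL << order) >= pages.
 * Intended for small, sane page counts; computed once at setup time.
 */
static unsigned int pages_to_order(unsigned long pages)
{
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Work queue index for a given mmap page offset, assuming every
 * work queue occupies exactly (1UL << order) pages starting at wq_offset.
 */
static unsigned long wq_index(unsigned long pgoff, unsigned long wq_offset,
			      unsigned int order)
{
	return (pgoff - wq_offset) >> order;
}
```

The order would be stored alongside mmap_wq_offset when the ucontext is allocated, so nes_mmap() only performs a subtraction and a shift.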
> if (!test_bit(index, nes_ucontext->allocated_wqs)) { > + dprintk("%s: wq %lu not > allocated\n",__FUNCTION__, index); > + return -EFAULT; > + } > + nesqp = nes_ucontext->mmap_nesqp[index]; > + if (NULL == nesqp) { > + dprintk("%s: wq %lu has a NULL QP > base.\n",__FUNCTION__, index); > + return -EFAULT; > + } > + if (remap_pfn_range(vma, vma->vm_start, > + > nesqp->hwqp.sq_pbase>>PAGE_SHIFT, > + > vma->vm_end-vma->vm_start, > + > vma->vm_page_prot)) { > + return(-EAGAIN); > + } > + vma->vm_private_data = nesqp; > + return 0; > + } else { > + index = vma->vm_pgoff; > + if (!test_bit(index, nes_ucontext->allocated_doorbells)) > + return -EFAULT; > + > + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > + if ( io_remap_pfn_range(vma, vma->vm_start, > + > (nesdev->nesadapter->doorbell_start+ > + > ((nes_ucontext->mmap_db_index[index]-nesdev->base_doorbell_index)*4096)) > + >> > PAGE_SHIFT, PAGE_SIZE, vma->vm_page_prot)) > + return -EAGAIN; > + vma->vm_private_data = nes_ucontext; > + return 0; > + } > + > + return -ENOSYS; > + return 0; > +} > + > + > +/** > + * nes_alloc_pd > + * > + * @param ibdev > + * @param context > + * @param udata > + * > + * @return struct ib_pd* > + */ > +static struct ib_pd *nes_alloc_pd(struct ib_device *ibdev, > + struct > ib_ucontext *context, > + struct > ib_udata *udata) { > + struct nes_pd *nespd; > + struct nes_dev *nesdev = to_nesdev(ibdev); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_ucontext *nes_ucontext; > + struct nes_alloc_pd_resp uresp; > + u32 pd_num = 0; > + int err; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + err = nes_alloc_resource(nesadapter, nesadapter->allocated_pds, > + > nesadapter->max_pd, &pd_num, &nesadapter->next_pd); > + if (err) { > + return ERR_PTR(err); > + } > + > + nespd = kmalloc(sizeof *nespd, GFP_KERNEL); > + if (!nespd) { > + 
nes_free_resource(nesadapter, nesadapter->allocated_pds, > pd_num); > + return ERR_PTR(-ENOMEM); > + } > + dprintk("Allocating PD (%p) for ib device %s\n", nespd, > nesdev->ibdev.name); > + > + memset(nespd, 0, sizeof(*nespd)); > + kzalloc()... > + /* TODO: consider per function considerations */ > + nespd->pd_id = pd_num+nesadapter->base_pd; > + err = 0; > + if (err) { > + nes_free_resource(nesadapter, nesadapter->allocated_pds, > pd_num); > + kfree(nespd); > + return ERR_PTR(err); > + } > + > + if (context) { > + nes_ucontext = to_nesucontext(context); > + nespd->mmap_db_index = > find_next_zero_bit(nes_ucontext->allocated_doorbells, > + > NES_MAX_USER_DB_REGIONS, nes_ucontext->first_free_db ); > + dprintk("find_first_zero_biton doorbells returned %u, > mapping pd_id %u.\n", nespd->mmap_db_index, nespd->pd_id); > + if (nespd->mmap_db_index > NES_MAX_USER_DB_REGIONS) { > + nes_free_resource(nesadapter, > nesadapter->allocated_pds, pd_num); > + kfree(nespd); > + return ERR_PTR(-ENOMEM); > + } > + > + uresp.pd_id = nespd->pd_id; > + uresp.mmap_db_index = nespd->mmap_db_index; > + if (ib_copy_to_udata(udata, &uresp, sizeof uresp)) { > + nes_free_resource(nesadapter, > nesadapter->allocated_pds, pd_num); > + kfree(nespd); > + return ERR_PTR(-EFAULT); > + } > + set_bit(nespd->mmap_db_index, > nes_ucontext->allocated_doorbells); > + nes_ucontext->mmap_db_index[nespd->mmap_db_index] = > nespd->pd_id; > + nes_ucontext->first_free_db = nespd->mmap_db_index + 1; > + } > + > + dprintk("%s: PD%u structure located @%p.\n", __FUNCTION__, > nespd->pd_id, nespd); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + return (&nespd->ibpd); > +} > + > + > +/** > + * nes_dealloc_pd > + * > + * @param ibpd > + * > + * @return int > + */ > +static int nes_dealloc_pd(struct ib_pd *ibpd) > +{ > + struct nes_ucontext *nes_ucontext; > + struct nes_pd *nespd = to_nespd(ibpd); > + struct nes_dev *nesdev = to_nesdev(ibpd->device); > + struct 
nes_adapter *nesadapter = nesdev->nesadapter; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + // TODO: Do work here. What work? > + if ((ibpd->uobject)&&(ibpd->uobject->context)) { > + nes_ucontext = to_nesucontext(ibpd->uobject->context); > + dprintk("%s: Clearing bit %u from allocated > doorbells\n", __FUNCTION__, nespd->mmap_db_index); > + clear_bit(nespd->mmap_db_index, > nes_ucontext->allocated_doorbells); > + nes_ucontext->mmap_db_index[nespd->mmap_db_index] = 0; > + if (nes_ucontext->first_free_db > nespd->mmap_db_index) > { > + nes_ucontext->first_free_db = > nespd->mmap_db_index; > + } > + } > + > + dprintk("%s: Deallocating PD%u structure located @%p.\n", > __FUNCTION__, nespd->pd_id, nespd); > + nes_free_resource(nesadapter, nesadapter->allocated_pds, > nespd->pd_id-nesadapter->base_pd); > + kfree(nespd); > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return 0; > +} > + > + > +/** > + * nes_create_ah > + * > + * @param pd > + * @param ah_attr > + * > + * @return struct ib_ah* > + */ > +static struct ib_ah *nes_create_ah(struct ib_pd *pd, struct ib_ah_attr > *ah_attr) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + return ERR_PTR(-ENOSYS); > +} > + > + > +/** > + * nes_destroy_ah > + * > + * @param ah > + * > + * @return int > + */ > +static int nes_destroy_ah(struct ib_ah *ah) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return -ENOSYS; > +} > + > + > +/** > + * nes_create_qp > + * > + * @param ib_pd > + * @param init_attr > + * @param udata > + * > + * @return struct ib_qp* > + */ > +static struct ib_qp *nes_create_qp(struct ib_pd *ib_pd, > + > struct ib_qp_init_attr *init_attr, > + > struct ib_udata *udata) { > + u64 u64temp= 0, u64nesqp = 0; > + struct nes_pd *nespd = to_nespd(ib_pd); > + struct nes_dev *nesdev = to_nesdev(ib_pd->device); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_qp *nesqp; > + struct nes_cq *nescq; > + struct 
nes_ucontext *nes_ucontext; > + struct nes_hw_cqp_wqe *cqp_wqe; > + struct nes_create_qp_resp uresp; > + u32 cqp_head = 0; > + u32 qp_num = 0; > +// u32 counter = 0; > + void *mem; > + > + unsigned long flags; > + int ret; > + int err; > + int sq_size; > + int rq_size; > + u8 sq_encoded_size; > + u8 rq_encoded_size; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + switch (init_attr->qp_type) { > + case IB_QPT_RC: > + /* TODO: */ > + init_attr->cap.max_inline_data = 0; > + > + if (init_attr->cap.max_send_wr < 32) { > + sq_size = 32; > + sq_encoded_size = 1; > + } else if (init_attr->cap.max_send_wr < 128) { > + sq_size = 128; > + sq_encoded_size = 2; > + } else if (init_attr->cap.max_send_wr < 512) { > + sq_size = 512; > + sq_encoded_size = 3; > + } else { > + printk(KERN_ERR PFX "%s: SQ size (%u) too > large.\n", __FUNCTION__, init_attr->cap.max_send_wr); > + return ERR_PTR(-EINVAL); > + } > + init_attr->cap.max_send_wr = sq_size - 2; > + if (init_attr->cap.max_recv_wr < 32) { > + rq_size = 32; > + rq_encoded_size = 1; > + } else if (init_attr->cap.max_recv_wr < 128) { > + rq_size = 128; > + rq_encoded_size = 2; > + } else if (init_attr->cap.max_recv_wr < 512) { > + rq_size = 512; > + rq_encoded_size = 3; > + } else { > + printk(KERN_ERR PFX "%s: RQ size (%u) too > large.\n", __FUNCTION__, init_attr->cap.max_recv_wr); > + return ERR_PTR(-EINVAL); > + } > + init_attr->cap.max_recv_wr = rq_size -1; > + dprintk("%s: RQ size = %u, SQ Size = %u.\n", > __FUNCTION__, rq_size, sq_size); > + > + ret = nes_alloc_resource(nesadapter, > nesadapter->allocated_qps, nesadapter->max_qp, &qp_num, > &nesadapter->next_qp); > + if (ret) { > + return ERR_PTR(ret); > + } > + > + /* Need 512 (actually now 1024) byte alignment on this > structure */ > + mem = kzalloc(sizeof(*nesqp)+NES_SW_CONTEXT_ALIGN-1, > GFP_KERNEL); > + if (!mem) { > + nes_free_resource(nesadapter, > 
nesadapter->allocated_qps, qp_num); > + dprintk("%s: Unable to allocate QP\n", > __FUNCTION__); > + return ERR_PTR(-ENOMEM); > + } > + u64nesqp = (u64)mem; //u64nesqp = (u64)((uint)mem); > + u64nesqp += ((u64)NES_SW_CONTEXT_ALIGN) - 1; > + u64temp = ((u64)NES_SW_CONTEXT_ALIGN) - 1; > + u64nesqp &= ~u64temp; > + nesqp = (struct nes_qp *)u64nesqp; > + dprintk("nesqp = %p, allocated buffer = %p. Rounded to > closest %u\n", nesqp, mem, NES_SW_CONTEXT_ALIGN); > + nesqp->allocated_buffer = mem; > + > + if (udata) { > + if ((ib_pd->uobject)&&(ib_pd->uobject->context)) > { > + nesqp->user_mode = 1; > + nes_ucontext = > to_nesucontext(ib_pd->uobject->context); > + nesqp->mmap_sq_db_index = > find_next_zero_bit(nes_ucontext->allocated_wqs, > + > NES_MAX_USER_WQ_REGIONS, nes_ucontext->first_free_wq); > + dprintk("find_first_zero_biton wqs > returned %u\n", nespd->mmap_db_index); > + if > (nesqp->mmap_sq_db_index>NES_MAX_USER_WQ_REGIONS) { > + dprintk("%s: db index is greater > than max user reqions, failing create QP\n", __FUNCTION__); > + nes_free_resource(nesadapter, > nesadapter->allocated_qps, qp_num); > + kfree(nesqp->allocated_buffer); > + return ERR_PTR(-ENOMEM); > + } > + set_bit(nesqp->mmap_sq_db_index, > nes_ucontext->allocated_wqs); > + > nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = nesqp; > + nes_ucontext->first_free_wq = > nesqp->mmap_sq_db_index + 1; > + } else { > + nes_free_resource(nesadapter, > nesadapter->allocated_qps, qp_num); > + kfree(nesqp->allocated_buffer); > + return ERR_PTR(-EFAULT); > + } > + } > + > + // Allocate Memory > + nesqp->qp_mem_size = (sizeof(struct > nes_hw_qp_wqe)*sq_size) + /* needs 512 byte alignment */ > + (sizeof(struct > nes_hw_qp_wqe)*rq_size) + /* needs 512 > byte alignment */ > + > max((u32)sizeof(struct nes_qp_context),((u32)256)) + /* needs > 8 byte alignment */ > + 256; > /* this is Q2 */ > + /* Round up to a multiple of a page */ > + nesqp->qp_mem_size += PAGE_SIZE - 1; > + nesqp->qp_mem_size &= ~(PAGE_SIZE - 1); > 
+ > + /* TODO: Need to separate out nesqp_context at that > point too!!!! */ > + mem = pci_alloc_consistent(nesdev->pcidev, > nesqp->qp_mem_size, > + > &nesqp->hwqp.sq_pbase); > + if (!mem) { > + nes_free_resource(nesadapter, > nesadapter->allocated_qps, qp_num); > + dprintk(KERN_ERR PFX "Unable to allocate memory > for host descriptor rings\n"); > + kfree(nesqp->allocated_buffer); > + return ERR_PTR(-ENOMEM); > + } > + dprintk(PFX "%s: PCI consistent memory for " > + "host descriptor rings located @ %p (pa > = 0x%08lX.) size = %u.\n", > + __FUNCTION__, mem, (unsigned > long)nesqp->hwqp.sq_pbase, > + nesqp->qp_mem_size); > + memset(mem,0, nesqp->qp_mem_size); > + > + nesqp->hwqp.sq_vbase = mem; > + nesqp->hwqp.sq_size = sq_size; > + nesqp->hwqp.sq_encoded_size = sq_encoded_size; > + nesqp->hwqp.sq_head = 1; > + mem += sizeof(struct nes_hw_qp_wqe)*sq_size; > + > + nesqp->hwqp.rq_vbase = mem; > + nesqp->hwqp.rq_size = rq_size; > + nesqp->hwqp.rq_encoded_size = rq_encoded_size; > + nesqp->hwqp.rq_pbase = nesqp->hwqp.sq_pbase + > sizeof(struct nes_hw_qp_wqe)*sq_size; > + mem += sizeof(struct nes_hw_qp_wqe)*rq_size; > + > + nesqp->hwqp.q2_vbase = mem; > + nesqp->hwqp.q2_pbase = nesqp->hwqp.rq_pbase + > sizeof(struct nes_hw_qp_wqe)*rq_size; > + mem += 256; > + memset(nesqp->hwqp.q2_vbase, 0, 256); > + > + nesqp->nesqp_context = mem; > + nesqp->nesqp_context_pbase = nesqp->hwqp.q2_pbase + 256; > + memset(nesqp->nesqp_context, 0, > sizeof(*nesqp->nesqp_context)); > + > + nesqp->hwqp.qp_id = qp_num; > + nesqp->ibqp.qp_num = nesqp->hwqp.qp_id; > + nesqp->nespd = nespd; > + > + nescq = to_nescq(init_attr->send_cq); > + nesqp->nesscq = nescq; > + nescq = to_nescq(init_attr->recv_cq); > + nesqp->nesrcq = nescq; > + > + /* TODO: account for these things already being filled > in over in the CM code */ > + nesqp->nesqp_context->misc |= > (u32)PCI_FUNC(nesdev->pcidev->devfn) << > NES_QPCONTEXT_MISC_PCI_FCN_SHIFT; > + nesqp->nesqp_context->misc |= > (u32)nesqp->hwqp.rq_encoded_size 
<< NES_QPCONTEXT_MISC_RQ_SIZE_SHIFT; > + nesqp->nesqp_context->misc |= > (u32)nesqp->hwqp.sq_encoded_size << NES_QPCONTEXT_MISC_SQ_SIZE_SHIFT; > + if (!udata) { > + nesqp->nesqp_context->misc |= > NES_QPCONTEXT_MISC_PRIV_EN; > + } > + //NES_QPCONTEXT_MISC_IWARP_VER_SHIFT > + nesqp->nesqp_context->cqs = > nesqp->nesscq->hw_cq.cq_number + ((u32)nesqp->nesrcq->hw_cq.cq_number << > 16); > + u64temp = (u64)nesqp->hwqp.sq_pbase; > + nesqp->nesqp_context->sq_addr_low = (u32)u64temp; > + nesqp->nesqp_context->sq_addr_high = (u32)(u64temp>>32); > + u64temp = (u64)nesqp->hwqp.rq_pbase; > + nesqp->nesqp_context->rq_addr_low = (u32)u64temp; > + nesqp->nesqp_context->rq_addr_high = (u32)(u64temp>>32); > + /* TODO: create a nic index value and a ip index in > nes_dev */ > + if (qp_num & 1) { > + nesqp->nesqp_context->misc2 |= > (u32)PCI_FUNC(nesdev->pcidev->devfn+1) << > NES_QPCONTEXT_MISC2_NIC_INDEX_SHIFT; > + } else { > + nesqp->nesqp_context->misc2 |= > (u32)PCI_FUNC(nesdev->pcidev->devfn) << > NES_QPCONTEXT_MISC2_NIC_INDEX_SHIFT; > + } > + nesqp->nesqp_context->pd_index_wscale |= > (u32)nesqp->nespd->pd_id << 16; > + u64temp = (u64)nesqp->hwqp.q2_pbase; > + nesqp->nesqp_context->q2_addr_low = (u32)u64temp; > + nesqp->nesqp_context->q2_addr_high = (u32)(u64temp>>32); > + *((struct nes_qp > **)&nesqp->nesqp_context->aeq_token_low) = nesqp; > + nesqp->nesqp_context->ird_ord_sizes = > NES_QPCONTEXT_ORDIRD_ALSMM | > + > ((((u32)nesadapter->max_irrq_wr)< S_QPCONTEXT_ORDIRD_IRDSIZE_MASK); > + if (disable_mpa_crc) { > + dprintk("%s Disabling MPA crc checking due to > module option.\n", __FUNCTION__); > + nesqp->nesqp_context->ird_ord_sizes |= > NES_QPCONTEXT_ORDIRD_RNMC; > + } > + > + /* Create the QP */ > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > NES_CQP_CREATE_QP | NES_CQP_QP_TYPE_IWARP | 
NES_CQP_QP_IWARP_STATE_IDLE; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_QP_CQS_VALID; > + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = > nesqp->hwqp.qp_id; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = > 0; > + u64temp = (u64)nesqp->nesqp_context_pbase; > + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = > (u32)u64temp; > + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = > (u32)(u64temp>>32); > + > + barrier(); > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + dprintk("Waiting for create iWARP QP%u to complete.\n", > nesqp->hwqp.qp_id); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("Create iwarp QP completed, wait_event_timeout > ret = %u.\n", ret); > + /* TODO: Catch error code... */ > + Catch the error code. > + if (ib_pd->uobject) { > + uresp.mmap_sq_db_index = > nesqp->mmap_sq_db_index; > + uresp.actual_sq_size = sq_size; > + uresp.actual_rq_size = rq_size; > + uresp.qp_id = nesqp->hwqp.qp_id; > + if (ib_copy_to_udata(udata, &uresp, sizeof > uresp)) { > + /* TODO: Much more clean up to do here > */ > + Do the cleanup. 
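To make the cleanup request concrete: by the time ib_copy_to_udata() can fail, the quoted code has reserved a QP number, allocated the context buffer, allocated the PCI-consistent ring memory and, for user mode, set a bit in allocated_wqs, yet the error path releases only the first two. The usual kernel idiom is a single chain of goto labels that unwinds in reverse order. Below is a minimal userspace sketch of that idiom, not driver code. All names (qp_model, take_buf, put_buf, live_resources, fail_at) are made up for illustration, and calloc()/free() stand in for the real allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Counts resources currently held, so leaks in an error path show up. */
static int live_resources;

/* Stand-in for an allocator such as kzalloc() or pci_alloc_consistent(). */
static void *take_buf(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_resources++;
	return p;
}

/* Stand-in for the matching free routine. */
static void put_buf(void *p)
{
	assert(p != NULL);
	free(p);
	live_resources--;
}

struct qp_model {
	void *ctx;	/* models the aligned software context */
	void *rings;	/* models the SQ/RQ/Q2 ring memory */
};

/*
 * fail_at selects the step that fails: 0 = none, 1 = context allocation,
 * 2 = ring allocation, 3 = copying the response to user space.
 */
int qp_model_create(struct qp_model *qp, int fail_at)
{
	int err;

	live_resources++;		/* models reserving the QP number */

	qp->ctx = (fail_at == 1) ? NULL : take_buf(64);
	if (!qp->ctx) {
		err = -ENOMEM;
		goto err_id;
	}

	qp->rings = (fail_at == 2) ? NULL : take_buf(4096);
	if (!qp->rings) {
		err = -ENOMEM;
		goto err_ctx;
	}

	if (fail_at == 3) {
		err = -EFAULT;
		goto err_rings;
	}
	return 0;

	/* Unwind in the reverse order of acquisition. */
err_rings:
	put_buf(qp->rings);
err_ctx:
	put_buf(qp->ctx);
err_id:
	live_resources--;		/* models releasing the QP number */
	return err;
}

void qp_model_destroy(struct qp_model *qp)
{
	put_buf(qp->rings);
	put_buf(qp->ctx);
	live_resources--;
}
```

With this shape every failure point jumps to the label that matches what has been acquired so far, and a later step (for example tearing down the hardware QP that was already created) becomes one more label rather than another copy of the free calls.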
> nes_free_resource(nesadapter, > nesadapter->allocated_qps, qp_num); > + kfree(nesqp->allocated_buffer); > + return ERR_PTR(-EFAULT); > + } > + } > + > + > + dprintk("%s: QP%u structure located @%p.Size = %u.\n", > __FUNCTION__, nesqp->hwqp.qp_id, nesqp, (u32)sizeof(*nesqp)); > + spin_lock_init(&nesqp->lock); > + init_waitqueue_head( &nesqp->state_waitq ); > + nes_add_ref(&nesqp->ibqp); > + nesqp->aewq = > create_singlethread_workqueue("NesDisconnectWQ"); > + break; > + default: > + dprintk("%s: Invalid QP type: %d\n", __FUNCTION__, > + init_attr->qp_type); > + return ERR_PTR(-EINVAL); > + break; > + } > + > + /* update the QP table */ > + nesdev->nesadapter->qp_table[nesqp->hwqp.qp_id-NES_FIRST_QPN] = > nesqp; > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + return &nesqp->ibqp; > +} > + > + > +/** > + * nes_destroy_qp > + * > + * @param ib_qp > + * > + * @return int > + */ > +static int nes_destroy_qp(struct ib_qp *ib_qp) > +{ > + u64 u64temp; > + struct nes_qp *nesqp = to_nesqp(ib_qp); > + struct nes_dev *nesdev = to_nesdev(ib_qp->device); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_hw_cqp_wqe *cqp_wqe; > + struct nes_ucontext *nes_ucontext; > + struct ib_qp_attr attr; > + unsigned long flags; > + int ret; > + u32 cqp_head; > + > + dprintk("%s:%s:%u: Destroying QP%u\n", __FILE__, __FUNCTION__, > __LINE__, nesqp->hwqp.qp_id); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + /* Blow away the connection if it exists. */ > + if (nesqp->cm_id && nesqp->cm_id->provider_data) { > + /* TODO: Probably want to use error as the state */ > + attr.qp_state = IB_QPS_SQD; > + nes_modify_qp(&nesqp->ibqp, &attr, IB_QP_STATE ); > + } > + > + destroy_workqueue(nesqp->aewq); > + /* TODO: Add checks... MW bound count, others ? 
*/ > + > + /* Destroy the QP */ > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > cpu_to_le32(NES_CQP_DESTROY_QP | NES_CQP_QP_TYPE_IWARP); > + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = > cpu_to_le32(nesqp->hwqp.qp_id); > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; > + u64temp = (u64)nesqp->nesqp_context_pbase; > + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = > cpu_to_le32((u32)u64temp); > + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = > cpu_to_le32((u32)(u64temp>>32)); > + > + barrier(); > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + > + /* Wait for CQP */ > + dprintk("Waiting for destroy iWARP QP%u to complete.\n", > nesqp->hwqp.qp_id); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("Destroy iwarp QP completed, wait_event_timeout ret = > %u.\n", ret); > + > + /* TODO: Catch error cases */ > + Catch error cases. 
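Two details matter for catching these cases. First, wait_event_timeout() returns 0 when the timeout expires with the condition still false, and a positive number of remaining jiffies otherwise, so ret == 0 is the case that needs handling. As quoted, the code goes on to free the ring memory whether or not the hardware acknowledged the destroy. Second, the literal 2 is a jiffies count, so the real wait depends on HZ. The userspace sketch below illustrates both points. The helper names are hypothetical and are not part of the patch or of any kernel API.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: how long a literal tick count lasts, in
 * milliseconds, for a given tick rate (ticks per second).
 */
long ticks_to_ms(long ticks, long hz)
{
	assert(ticks >= 0 && hz > 0);
	return ticks * 1000 / hz;
}

/*
 * Hypothetical helper: convert milliseconds to ticks, rounding up so
 * that a short but non-zero wait never turns into zero ticks.
 */
long ms_to_ticks(long ms, long hz)
{
	assert(ms >= 0 && hz > 0);
	return (ms * hz + 999) / 1000;
}

/*
 * Map a wait_event_timeout()-style return value to an errno:
 * 0 means the wait timed out, anything positive means the
 * condition became true with that many ticks to spare.
 */
int wait_result_to_errno(long remaining)
{
	assert(remaining >= 0);
	return remaining == 0 ? -ETIMEDOUT : 0;
}
```

In the driver, the same effect could presumably be had by passing a millisecond value through msecs_to_jiffies() and returning an error when the wait yields 0. Which error to return and what to leave allocated in that case is a design decision for the driver authors.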
> + if (nesqp->user_mode) { > + if ((ib_qp->uobject)&&(ib_qp->uobject->context)) { > + nes_ucontext = > to_nesucontext(ib_qp->uobject->context); > + clear_bit(nesqp->mmap_sq_db_index, > nes_ucontext->allocated_wqs); > + > nes_ucontext->mmap_nesqp[nesqp->mmap_sq_db_index] = NULL; > + if (nes_ucontext->first_free_wq > > nesqp->mmap_sq_db_index) { > + nes_ucontext->first_free_wq = > nesqp->mmap_sq_db_index; > + } > + } > + } > + // Free the control structures > + pci_free_consistent(nesdev->pcidev, nesqp->qp_mem_size, > nesqp->hwqp.sq_vbase, > + nesqp->hwqp.sq_pbase); > + > + nesadapter->qp_table[nesqp->hwqp.qp_id-NES_FIRST_QPN] = NULL; > + nes_free_resource(nesadapter, nesadapter->allocated_qps, > nesqp->hwqp.qp_id); > + > + nes_rem_ref(&nesqp->ibqp); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + return 0; > +} > + > + > +/** > + * nes_create_cq > + * > + * @param ibdev > + * @param entries > + * @param context > + * @param udata > + * > + * @return struct ib_cq* > + */ > +static struct ib_cq *nes_create_cq(struct ib_device *ibdev, int > entries, > + > struct ib_ucontext *context, > + > struct ib_udata *udata) { > + u64 u64temp; > + struct nes_dev *nesdev = to_nesdev(ibdev); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_cq *nescq; > + struct nes_ucontext *nes_ucontext = NULL; > + void *mem; > + struct nes_hw_cqp_wqe *cqp_wqe; > + struct nes_pbl *nespbl = NULL; > + struct nes_create_cq_req req; > + struct nes_create_cq_resp resp; > + u32 cqp_head; > + u32 cq_num= 0; > + u32 pbl_entries = 1; > + int err = -ENOSYS; > + unsigned long flags; > + int ret; > + > + dprintk("%s:%s:%u: entries = %u\n", __FILE__, __FUNCTION__, > __LINE__, entries); > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + err = nes_alloc_resource(nesadapter, nesadapter->allocated_cqs, > nesadapter->max_cq, &cq_num, &nesadapter->next_cq); > + if (err) { > + return 
ERR_PTR(err); > + } > + > + nescq = kmalloc(sizeof(*nescq), GFP_KERNEL); > + if (!nescq) { > + dprintk("%s: Unable to allocate CQ\n", __FUNCTION__); > + return ERR_PTR(-ENOMEM); > + } > + > + memset(nescq, 0, sizeof *nescq); kzalloc() > + nescq->hw_cq.cq_size = max(entries+1,5); /* four usable entries > seems like a reasonable min */ > + nescq->hw_cq.cq_number = cq_num; > + nescq->ibcq.cqe = nescq->hw_cq.cq_size - 1; > + > + if (context) { > + nes_ucontext = to_nesucontext(context); > + if (ib_copy_from_udata(&req, udata, sizeof(req))) > + return ERR_PTR(-EFAULT); > + dprintk("%s: CQ Virtual Address = %08lX, size = %u.\n", > + __FUNCTION__, (unsigned > long)req.user_cq_buffer, entries); > + list_for_each_entry(nespbl, > &nes_ucontext->cq_reg_mem_list, list) { > + if (nespbl->user_base == (unsigned long > )req.user_cq_buffer) { > + list_del(&nespbl->list); > + err = 0; > + dprintk("%s: Found PBL for virtual CQ. > nespbl=%p.\n", __FUNCTION__, nespbl); > + break; > + } > + } > + if (err) { > + nes_free_resource(nesadapter, > nesadapter->allocated_cqs, cq_num); > + kfree(nescq); > + return ERR_PTR(err); > + } > + pbl_entries = nespbl->pbl_size >> 3; > + nescq->cq_mem_size = 0; > + } else { > + nescq->cq_mem_size = nescq->hw_cq.cq_size * > sizeof(struct nes_hw_cqe); > + dprintk("%s: Attempting to allocate pci memory (%u > entries, %u bytes) for CQ%u.\n", > + __FUNCTION__, entries, > nescq->cq_mem_size, nescq->hw_cq.cq_number); > + > + /* allocate the physical buffer space */ > + /* TODO: look into how to allocate this memory to be > used for user space */ > + mem = pci_alloc_consistent(nesdev->pcidev, > nescq->cq_mem_size, > + > &nescq->hw_cq.cq_pbase); > + if (!mem) { > + nes_free_resource(nesadapter, > nesadapter->allocated_cqs, cq_num); > + dprintk(KERN_ERR PFX "Unable to allocate pci > memory for cq\n"); > + return ERR_PTR(-ENOMEM); > + } > + > + memset(mem, 0, nescq->cq_mem_size); > + nescq->hw_cq.cq_vbase = mem; > + nescq->hw_cq.cq_head = 0; > + dprintk("%s: CQ%u 
virtual address @ %p, phys = 0x%08X > .\n", > + __FUNCTION__, nescq->hw_cq.cq_number, > nescq->hw_cq.cq_vbase, (u32)nescq->hw_cq.cq_pbase); > + } > + > + nescq->hw_cq.ce_handler = iwarp_ce_handler; > + spin_lock_init(&nescq->lock); > + > + /* Send CreateCQ request to CQP */ > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_CREATE_CQ > | NES_CQP_CQ_CEQ_VALID | > + > NES_CQP_CQ_CEQE_MASK |(nescq->hw_cq.cq_size<<16); > + if (1 != pbl_entries) { > + if (0 == nesadapter->free_256pbl) { > + /* TODO: need to backout */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, > flags); > + nes_free_resource(nesadapter, > nesadapter->allocated_cqs, cq_num); > + kfree(nescq); > + return ERR_PTR(-ENOMEM); > + } else { > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_CQ_VIRT; > + nescq->virtual_cq = 1; > + nesadapter->free_256pbl--; > + } > + } > + > + /* TODO: Separate iWARP from to its own CEQ? 
*/ > + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nescq->hw_cq.cq_number > | ((u32)PCI_FUNC(nesdev->pcidev->devfn)<<16); > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; > + if (context) { > + if (1 != pbl_entries) > + u64temp = (u64)nespbl->pbl_pbase; > + else > + u64temp = nespbl->pbl_vbase[0]; > + > cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = > nes_ucontext->mmap_db_index[0]; > + } else { > + u64temp = (u64)nescq->hw_cq.cq_pbase; > + > cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = 0; > + } > + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_LOW_IDX] = (u32)u64temp; > + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_HIGH_IDX] = > (u32)(u64temp>>32); > + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX] = 0; > + *((struct nes_hw_cq > **)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) = > &nescq->hw_cq; > + *((u64 *)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) > >>= 1; > + > + barrier(); > + dprintk("%s: CQ%u context = 0x%08X:0x%08X.\n", __FUNCTION__, > nescq->hw_cq.cq_number, > + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX], > + cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]); > + > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + dprintk("Waiting for create iWARP CQ%u to complete.\n", > nescq->hw_cq.cq_number); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("Create iwarp CQ completed, wait_event_timeout ret = > %d.\n", ret); > + /* TODO: Catch error cases */ > + Catch error cases. 
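Part of catching these cases is being clear about what the wait condition tests. The code claims a slot with sq_head++, wraps the head with a power-of-two mask, and later waits for sq_tail to equal the claimed slot plus one, masked the same way. Here is a small userspace model of that arithmetic, with hypothetical names (ring_model, ring_claim, ring_done_tail) that do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Model of a power-of-two ring with producer head and consumer tail. */
struct ring_model {
	uint32_t head;
	uint32_t tail;
	uint32_t size;	/* must be a power of two for the mask to work */
};

/* Claim the next slot: return its index and advance head with wrap. */
uint32_t ring_claim(struct ring_model *r)
{
	uint32_t slot;

	assert(r->size != 0 && (r->size & (r->size - 1)) == 0);
	slot = r->head++;
	r->head &= r->size - 1;
	return slot;
}

/* Tail value that indicates the given slot has just been consumed. */
uint32_t ring_done_tail(const struct ring_model *r, uint32_t slot)
{
	return (slot + 1) & (r->size - 1);
}
```

One thing the model makes visible: the test is an equality against a single tail value. If another request is posted after the lock is dropped and the tail moves past that value before the waiter looks, the equality is false even though the request completed, and the waiter would run into its timeout. Whether that can happen here depends on how sq_tail is advanced, which this part of the patch does not show.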
> + if (context) { > + /* free the nespbl */ > + pci_free_consistent(nesdev->pcidev, nespbl->pbl_size, > + > nespbl->pbl_vbase, nespbl->pbl_pbase); > + kfree(nespbl); > + /* write back the parameters */ > + resp.cq_id = nescq->hw_cq.cq_number; > + resp.cq_size = nescq->hw_cq.cq_size; > + resp.mmap_db_index = 0; > + if (ib_copy_to_udata(udata, &resp, sizeof resp)) { > + nes_free_resource(nesadapter, > nesadapter->allocated_cqs, cq_num); > + kfree(nescq); > + return ERR_PTR(-EFAULT); > + } > + } > + > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + return &nescq->ibcq; > +} > + > + > +/** > + * nes_destroy_cq > + * > + * @param ib_cq > + * > + * @return int > + */ > +static int nes_destroy_cq(struct ib_cq *ib_cq) > +{ > + struct nes_cq *nescq; > + struct nes_dev *nesdev; > + struct nes_adapter *nesadapter; > + struct nes_hw_cqp_wqe *cqp_wqe; > + u32 cqp_head; > + unsigned long flags; > + int ret; > + > + dprintk("%s:%s:%u: %p.\n", __FILE__, __FUNCTION__, __LINE__, > ib_cq); > + > + if (ib_cq == NULL) > + return 0; > + > + nescq = to_nescq(ib_cq); > + nesdev = to_nesdev(ib_cq->device); > + nesadapter = nesdev->nesadapter; > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + > + /* Send DestroyCQ request to CQP */ > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + if (nescq->virtual_cq) { > + nesadapter->free_256pbl++; > + if (nesadapter->free_256pbl > nesadapter->max_256pbl) { > + printk(KERN_ERR PFX "%s: free 256B PBLs(%u) has > exceeded the max(%u)\n", > + __FUNCTION__, > nesadapter->free_256pbl, nesadapter->max_256pbl); > + } > + } > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = NES_CQP_DESTROY_CQ > | (nescq->hw_cq.cq_size<<16); > + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = nescq->hw_cq.cq_number > | 
((u32)PCI_FUNC(nesdev->pcidev->devfn)<<16); > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; > + > + barrier(); > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + dprintk("Waiting for destroy iWARP CQ%u to complete.\n", > nescq->hw_cq.cq_number); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("Destroy iwarp CQ completed, wait_event_timeout ret = > %u.\n", ret); > + /* TODO: catch CQP error cases */ > + Catch error cases. > + if (nescq->cq_mem_size) > + pci_free_consistent(nesdev->pcidev, nescq->cq_mem_size, > (void *)nescq->hw_cq.cq_vbase, > + > nescq->hw_cq.cq_pbase); > + nes_free_resource(nesadapter, nesadapter->allocated_cqs, > nescq->hw_cq.cq_number); > + kfree(nescq); > + > + dprintk("%s: netdev refcnt = %u.\n", __FUNCTION__, > atomic_read(&nesdev->netdev->refcnt)); > + return 0; > +} > + > + > +/** > + * nes_reg_mr > + * > + * @param nesdev > + * @param nespd > + * @param stag > + * @param region_length > + * @param root_vpbl > + * @param single_buffer > + * @param pbl_count > + * @param residual_page_count > + * @param acc > + * @param iova_start > + * > + * @return int > + */ > +static int nes_reg_mr(struct nes_dev *nesdev, > + struct nes_pd *nespd, > + u32 stag, > + u64 region_length, > + struct nes_root_vpbl > *root_vpbl, > + dma_addr_t single_buffer, > + u16 pbl_count, > + u16 residual_page_count, > + int acc, > + u64 * iova_start) > +{ > + struct nes_hw_cqp_wqe *cqp_wqe; > + unsigned long flags; > + u32 cqp_head; > + int ret; > + struct nes_adapter *nesadapter = 
nesdev->nesadapter; > +// int count; > + > + /* Register the region with the adapter */ > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + > + /* track PBL resources */ > + if (pbl_count != 0) { > + if (pbl_count > 1) { > + /* Two level PBL */ > + if ((pbl_count+1) > nesadapter->free_4kpbl) { > + > spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + return (-ENOMEM); > + } else { > + nesadapter->free_4kpbl -= pbl_count+1; > + } > + } else if (residual_page_count > 32) { > + if (pbl_count > nesadapter->free_4kpbl) { > + > spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + return -ENOMEM; > + } else { > + nesadapter->free_4kpbl -= pbl_count; > + } > + } else { > + if (pbl_count > nesadapter->free_256pbl) { > + > spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + return -ENOMEM; > + } else { > + nesadapter->free_256pbl -= pbl_count; > + } > + } > + } > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > NES_CQP_REGISTER_STAG | NES_CQP_STAG_RIGHTS_LOCAL_READ; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_STAG_VA_TO | NES_CQP_STAG_MR; > + if (acc & IB_ACCESS_LOCAL_WRITE) { > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_STAG_RIGHTS_LOCAL_WRITE; > + } > + if (acc & IB_ACCESS_REMOTE_WRITE) { > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_STAG_RIGHTS_REMOTE_WRITE | NES_CQP_STAG_REM_ACC_EN; > + } > + if (acc & IB_ACCESS_REMOTE_READ) { > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_STAG_RIGHTS_REMOTE_READ | NES_CQP_STAG_REM_ACC_EN; > + } > + if (acc & IB_ACCESS_MW_BIND) { > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_STAG_RIGHTS_WINDOW_BIND | NES_CQP_STAG_REM_ACC_EN; > + } > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + 
cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_VA_LOW_IDX] = > cpu_to_le32((u32)*iova_start); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_VA_HIGH_IDX] = > cpu_to_le32((u32)((((u64)*iova_start)>>32))); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_LEN_LOW_IDX] = > cpu_to_le32((u32)region_length); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_LEN_HIGH_PD_IDX] = > cpu_to_le32((u32)(region_length>>8)&0xff000000); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_LEN_HIGH_PD_IDX] |= > cpu_to_le32(nespd->pd_id&0x00007fff); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_STAG_IDX] = > cpu_to_le32(stag); > + > + if (pbl_count == 0) { > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_LOW_IDX] = > cpu_to_le32((u32)single_buffer); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_HIGH_IDX] = > cpu_to_le32((u32)((((u64)single_buffer)>>32))); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX] = > 0; > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_LEN_IDX] = 0; > + } else { > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_LOW_IDX] = > cpu_to_le32((u32)root_vpbl->pbl_pbase); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PA_HIGH_IDX] = > cpu_to_le32((u32)((((u64)root_vpbl->pbl_pbase)>>32))); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX] = > cpu_to_le32(pbl_count); > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_LEN_IDX] = > cpu_to_le32(((pbl_count-1)*4096)+(residual_page_count*8)); > + if ((pbl_count > 1)||(residual_page_count > 32)) { > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= > NES_CQP_STAG_PBL_BLK_SIZE; > + } > + } > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > cpu_to_le32(cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX]); > + > + barrier(); > + > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = 
> wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("%s: Register STag 0x%08X completed, wait_event_timeout > ret = %u.\n", __FUNCTION__, stag, ret); > + /* TODO: Catch error code... */ > + > + return 0; > +} > + > + > +/** > + * nes_reg_phys_mr > + * > + * @param ib_pd > + * @param buffer_list > + * @param num_phys_buf > + * @param acc > + * @param iova_start > + * > + * @return struct ib_mr* > + */ > +static struct ib_mr *nes_reg_phys_mr(struct ib_pd *ib_pd, > + > struct ib_phys_buf *buffer_list, > + > int num_phys_buf, int acc, u64 * iova_start) { > + u64 region_length; > + struct nes_pd *nespd = to_nespd(ib_pd); > + struct nes_dev *nesdev = to_nesdev(ib_pd->device); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_mr *nesmr; > + struct ib_mr *ibmr; > + struct nes_vpbl vpbl; > + struct nes_root_vpbl root_vpbl; > + u32 stag; > + u32 i; > + u32 stag_index = 0; > + u32 next_stag_index = 0; > + u32 driver_key = 0; > + u32 root_pbl_index = 0; > + u32 cur_pbl_index = 0; > + int err = 0, pbl_depth = 0; > + int ret = 0; > + u16 pbl_count = 0; > + u8 single_page = 1; > + u8 stag_key = 0; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + pbl_depth = 0; > + region_length = 0; > + vpbl.pbl_vbase = NULL; > + root_vpbl.pbl_vbase = NULL; > + root_vpbl.pbl_pbase = 0; > + > + get_random_bytes(&next_stag_index, sizeof(next_stag_index)); > + stag_key = (u8)next_stag_index; > + > + driver_key = 0; > + > + next_stag_index >>= 8; > + next_stag_index %= nesadapter->max_mr; > + if (num_phys_buf > (1024*512)){ > + return ERR_PTR(-E2BIG); > + } > + > + err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, > nesadapter->max_mr, &stag_index, &next_stag_index); > + if (err) { > + return ERR_PTR(err); > + } > + > + nesmr = kmalloc(sizeof(*nesmr), GFP_KERNEL); > + if (!nesmr) { > + nes_free_resource(nesadapter, nesadapter->allocated_mrs, > stag_index); > + return ERR_PTR(-ENOMEM); > + } > + > + for 
(i = 0; i < num_phys_buf; i++) { > + > + if ((i & 0x01FF) == 0) { > + if (1 == root_pbl_index) { > + /* Allocate the root PBL */ > + root_vpbl.pbl_vbase = > pci_alloc_consistent(nesdev->pcidev, 8192, > + > &root_vpbl.pbl_pbase); > + dprintk("%s: Allocating root PBL, va = > %p, pa = 0x%08X\n", > + __FUNCTION__, > root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pbase); > + if (!root_vpbl.pbl_vbase) { > + > pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, > + > vpbl.pbl_pbase); > + nes_free_resource(nesadapter, > nesadapter->allocated_mrs, stag_index); > + kfree(nesmr); > + return ERR_PTR(-ENOMEM); > + } > + root_vpbl.leaf_vpbl = > kmalloc(sizeof(*root_vpbl.leaf_vpbl)*1024, GFP_KERNEL); > + if (!root_vpbl.leaf_vpbl) { > + > pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, > + > root_vpbl.pbl_pbase); > + > pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, > + > vpbl.pbl_pbase); > + nes_free_resource(nesadapter, > nesadapter->allocated_mrs, stag_index); > + kfree(nesmr); > + return ERR_PTR(-ENOMEM); > + } > + root_vpbl.pbl_vbase[0].pa_low = > cpu_to_le32((u32)vpbl.pbl_pbase); > + root_vpbl.pbl_vbase[0].pa_high = > cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); > + root_vpbl.leaf_vpbl[0] = vpbl; > + } > + /* Allocate a 4K buffer for the PBL */ > + vpbl.pbl_vbase = > pci_alloc_consistent(nesdev->pcidev, 4096, > + > &vpbl.pbl_pbase); > + dprintk("%s: Allocating leaf PBL, va = %p, pa = > 0x%016lX\n", > + __FUNCTION__, vpbl.pbl_vbase, > (unsigned long)vpbl.pbl_pbase); > + if (!vpbl.pbl_vbase) { > + /* TODO: Unwind allocated buffers */ > + nes_free_resource(nesadapter, > nesadapter->allocated_mrs, stag_index); > + ibmr = ERR_PTR(-ENOMEM); > + kfree(nesmr); > + goto reg_phys_err; > + } > + /* Fill in the root table */ > + if (1 <= root_pbl_index) { > + > root_vpbl.pbl_vbase[root_pbl_index].pa_low = > cpu_to_le32((u32)vpbl.pbl_pbase); > + > root_vpbl.pbl_vbase[root_pbl_index].pa_high = > cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); > + 
root_vpbl.leaf_vpbl[root_pbl_index] = > vpbl; > + } > + root_pbl_index++; > + cur_pbl_index = 0; > + } > + if (buffer_list[i].addr & ~PAGE_MASK) { > + /* TODO: Unwind allocated buffers */ > + nes_free_resource(nesadapter, > nesadapter->allocated_mrs, stag_index); > + dprintk("Unaligned Memory Buffer: 0x%x\n", > + (unsigned int) > buffer_list[i].addr); > + ibmr = ERR_PTR(-EINVAL); > + kfree(nesmr); > + goto reg_phys_err; > + } > + > + if (!buffer_list[i].size) { > + /* TODO: Unwind allocated buffers */ > + nes_free_resource(nesadapter, > nesadapter->allocated_mrs, stag_index); > + dprintk("Invalid Buffer Size\n"); > + ibmr = ERR_PTR(-EINVAL); > + kfree(nesmr); > + goto reg_phys_err; > + } > + > + region_length += buffer_list[i].size; > + if ((i != 0) && (single_page)) { > + if ((buffer_list[i-1].addr+PAGE_SIZE) != > buffer_list[i].addr) > + single_page = 0; > + } > + vpbl.pbl_vbase[cur_pbl_index].pa_low = > cpu_to_le32((u32)buffer_list[i].addr); > + vpbl.pbl_vbase[cur_pbl_index++].pa_high = > cpu_to_le32((u32)((((u64)buffer_list[i].addr)>>32))); > + } > + > + stag = stag_index<<8; > + stag |= driver_key; > + /* TODO: key should come from consumer */ > + stag += (u32)stag_key; > + > + dprintk("%s: Registering STag 0x%08X, VA = 0x%016lX, length = > 0x%016lX, index = 0x%08X\n", > + __FUNCTION__, stag, (unsigned long)*iova_start, > (unsigned long)region_length, stag_index); > + > + /* TODO: Should the region length be reduced by iova_start > &PAGE_MASK, think so */ > + region_length -= (*iova_start)&PAGE_MASK; > + > + /* Make the leaf PBL the root if only one PBL */ > + if (root_pbl_index == 1) { > + root_vpbl.pbl_pbase = vpbl.pbl_pbase; > + } > + > + if (single_page) { > + pbl_count = 0; > + } else { > + pbl_count = root_pbl_index; > + } > + ret = nes_reg_mr( nesdev, nespd, stag, region_length, > &root_vpbl, > + buffer_list[0].addr, > pbl_count, (u16)cur_pbl_index, > + acc, iova_start); > + > + if (ret == 0) { > + nesmr->ibmr.rkey = stag; > + nesmr->ibmr.lkey = stag; > 
+ nesmr->mode = IWNES_MEMREG_TYPE_MEM; > + ibmr = &nesmr->ibmr; > + nesmr->pbl_4k = ((pbl_count>1)||(cur_pbl_index>32)) ? 1 > : 0; > + nesmr->pbls_used = pbl_count; > + if (pbl_count > 1) { > + nesmr->pbls_used++; > + } > + } else { > + kfree(nesmr); > + ibmr = ERR_PTR(-ENOMEM); > + } > + > +reg_phys_err: > + /* free the resources */ > + if (root_pbl_index == 1) { > + /* single PBL case */ > + pci_free_consistent(nesdev->pcidev, 4096, > vpbl.pbl_vbase, > + vpbl.pbl_pbase); > + } else { > + for (i=0; i + pci_free_consistent(nesdev->pcidev, 4096, > root_vpbl.leaf_vpbl[i].pbl_vbase, > + > root_vpbl.leaf_vpbl[i].pbl_pbase); > + } > + kfree(root_vpbl.leaf_vpbl); > + pci_free_consistent(nesdev->pcidev, 8192, > root_vpbl.pbl_vbase, > + > root_vpbl.pbl_pbase); > + } > + > + return ibmr; > +} > + > + > +/** > + * nes_get_dma_mr > + * > + * @param pd > + * @param acc > + * > + * @return struct ib_mr* > + */ > +static struct ib_mr *nes_get_dma_mr(struct ib_pd *pd, int acc) { > + struct ib_phys_buf bl; > + u64 kva = 0; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + bl.size = 0xffffffffff; > + bl.addr = 0; > + return nes_reg_phys_mr(pd, &bl, 1, acc, &kva); This doesn't support high addresses. Chelsio has a similar issue. I don't really know what to do about this... 
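A note on the bl.size = 0xffffffffff constant quoted above: it is ten hex digits, i.e. 2^40 - 1 bytes, so the single physical buffer starting at address 0 spans just under 1 TiB. The sketch below is plain user-space C, not driver code; struct phys_buf and phys_buf_covers() are hypothetical names introduced only for illustration and are not part of the patch. It shows one possible reading of the "high addresses" concern: under the assumption that the adapter only honours the registered range, a DMA address at or above 2^40 falls outside it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ib_phys_buf fields the patch sets. */
struct phys_buf {
	uint64_t addr;	/* start of the registered region */
	uint64_t size;	/* length of the registered region in bytes */
};

/*
 * Return true if [dma_addr, dma_addr + len) lies entirely inside the
 * region described by bl. Written to avoid 64-bit overflow.
 */
bool phys_buf_covers(const struct phys_buf *bl, uint64_t dma_addr, uint64_t len)
{
	uint64_t off;

	if (dma_addr < bl->addr)
		return false;
	off = dma_addr - bl->addr;
	return off <= bl->size && len <= bl->size - off;
}
```

With bl = { 0, 0xffffffffff }, a 4 KB page at 0x1000 is covered, while a page starting at 1ULL << 40 is not. Whether that limit matters in practice depends on where the platform places DMA addresses, which the thread does not say.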
> +} > + > + > +/** > + * nes_reg_user_mr > + * > + * @param pd > + * @param region > + * @param acc > + * @param udata > + * > + * @return struct ib_mr* > + */ > +static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, struct ib_umem > *region, > + > int acc, struct ib_udata *udata) > +{ > + u64 iova_start; > + u64 *pbl; > + u64 region_length; > + dma_addr_t last_dma_addr = 0; > + dma_addr_t first_dma_addr = 0; > + struct nes_pd *nespd = to_nespd(pd); > + struct nes_dev *nesdev = to_nesdev(pd->device); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct ib_mr *ibmr; > + struct ib_umem_chunk *chunk; > + struct nes_ucontext *nes_ucontext; > + struct nes_pbl *nespbl; > + struct nes_mr *nesmr; > + struct nes_mem_reg_req req; > + struct nes_vpbl vpbl; > + struct nes_root_vpbl root_vpbl; > + int j; > + int page_count = 0; > + int err, pbl_depth = 0; > + int ret; > + u32 stag; > + u32 stag_index = 0; > + u32 next_stag_index; > + u32 driver_key; > + u32 root_pbl_index = 0; > + u32 cur_pbl_index = 0; > + u16 pbl_count; > + u8 single_page = 1; > + u8 stag_key; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + dprintk("%s: User base = 0x%lX, Virt base = 0x%lX, length = %u, > offset = %u, page size = %u.\n", > + __FUNCTION__, region->user_base, > region->virt_base, (u32)region->length, region->offset, > region->page_size); > + > + if (ib_copy_from_udata(&req, udata, sizeof(req))) > + return ERR_PTR(-EFAULT); > + dprintk("%s: Memory Registration type = %08X.\n", __FUNCTION__, > req.reg_type); > + > + switch (req.reg_type) { > + case IWNES_MEMREG_TYPE_MEM: > + pbl_depth = 0; > + region_length = 0; > + vpbl.pbl_vbase = NULL; > + root_vpbl.pbl_vbase = NULL; > + root_vpbl.pbl_pbase = 0; > + > + get_random_bytes(&next_stag_index, > sizeof(next_stag_index)); > + stag_key = (u8)next_stag_index; > + > + driver_key = 0; > + > + next_stag_index >>= 8; > + next_stag_index %= nesadapter->max_mr; > + > + err = nes_alloc_resource(nesadapter, > 
nesadapter->allocated_mrs, nesadapter->max_mr, &stag_index, > &next_stag_index); > + if (err) { > + return ERR_PTR(err); > + } > + > + nesmr = kmalloc(sizeof(*nesmr), GFP_KERNEL); > + if (!nesmr) { > + nes_free_resource(nesadapter, > nesadapter->allocated_mrs, stag_index); > + return ERR_PTR(-ENOMEM); > + } > + > + /* todo: make this code and reg_phy_mr loop more > common!!! */ > + list_for_each_entry(chunk, &region->chunk_list, > list) { > + dprintk("%s: Chunk: nents = %u, nmap = > %u .\n", __FUNCTION__, chunk->nents, chunk->nmap ); > + for (j = 0; j < chunk->nmap; ++j) { > + dprintk("%s: \tsg_dma_addr = > 0x%08lx, length = %u.\n", > + __FUNCTION__, > (unsigned long)sg_dma_address(&chunk->page_list[j]), > sg_dma_len(&chunk->page_list[j]) ); > + > + if ((page_count&0x01FF) == 0) { > + if > (page_count>(1024*512)) { > + > pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, > + > vpbl.pbl_pbase); > + > nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); > + kfree(nesmr); > + return > ERR_PTR(-E2BIG); > + } > + if (1 == root_pbl_index) > { > + > root_vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 8192, > + > &root_vpbl.pbl_pbase); > + dprintk("%s: > Allocating root PBL, va = %p, pa = 0x%08X\n", > + > __FUNCTION__, root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pbase); > + if > (!root_vpbl.pbl_vbase) { > + > pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, > + > vpbl.pbl_pbase); > + > nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); > + > kfree(nesmr); > + return > ERR_PTR(-ENOMEM); > + } > + > root_vpbl.leaf_vpbl = kmalloc(sizeof(*root_vpbl.leaf_vpbl)*1024, > GFP_KERNEL); > + if > (!root_vpbl.leaf_vpbl) { > + > pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, > + > root_vpbl.pbl_pbase); > + > pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, > + > vpbl.pbl_pbase); > + > nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); > + > kfree(nesmr); > + return > 
ERR_PTR(-ENOMEM); > + } > + > root_vpbl.pbl_vbase[0].pa_low = cpu_to_le32((u32)vpbl.pbl_pbase); > + > root_vpbl.pbl_vbase[0].pa_high = > cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); > + > root_vpbl.leaf_vpbl[0] = vpbl; > + } > + vpbl.pbl_vbase = > pci_alloc_consistent(nesdev->pcidev, 4096, > + > &vpbl.pbl_pbase); > + dprintk("%s: Allocating > leaf PBL, va = %p, pa = 0x%08X\n", > + > __FUNCTION__, vpbl.pbl_vbase, (unsigned int)vpbl.pbl_pbase); > + if (!vpbl.pbl_vbase) { > + > nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); > + ibmr = > ERR_PTR(-ENOMEM); > + kfree(nesmr); > + goto > reg_user_mr_err; > + } > + if (1 <= root_pbl_index) > { > + > root_vpbl.pbl_vbase[root_pbl_index].pa_low = > cpu_to_le32((u32)vpbl.pbl_pbase); > + > root_vpbl.pbl_vbase[root_pbl_index].pa_high = > cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); > + > root_vpbl.leaf_vpbl[root_pbl_index] = vpbl; > + } > + root_pbl_index++; > + cur_pbl_index = 0; > + } > + if > (sg_dma_address(&chunk->page_list[j]) & ~PAGE_MASK) { > + > nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); > + dprintk("%s: Unaligned > Memory Buffer: 0x%x\n", __FUNCTION__, > + > (unsigned int) sg_dma_address(&chunk->page_list[j])); > + ibmr = ERR_PTR(-EINVAL); > + kfree(nesmr); > + goto reg_user_mr_err; > + } > + > + if > (!sg_dma_len(&chunk->page_list[j])) { > + > nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); > + dprintk("%s: Invalid > Buffer Size\n", __FUNCTION__); > + ibmr = ERR_PTR(-EINVAL); > + kfree(nesmr); > + goto reg_user_mr_err; > + } > + > + region_length += > sg_dma_len(&chunk->page_list[j]); > + if (single_page) { > + if (page_count != 0) { > + if > ((last_dma_addr+PAGE_SIZE) != sg_dma_address(&chunk->page_list[j])) > + > single_page = 0; > + last_dma_addr = > sg_dma_address(&chunk->page_list[j]); > + } else { > + first_dma_addr = > sg_dma_address(&chunk->page_list[j]); > + last_dma_addr = > first_dma_addr; > + } > + } > + > + > 
vpbl.pbl_vbase[cur_pbl_index].pa_low = > cpu_to_le32((u32)sg_dma_address(&chunk->page_list[j])); > + > vpbl.pbl_vbase[cur_pbl_index].pa_high = > cpu_to_le32((u32)((((u64)sg_dma_address(&chunk->page_list[j]))>>32))); > + dprintk("%s: PBL %u (@%p) = > 0x%08X:%08X\n", __FUNCTION__, cur_pbl_index, > + > &vpbl.pbl_vbase[cur_pbl_index], vpbl.pbl_vbase[cur_pbl_index].pa_high, > + > vpbl.pbl_vbase[cur_pbl_index].pa_low); > + cur_pbl_index++; > + page_count++; > + } > + } > + stag = stag_index<<8; > + stag |= driver_key; > + /* TODO: key should come from consumer */ > + stag += (u32)stag_key; > + > + iova_start = (u64)region->virt_base; > + dprintk("%s: Registering STag 0x%08X, VA = > 0x%08X, length = 0x%08X, index = 0x%08X, region->length=0x%08x\n", > + __FUNCTION__, stag, (unsigned > int)iova_start, (unsigned int)region_length, stag_index, > region->length); > + > + > + /* Make the leaf PBL the root if only one PBL */ > + if (root_pbl_index == 1) { > + root_vpbl.pbl_pbase = vpbl.pbl_pbase; > + } > + > + if (single_page) { > + pbl_count = 0; > + } else { > + pbl_count = root_pbl_index; > + first_dma_addr = 0; > + } > + ret = nes_reg_mr( nesdev, nespd, stag, > region->length, &root_vpbl, > + > first_dma_addr, pbl_count, (u16)cur_pbl_index, > + acc, > &iova_start); > + > + if (ret == 0) { > + nesmr->ibmr.rkey = stag; > + nesmr->ibmr.lkey = stag; > + nesmr->mode = IWNES_MEMREG_TYPE_MEM; > + ibmr = &nesmr->ibmr; > + nesmr->pbl_4k = > ((pbl_count>1)||(cur_pbl_index>32)) ? 
1 : 0; > + nesmr->pbls_used = pbl_count; > + if (pbl_count > 1) { > + nesmr->pbls_used++; > + } > + } else { > + kfree(nesmr); > + ibmr = ERR_PTR(-ENOMEM); > + } > + > +reg_user_mr_err: > + /* free the resources */ > + if (root_pbl_index == 1) { > + pci_free_consistent(nesdev->pcidev, > 4096, vpbl.pbl_vbase, > + > vpbl.pbl_pbase); > + } else { > + for (j=0; j + > pci_free_consistent(nesdev->pcidev, 4096, > root_vpbl.leaf_vpbl[j].pbl_vbase, > + > root_vpbl.leaf_vpbl[j].pbl_pbase); > + } > + kfree(root_vpbl.leaf_vpbl); > + pci_free_consistent(nesdev->pcidev, > 8192, root_vpbl.pbl_vbase, > + > root_vpbl.pbl_pbase); > + } > + > + return ibmr; > + break; > + case IWNES_MEMREG_TYPE_QP: > + return ERR_PTR(-ENOSYS); > + break; > + case IWNES_MEMREG_TYPE_CQ: > + nespbl = kmalloc(sizeof(*nespbl), GFP_KERNEL); > + if (!nespbl) { > + dprintk("%s: Unable to allocate PBL\n", > __FUNCTION__); > + return ERR_PTR(-ENOMEM); > + } > + memset(nespbl, 0, sizeof(*nespbl)); > + nesmr = kmalloc(sizeof(*nesmr), GFP_KERNEL); > + if (!nesmr) { > + kfree(nespbl); > + dprintk("%s: Unable to allocate > nesmr\n", __FUNCTION__); > + return ERR_PTR(-ENOMEM); > + } > + memset(nesmr, 0, sizeof(*nesmr)); > + nes_ucontext = > to_nesucontext(pd->uobject->context); > + pbl_depth = region->length >> PAGE_SHIFT; > + pbl_depth += (region->length & ~PAGE_MASK) ? 
1 : > 0; > + nespbl->pbl_size = pbl_depth*sizeof(u64); > + dprintk("%s: Attempting to allocate CQ PBL > memory, %u bytes, %u entries.\n", __FUNCTION__, nespbl->pbl_size, > pbl_depth ); > + pbl = pci_alloc_consistent(nesdev->pcidev, > nespbl->pbl_size, > + > &nespbl->pbl_pbase); > + if (!pbl) { > + kfree(nesmr); > + kfree(nespbl); > + dprintk("%s: Unable to allocate cq PBL > memory\n", __FUNCTION__); > + return ERR_PTR(-ENOMEM); > + } > + > + nespbl->pbl_vbase = pbl; > + nespbl->user_base = region->user_base; > + > + list_for_each_entry(chunk, &region->chunk_list, > list) { > + for (j = 0; j < chunk->nmap; ++j) { > + *pbl++ = > cpu_to_le64((u64)sg_dma_address(&chunk->page_list[j])); > + } > + } > + list_add_tail(&nespbl->list, > &nes_ucontext->cq_reg_mem_list); > + nesmr->ibmr.rkey = -1; > + nesmr->ibmr.lkey = -1; > + nesmr->mode = IWNES_MEMREG_TYPE_CQ; > + return &nesmr->ibmr; > + break; > + } > + > + return ERR_PTR(-ENOSYS); > +} > + > + > +/** > + * nes_dereg_mr > + * > + * @param ib_mr > + * > + * @return int > + */ > +static int nes_dereg_mr(struct ib_mr *ib_mr) > +{ > + struct nes_mr *nesmr = to_nesmr(ib_mr); > + struct nes_dev *nesdev = to_nesdev(ib_mr->device); > + struct nes_adapter *nesadapter = nesdev->nesadapter; > + struct nes_hw_cqp_wqe *cqp_wqe; > + u32 cqp_head; > + int err; > + unsigned long flags; > + int ret; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + if (nesmr->mode != IWNES_MEMREG_TYPE_MEM) { > + /* TODO: Any cross checking with CQ/QP that owned? 
*/ > + kfree(nesmr); > + return 0; > + } > + > + /* Deallocate the region with the adapter */ > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + > + if (0 != nesmr->pbls_used) { > + if (nesmr->pbl_4k) { > + nesadapter->free_4kpbl += nesmr->pbls_used; > + if (nesadapter->free_4kpbl > > nesadapter->max_4kpbl) { > + printk(KERN_ERR PFX "free 4KB PBLs(%u) > has exceeded the max(%u)\n", > + nesadapter->free_4kpbl, > nesadapter->max_4kpbl); > + } > + } else { > + nesadapter->free_256pbl += nesmr->pbls_used; > + if (nesadapter->free_256pbl > > nesadapter->max_256pbl) { > + printk(KERN_ERR PFX "free 256B PBLs(%u) > has exceeded the max(%u)\n", > + nesadapter->free_256pbl, > nesadapter->max_256pbl); > + } > + } > + } > + > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > NES_CQP_DEALLOCATE_STAG | NES_CQP_STAG_VA_TO | > + NES_CQP_STAG_DEALLOC_PBLS | > NES_CQP_STAG_MR; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0; > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_BLK_COUNT_IDX] = 0; > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_PBL_LEN_IDX] = 0; > + cqp_wqe->wqe_words[NES_CQP_STAG_WQE_STAG_IDX] = ib_mr->rkey; > + > + barrier(); > + > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > + dprintk("Waiting for deallocate STag 0x%08X to complete.\n", > ib_mr->rkey); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("Deallocate STag completed, wait_event_timeout ret = > %u.\n", 
ret); > + /* TODO: Catch error code... */ > + > + nes_free_resource(nesadapter, nesadapter->allocated_mrs, > (ib_mr->rkey&0x0fffff00)>>8); > + > + err = 0; > + if (err) > + dprintk("nes_stag_dealloc failed: %d\n", err); > + else > + kfree(nesmr); > + > + return err; > +} > + > + > +/** > + * show_rev > + * > + * @param cdev > + * @param buf > + * > + * @return ssize_t > + */ > +static ssize_t show_rev(struct class_device *cdev, char *buf) > +{ > + struct nes_dev *nesdev = container_of(cdev, struct nes_dev, > ibdev.class_dev); > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return sprintf(buf, "%x\n", nesdev->nesadapter->hw_rev); > +} > + > + > +/** > + * show_fw_ver > + * > + * @param cdev > + * @param buf > + * > + * @return ssize_t > + */ > +static ssize_t show_fw_ver(struct class_device *cdev, char *buf) > +{ > + struct nes_dev *nesdev = container_of(cdev, struct nes_dev, > ibdev.class_dev); > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return sprintf(buf, "%x.%x.%x\n", > + (int) (nesdev->nesadapter->fw_ver >> > 32), > + (int) (nesdev->nesadapter->fw_ver >> > 16) & 0xffff, > + (int) (nesdev->nesadapter->fw_ver & > 0xffff)); > +} > + > + > +/** > + * show_hca > + * > + * @param cdev > + * @param buf > + * > + * @return ssize_t > + */ > +static ssize_t show_hca(struct class_device *cdev, char *buf) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return sprintf(buf, "NES010\n"); > +} > + > + > +/** > + * show_board > + * > + * @param cdev > + * @param buf > + * > + * @return ssize_t > + */ > +static ssize_t show_board(struct class_device *cdev, char *buf) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return sprintf(buf, "%.*s\n", 32, "NES010 Board ID"); > +} > + > +static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL); > +static CLASS_DEVICE_ATTR(fw_ver, S_IRUGO, show_fw_ver, NULL); > +static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL); > +static 
CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL); > + > +static struct class_device_attribute *nes_class_attributes[] = { > + &class_device_attr_hw_rev, > + &class_device_attr_fw_ver, > + &class_device_attr_hca_type, > + &class_device_attr_board_id > +}; > + > + > +/** > + * nes_query_qp > + * > + * @param qp > + * @param qp_attr > + * @param qp_attr_mask > + * @param qp_init_attr > + * > + * @return int > + */ > +static int nes_query_qp(struct ib_qp *qp, > + struct ib_qp_attr > *qp_attr, > + int qp_attr_mask, > + struct ib_qp_init_attr > *qp_init_attr) > +{ > + int err; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + // TODO: Do work here > + err = 0; > + > + return err; > +} > + > + > +/** > + * nes_modify_qp > + * > + * @param ibqp > + * @param attr > + * @param attr_mask > + * > + * @return int > + */ > +int nes_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, > + int attr_mask) > +{ > + u64 u64temp; > + struct nes_qp *nesqp = to_nesqp(ibqp); > + struct nes_dev *nesdev = to_nesdev(ibqp->device); > + struct nes_hw_cqp_wqe *cqp_wqe; > + struct iw_cm_id *cm_id = nesqp->cm_id; > + struct iw_cm_event cm_event; > + u8 abrupt_disconnect = 0; > + u32 cqp_head; > +// u32 counter; > + u32 next_iwarp_state = 0; > + int err; > + /* TODO: don't need both of these!!! */ > + unsigned long flags; > + unsigned long qplockflags; > + int ret; > + u8 issue_modify_qp = 0; > + u8 issue_disconnect = 0; > + > + spin_lock_irqsave(&nesqp->lock, qplockflags); > +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + dprintk("%s:QP%u: QP State = %u, cur QP State = %u, iwarp_state > = 0x%X. \n", > + __FUNCTION__, nesqp->hwqp.qp_id, attr->qp_state, > nesqp->ibqp_state, nesqp->iwarp_state); > + dprintk("%s:QP%u: QP Access Flags = 0x%X, attr_mask = 0x%0x. 
> \n", > + __FUNCTION__, nesqp->hwqp.qp_id, > attr->qp_access_flags, attr_mask ); > + > + > + if (attr_mask & IB_QP_STATE) { > + switch (attr->qp_state) { > + case IB_QPS_INIT: > + dprintk("%s:QP%u: new state = init. \n", > + __FUNCTION__, nesqp->hwqp.qp_id > ); > + if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_IDLE) { > + /* TODO: Need to add code to handle back > from error or closing */ > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + } > + next_iwarp_state = NES_CQP_QP_IWARP_STATE_IDLE; > + issue_modify_qp = 1; > + break; > + case IB_QPS_RTR: > + dprintk("%s:QP%u: new state = rtr. \n", > + __FUNCTION__, nesqp->hwqp.qp_id > ); > + if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_IDLE) { > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + } > + next_iwarp_state = NES_CQP_QP_IWARP_STATE_IDLE; > + issue_modify_qp = 1; > + break; > + case IB_QPS_RTS: > + dprintk("%s:QP%u: new state = rts. \n", > + __FUNCTION__, nesqp->hwqp.qp_id > ); > + if (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_RTS) { > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + } > + next_iwarp_state = NES_CQP_QP_IWARP_STATE_RTS; > + if (nesqp->iwarp_state != > NES_CQP_QP_IWARP_STATE_RTS) > + next_iwarp_state |= > NES_CQP_QP_CONTEXT_VALID | NES_CQP_QP_ARP_VALID | NES_CQP_QP_ORD_VALID; > + issue_modify_qp = 1; > + break; > + case IB_QPS_SQD: > + dprintk("%s:QP%u: new state = closing. SQ head = %u, SQ > tail = %u. \n", > + __FUNCTION__, nesqp->hwqp.qp_id, > nesqp->hwqp.sq_head, nesqp->hwqp.sq_tail ); > + if > (nesqp->iwarp_state==(u32)NES_CQP_QP_IWARP_STATE_CLOSING) { > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return 0; > + } else if > (nesqp->iwarp_state>(u32)NES_CQP_QP_IWARP_STATE_CLOSING) { > + dprintk("%s:QP%u: State change to closing ignored > due to current iWARP state. 
\n", > + __FUNCTION__, nesqp->hwqp.qp_id ); > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + } > + next_iwarp_state = NES_CQP_QP_IWARP_STATE_CLOSING; > + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_RTS){ > + issue_disconnect = 1; > + } else > + if (nesqp->iwarp_state == > NES_CQP_QP_IWARP_STATE_IDLE) { > + /* Free up the connect_worker thread if > needed */ > + if (nesqp->ksock) { > + nes_sock_release( nesqp, > &qplockflags ); > + } > + } > + break; > + case IB_QPS_SQE: > + dprintk("%s:QP%u: new state = terminate. \n", > + __FUNCTION__, nesqp->hwqp.qp_id ); > + if > (nesqp->iwarp_state>=(u32)NES_CQP_QP_IWARP_STATE_TERMINATE) { > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + } > + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_RTS){ > + issue_disconnect = 1; > + abrupt_disconnect = 1; > + } > + next_iwarp_state = NES_CQP_QP_IWARP_STATE_TERMINATE; > + issue_modify_qp = 1; > + break; > + case IB_QPS_ERR: > + case IB_QPS_RESET: > + if (nesqp->iwarp_state==(u32)NES_CQP_QP_IWARP_STATE_ERROR) > { > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + } > + dprintk("%s:QP%u: new state = error. 
\n", > + __FUNCTION__, nesqp->hwqp.qp_id > ); > + next_iwarp_state = NES_CQP_QP_IWARP_STATE_ERROR; > + if (nesqp->iwarp_state == NES_CQP_QP_IWARP_STATE_RTS){ > + issue_disconnect = 1; > + } > + issue_modify_qp = 1; > + break; > + default: > + spin_unlock_irqrestore(&nesqp->lock, > qplockflags); > + return -EINVAL; > + break; > + } > + > + /* TODO: Do state checks */ > + > + nesqp->ibqp_state = attr->qp_state; > + if ( ((nesqp->iwarp_state & NES_CQP_QP_IWARP_STATE_MASK) == > (u32)NES_CQP_QP_IWARP_STATE_RTS) && > + ((next_iwarp_state & NES_CQP_QP_IWARP_STATE_MASK) > > (u32)NES_CQP_QP_IWARP_STATE_RTS)) { > + nesqp->iwarp_state = next_iwarp_state & > NES_CQP_QP_IWARP_STATE_MASK; > + issue_disconnect = 1; > + } else > + nesqp->iwarp_state = next_iwarp_state & > NES_CQP_QP_IWARP_STATE_MASK; > + /* TODO: nesqp->iwarp_state vs.next_iwarp_state */ > + } > + > + if (attr_mask & IB_QP_ACCESS_FLAGS) { > + if (attr->qp_access_flags & IB_ACCESS_LOCAL_WRITE) { > + /* TODO: had to add rdma read here for user mode access, > doesn't seem quite correct */ > + /* actually, might need to remove rdma write here too > */ > + nesqp->nesqp_context->misc |= > NES_QPCONTEXT_MISC_RDMA_WRITE_EN | NES_QPCONTEXT_MISC_RDMA_READ_EN; > + issue_modify_qp = 1; > + } > + if (attr->qp_access_flags & IB_ACCESS_REMOTE_WRITE) { > + nesqp->nesqp_context->misc |= > NES_QPCONTEXT_MISC_RDMA_WRITE_EN; > + issue_modify_qp = 1; > + } > + if (attr->qp_access_flags & IB_ACCESS_REMOTE_READ) { > + nesqp->nesqp_context->misc |= > NES_QPCONTEXT_MISC_RDMA_READ_EN; > + issue_modify_qp = 1; > + } > + if (attr->qp_access_flags & IB_ACCESS_MW_BIND) { > + nesqp->nesqp_context->misc |= > NES_QPCONTEXT_MISC_WBIND_EN; > + issue_modify_qp = 1; > + } > + } > + > + if (issue_disconnect) > + { > + dprintk("%s:QP%u: Issuing Disconnect.\n", __FUNCTION__, > nesqp->hwqp.qp_id ); > + } > + spin_unlock_irqrestore(&nesqp->lock, qplockflags); > + if (issue_disconnect) > + { > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + cqp_head = 
nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > cpu_to_le32(NES_CQP_UPLOAD_CONTEXT | NES_CQP_QP_TYPE_IWARP); > + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = > cpu_to_le32(nesqp->hwqp.qp_id); > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = > 0; > + u64temp = (u64)nesqp->nesqp_context_pbase; > + cqp_wqe->wqe_words[NES_CQP_UPLOAD_WQE_CTXT_LOW_IDX] = > cpu_to_le32((u32)u64temp); > + cqp_wqe->wqe_words[NES_CQP_UPLOAD_WQE_CTXT_HIGH_IDX] = > cpu_to_le32((u32)(u64temp>>32)); > + /* TODO: this value should already be swapped? */ > + cqp_wqe->wqe_words[NES_CQP_UPLOAD_WQE_HTE_IDX] = > nesqp->nesqp_context->hte_index; > + > + barrier(); > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > +// dprintk("Waiting for modify iWARP QP%u to complete.\n", > nesqp->hwqp.qp_id); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + > + /* TODO: Catch error code... 
*/ > + nes_disconnect(nesqp->cm_id, abrupt_disconnect); > + > + dprintk("%s:Generating a Close Complete Event (reset) > for QP%u \n", > + __FUNCTION__, nesqp->hwqp.qp_id); > + /* Send up the close complete event */ > + cm_event.event = IW_CM_EVENT_CLOSE; > + cm_event.status = IW_CM_EVENT_STATUS_OK; > + cm_event.provider_data = cm_id->provider_data; > + cm_event.local_addr = cm_id->local_addr; > + cm_event.remote_addr = cm_id->remote_addr; > + cm_event.private_data = NULL; > + cm_event.private_data_len = 0; > + > + cm_id->event_handler(cm_id, &cm_event); > + > + } > + > + if (issue_modify_qp) { > + spin_lock_irqsave(&nesdev->cqp.lock, flags); > + > + cqp_head = nesdev->cqp.sq_head++; > + nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1; > + cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head]; > + cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = > NES_CQP_MODIFY_QP | NES_CQP_QP_TYPE_IWARP | next_iwarp_state; > + cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = > nesqp->hwqp.qp_id; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0; > + *((struct nes_hw_cqp > **)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = > cqp_head; > + cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = > 0; > + u64temp = (u64)nesqp->nesqp_context_pbase; > + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_LOW_IDX] = > (u32)u64temp; > + cqp_wqe->wqe_words[NES_CQP_QP_WQE_CONTEXT_HIGH_IDX] = > (u32)(u64temp>>32); > + > + barrier(); > + // Ring doorbell (1 WQEs) > + nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x01800000 | > nesdev->cqp.qp_id ); > + > + /* Wait for CQP */ > + spin_unlock_irqrestore(&nesdev->cqp.lock, flags); > +// dprintk("Waiting for modify iWARP QP%u to complete.\n", > nesqp->hwqp.qp_id); > + cqp_head = (cqp_head+1)&(nesdev->cqp.sq_size-1); > + ret = > wait_event_timeout(nesdev->cqp.waitq,(nesdev->cqp.sq_tail==cqp_head), > 2); > + dprintk("Modify iwarp QP%u completed, wait_event_timeout > ret = %u, nesdev->cqp.sq_head = %u 
nesdev->cqp.sq_tail = %u.\n", > + nesqp->hwqp.qp_id, ret, > nesdev->cqp.sq_head, nesdev->cqp.sq_tail); > + /* TODO: Catch error code... */ > + } > + > + err = 0; > + > + return err; > +} > + > + > +/** > + * nes_muticast_attach > + * > + * @param ibqp > + * @param gid > + * @param lid > + * > + * @return int > + */ > +static int nes_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, > u16 lid) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return -ENOSYS; > +} > + > + > +/** > + * nes_multicast_detach > + * > + * @param ibqp > + * @param gid > + * @param lid > + * > + * @return int > + */ > +static int nes_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, > u16 lid) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return -ENOSYS; > +} > + > + > +/** > + * nes_process_mad > + * > + * @param ibdev > + * @param mad_flags > + * @param port_num > + * @param in_wc > + * @param in_grh > + * @param in_mad > + * @param out_mad > + * > + * @return int > + */ > +static int nes_process_mad(struct ib_device *ibdev, > + int mad_flags, > + u8 port_num, > + struct ib_wc *in_wc, > + struct ib_grh > *in_grh, > + struct ib_mad > *in_mad, struct ib_mad *out_mad) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return -ENOSYS; > +} > + > + > +/** > + * nes_post_send > + * > + * @param ibqp > + * @param ib_wr > + * @param bad_wr > + * > + * @return int > + */ > +static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, > + struct ib_send_wr **bad_wr) > +{ > + struct nes_dev *nesdev = to_nesdev(ibqp->device); > + struct nes_qp *nesqp = to_nesqp(ibqp); > + u32 qsize = nesqp->hwqp.sq_size; > + struct nes_hw_qp_wqe *wqe; > + unsigned long flags = 0; > + u32 head; > + int err = 0; > + u32 wqe_count = 0; > + u32 counter; > + int sge_index; > + u32 total_payload_length; > + > +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + if (nesqp->ibqp_state > IB_QPS_RTS) > + return -EINVAL; > + > + 
spin_lock_irqsave(&nesqp->lock, flags); > + > + head = nesqp->hwqp.sq_head; > + > + while (ib_wr) { > + /* Check for SQ overflow */ > + if (((head + (2 * qsize) - nesqp->hwqp.sq_tail) % qsize) > == (qsize - 1)) { > + err = -EINVAL; > + break; > + } > + > + wqe = &nesqp->hwqp.sq_vbase[head]; > +// dprintk("%s:processing sq wqe at %p, head = %u.\n", > __FUNCTION__, wqe, head); > + *((u64 > *)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_SCRATCH_LOW_IDX]) = > ib_wr->wr_id; > + *((struct nes_qp > **)&wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; > + wqe->wqe_words[NES_IWARP_SQ_WQE_COMP_CTX_LOW_IDX] |= > head; > + > + switch (ib_wr->opcode) { > + case IB_WR_SEND: > + if (ib_wr->send_flags & IB_SEND_SOLICITED) { > + > wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_SENDSE; > + } else { > + > wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = NES_IWARP_SQ_OP_SEND; > + } > + if (ib_wr->num_sge > > nesdev->nesadapter->max_sge) { > + err = -EINVAL; > + break; > + } > + if (ib_wr->send_flags & IB_SEND_FENCE) { > + /* TODO: is IB Send Fence local or RDMA > read? 
*/ > + > wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= > NES_IWARP_SQ_WQE_LOCAL_FENCE; > + } > + total_payload_length = 0; > + for (sge_index=0; sge_index < ib_wr->num_sge; > sge_index++) { > + > wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = > cpu_to_le32((u32)ib_wr->sg_list[sge_index].addr); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = > cpu_to_le32((u32)(ib_wr->sg_list[sge_index].addr>>32)); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX+(sge_index*4)] = > cpu_to_le32(ib_wr->sg_list[sge_index].length); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX+(sge_index*4)] = > cpu_to_le32(ib_wr->sg_list[sge_index].lkey); > + total_payload_length += > ib_wr->sg_list[sge_index].length; > + } > + > wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = > cpu_to_le32(total_payload_length); > + nesqp->bytes_sent += total_payload_length; > + if (nesqp->bytes_sent > NES_MAX_SQ_PAYLOAD_SIZE) > { > + > wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= > NES_IWARP_SQ_WQE_READ_FENCE; > + nesqp->bytes_sent = 0; > + } > + break; > + case IB_WR_RDMA_WRITE: > + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = > NES_IWARP_SQ_OP_RDMAW; > + if (ib_wr->num_sge > > nesdev->nesadapter->max_sge) { > + err = -EINVAL; > + break; > + } > + if (ib_wr->send_flags & IB_SEND_FENCE) { > + /* TODO: is IB Send Fence local or RDMA > read? 
*/ > + > wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= > NES_IWARP_SQ_WQE_LOCAL_FENCE; > + } > + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_STAG_IDX] = > cpu_to_le32(ib_wr->wr.rdma.rkey); > + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX] > = cpu_to_le32(ib_wr->wr.rdma.remote_addr); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX] = > cpu_to_le32((u32)(ib_wr->wr.rdma.remote_addr>>32)); > + total_payload_length = 0; > + for (sge_index=0; sge_index < ib_wr->num_sge; > sge_index++) { > + > wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = > cpu_to_le32((u32)ib_wr->sg_list[sge_index].addr); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = > cpu_to_le32((u32)(ib_wr->sg_list[sge_index].addr>>32)); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_LENGTH0_IDX+(sge_index*4)] = > cpu_to_le32(ib_wr->sg_list[sge_index].length); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX+(sge_index*4)] = > cpu_to_le32(ib_wr->sg_list[sge_index].lkey); > + total_payload_length += > ib_wr->sg_list[sge_index].length; > + } > + /* TODO: handle multiple fragments, switch to > loop on structure */ > + > wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX] = > cpu_to_le32(total_payload_length); > + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX] > = wqe->wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_IDX]; > + nesqp->bytes_sent += total_payload_length; > + if (nesqp->bytes_sent > NES_MAX_SQ_PAYLOAD_SIZE) > { > + > wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= > NES_IWARP_SQ_WQE_READ_FENCE; > + nesqp->bytes_sent = 0; > + } > + break; > + case IB_WR_RDMA_READ: > + /* IWarp only supports 1 sge for RDMA reads */ > + if (ib_wr->num_sge > 1) { > + err = -EINVAL; > + break; > + } > + /* TODO: what about fences... 
*/ > + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = > NES_IWARP_SQ_OP_RDMAR; > + > + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_LOW_IDX] > = cpu_to_le32(ib_wr->wr.rdma.remote_addr); > + > wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_TO_HIGH_IDX] = > cpu_to_le32((u32)(ib_wr->wr.rdma.remote_addr>>32)); > + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_STAG_IDX] = > cpu_to_le32(ib_wr->wr.rdma.rkey); > + wqe->wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX] > = cpu_to_le32(ib_wr->sg_list->length); > + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_LOW_IDX] = > cpu_to_le32(ib_wr->sg_list->addr); > + wqe->wqe_words[NES_IWARP_SQ_WQE_FRAG0_HIGH_IDX] > = cpu_to_le32((u32)(ib_wr->sg_list->addr>>32)); > + wqe->wqe_words[NES_IWARP_SQ_WQE_STAG0_IDX] = > cpu_to_le32(ib_wr->sg_list->lkey); > + break; > + default: > + /* error */ > + err = -EINVAL; > + break; > + } > + > + if (ib_wr->send_flags & IB_SEND_SIGNALED) { > + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] |= > NES_IWARP_SQ_WQE_SIGNALED_COMPL; > + } > + wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX] = > cpu_to_le32(wqe->wqe_words[NES_IWARP_SQ_WQE_MISC_IDX]); > + > + ib_wr = ib_wr->next; > + head++; > + wqe_count++; > + if (head >= qsize) > + head = 0; > + > + } > + > + nesqp->hwqp.sq_head = head; > + barrier(); > + while (wqe_count) { > + counter = min(wqe_count, ((u32)255)); > + wqe_count -= counter; > + /* TODO: switch to using doorbell region */ > + nes_write32(nesdev->regs + NES_WQE_ALLOC, (counter << > 24) | 0x00800000 | nesqp->hwqp.qp_id); > + } > + > + spin_unlock_irqrestore(&nesqp->lock, flags); > + > + if (err) > + *bad_wr = ib_wr; > + return (err); > +} > + > + > +/** > + * nes_post_recv > + * > + * @param ibqp > + * @param ib_wr > + * @param bad_wr > + * > + * @return int > + */ > +static int nes_post_recv(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr, > + struct ib_recv_wr **bad_wr) > +{ > + struct nes_dev *nesdev = to_nesdev(ibqp->device); > + struct nes_qp *nesqp = to_nesqp(ibqp); > + u32 qsize = nesqp->hwqp.rq_size; > + struct nes_hw_qp_wqe *wqe; > + 
unsigned long flags = 0; > + u32 head; > + int err = 0; > + u32 wqe_count = 0; > + u32 counter; > + int sge_index; > + u32 total_payload_length; > + > + // dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + if (nesqp->ibqp_state > IB_QPS_RTS) > + return -EINVAL; > + > + spin_lock_irqsave(&nesqp->lock, flags); > + > + head = nesqp->hwqp.rq_head; > + > + while (ib_wr) { > + if (ib_wr->num_sge > nesdev->nesadapter->max_sge) { > + err = -EINVAL; > + break; > + } > + /* Check for RQ overflow */ > + if (((head + (2 * qsize) - nesqp->hwqp.rq_tail) % qsize) > == (qsize - 1)) { > + err = -EINVAL; > + break; > + } > + > +// dprintk("%s: ibwr sge count = %u.\n", __FUNCTION__, > ib_wr->num_sge); > + wqe = &nesqp->hwqp.rq_vbase[head]; > +// dprintk("%s:QP%u:processing rq wqe at %p, head = %u.\n", > __FUNCTION__, nesqp->hwqp.qp_id, wqe, head); > + *((u64 > *)&wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_SCRATCH_LOW_IDX]) = > ib_wr->wr_id; > + *((struct nes_qp > **)&wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX]) = nesqp; > + wqe->wqe_words[NES_IWARP_RQ_WQE_COMP_CTX_LOW_IDX] |= > head; > + > + total_payload_length = 0; > + for (sge_index=0; sge_index < ib_wr->num_sge; > sge_index++) { > + > wqe->wqe_words[NES_IWARP_RQ_WQE_FRAG0_LOW_IDX+(sge_index*4)] = > cpu_to_le32((u32)ib_wr->sg_list[sge_index].addr); > + > wqe->wqe_words[NES_IWARP_RQ_WQE_FRAG0_HIGH_IDX+(sge_index*4)] = > cpu_to_le32((u32)(ib_wr->sg_list[sge_index].addr>>32)); > + > wqe->wqe_words[NES_IWARP_RQ_WQE_LENGTH0_IDX+(sge_index*4)] = > cpu_to_le32(ib_wr->sg_list[sge_index].length); > + > wqe->wqe_words[NES_IWARP_RQ_WQE_STAG0_IDX+(sge_index*4)] = > cpu_to_le32(ib_wr->sg_list[sge_index].lkey); > + total_payload_length += ib_wr->sg_list->length; > + } > + wqe->wqe_words[NES_IWARP_RQ_WQE_TOTAL_PAYLOAD_IDX] = > cpu_to_le32(total_payload_length); > + > + ib_wr = ib_wr->next; > + head++; > + wqe_count++; > + if (head >= qsize) > + head = 0; > + } > + > + nesqp->hwqp.rq_head = head; > + barrier(); > + while (wqe_count) { 
> + counter = min(wqe_count, ((u32)255)); > + wqe_count -= counter; > + /* TODO: switch to using doorbell region */ > + nes_write32(nesdev->regs+NES_WQE_ALLOC, (counter<<24) | > nesqp->hwqp.qp_id ); > + } > + > + spin_unlock_irqrestore(&nesqp->lock, flags); > + > + if (err) > + *bad_wr = ib_wr; > + return err; > +} > + > + > +/** > + * nes_poll_cq > + * > + * @param ibcq > + * @param num_entries > + * @param entry > + * > + * @return int > + */ > +static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct > ib_wc *entry) > +{ > + u64 wrid; > +// u64 u64temp; > + struct nes_dev *nesdev = to_nesdev(ibcq->device); > + struct nes_cq *nescq = to_nescq(ibcq); > + struct nes_qp *nesqp; > + struct nes_hw_cqe cqe; > + unsigned long flags = 0; > + u32 head; > + u32 wq_tail; > + u32 cq_size; > + u32 cqe_count=0; > + u32 wqe_index; > +// u32 counter; > + > +// dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + spin_lock_irqsave(&nescq->lock, flags); > + > + head = nescq->hw_cq.cq_head; > + cq_size = nescq->hw_cq.cq_size; > +// dprintk("%s: Polling CQ%u (head = %u, size = %u).\n", > __FUNCTION__, > +// nescq->hw_cq.cq_number, head, cq_size); > + > + while (cqe_count<num_entries) { > + if > (nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] & > NES_CQE_VALID) { > + /* TODO: determine if this copy of the cqe > actually helps since cq is volatile */ > + cqe = nescq->hw_cq.cq_vbase[head]; > + > nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0; > + /* TODO: need to add code to check for magic bit > (0x200) and ignore */ > + wqe_index = > cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]&(nesdev->nesadapter->max_qp > _wr - 1); > + cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX] &= > ~(NES_SW_CONTEXT_ALIGN-1); > + barrier(); > + /* parse CQE, get completion context from WQE > (either rq or sq */ > + nesqp = *((struct nes_qp > **)&cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]); > + memset(entry, 0, sizeof *entry); > + entry->status = IB_WC_SUCCESS; > + entry->qp_num =
nesqp->hwqp.qp_id; > + entry->src_qp = nesqp->hwqp.qp_id; > + > + if (cqe.cqe_words[NES_CQE_OPCODE_IDX] & > NES_CQE_SQ) { > + if (nesqp->skip_lsmm) > + { > + nesqp->skip_lsmm = 0; > + wq_tail = nesqp->hwqp.sq_tail++; > + } > + > + /* Working on a SQ Completion*/ > + /* TODO: get the wr head from the > completion after proper alignment of nesqp */ > + wq_tail = wqe_index; > + nesqp->hwqp.sq_tail = > (wqe_index+1)&(nesqp->hwqp.sq_size - 1); > + wrid = *((u64 > *)&nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_COMP_SCRATCH > _LOW_IDX]); > + entry->byte_len = > nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_TOTAL_PAYLOAD_I > DX]; > + > + switch > (nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_MISC_IDX]&0x3f > ) { > + case NES_IWARP_SQ_OP_RDMAW: > +// dprintk("%s: Operation = RDMA > WRITE.\n", __FUNCTION__ ); > + entry->opcode = > IB_WC_RDMA_WRITE; > + break; > + case NES_IWARP_SQ_OP_RDMAR: > +// dprintk("%s: Operation = RDMA > READ.\n", __FUNCTION__ ); > + entry->opcode = IB_WC_RDMA_READ; > + entry->byte_len = > nesqp->hwqp.sq_vbase[wq_tail].wqe_words[NES_IWARP_SQ_WQE_RDMA_LENGTH_IDX > ]; > + break; > + case NES_IWARP_SQ_OP_SENDINV: > + case NES_IWARP_SQ_OP_SENDSEINV: > + case NES_IWARP_SQ_OP_SEND: > + case NES_IWARP_SQ_OP_SENDSE: > +// dprintk("%s: Operation = > Send.\n", __FUNCTION__ ); > + entry->opcode = IB_WC_SEND; > + break; > + } > + } else { > + /* Working on a RQ Completion*/ > + wq_tail = wqe_index; > + nesqp->hwqp.rq_tail = > (wqe_index+1)&(nesqp->hwqp.rq_size - 1); > + entry->byte_len = > le32_to_cpu(cqe.cqe_words[NES_CQE_PAYLOAD_LENGTH_IDX]); > + entry->byte_len = > le32_to_cpu(cqe.cqe_words[NES_CQE_PAYLOAD_LENGTH_IDX]); > + wrid = *((u64 > *)&nesqp->hwqp.rq_vbase[wq_tail].wqe_words[NES_IWARP_RQ_WQE_COMP_SCRATCH > _LOW_IDX]); > + entry->opcode = IB_WC_RECV; > + } > + /* TODO: report errors */ > + entry->wr_id = wrid; > + > + if (++head >= cq_size) > + head = 0; > + cqe_count++; > + nescq->polled_completions++; > + /* TODO: 
find a better number...if there is one > */ > + if ((nescq->polled_completions>(cq_size/2)) || > (nescq->polled_completions==255)) { > + dprintk("%s: CQ%u Issuing CQE Allocate > since more than half of cqes are pending %u of %u.\n", > + __FUNCTION__, > nescq->hw_cq.cq_number ,nescq->polled_completions, cq_size); > + nes_write32(nesdev->regs+NES_CQE_ALLOC, > nescq->hw_cq.cq_number | (nescq->polled_completions << 16) ); > + nescq->polled_completions = 0; > + } > + entry++; > + } else > + break; > + } > + > + if (nescq->polled_completions) { > +// dprintk("%s: CQ%u Issuing CQE Allocate for %u cqes.\n", > +// __FUNCTION__, nescq->hw_cq.cq_number > ,nescq->polled_completions); > + nes_write32(nesdev->regs+NES_CQE_ALLOC, > nescq->hw_cq.cq_number | (nescq->polled_completions << 16) ); > + nescq->polled_completions = 0; > + } > + > + /* TODO: Add code to check if overflow checking is on, if so > write CQE_ALLOC with remaining CQEs here or overflow > + could occur */ > + > + nescq->hw_cq.cq_head = head; > +// dprintk("%s: Reporting %u completions for CQ%u.\n", > __FUNCTION__, cqe_count, nescq->hw_cq.cq_number); > + > + spin_unlock_irqrestore(&nescq->lock, flags); > + > + return cqe_count; > +} > + > + > +/** > + * nes_req_notify_cq > + * > + * @param ibcq > + * @param notify > + * > + * @return int > + */ > +static int nes_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify > notify) > +{ > + struct nes_dev *nesdev = to_nesdev(ibcq->device); > + struct nes_cq *nescq = to_nescq(ibcq); > + u32 cq_arm; > + > +// dprintk("%s: Requesting notification for CQ%u.\n", __FUNCTION__, > nescq->hw_cq.cq_number); > + cq_arm = nescq->hw_cq.cq_number; > + if (notify == IB_CQ_NEXT_COMP) > + cq_arm |= NES_CQE_ALLOC_NOTIFY_NEXT; > + else if (notify == IB_CQ_SOLICITED) > + cq_arm |= NES_CQE_ALLOC_NOTIFY_SE; > + else > + return -EINVAL; > + > +// dprintk("%s: Arming CQ%u, command = 0x%08X.\n", __FUNCTION__, > nescq->hw_cq.cq_number, cq_arm); > + nes_write32(nesdev->regs+NES_CQE_ALLOC, cq_arm ); 
> + > + return 0; > +} > + > + > +/** > + * nes_register_device > + * > + * @param nesdev > + * > + * @return int > + */ > +int nes_register_device(struct nes_dev *nesdev) > +{ > + int ret; > + int i; > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + strlcpy(nesdev->ibdev.name, "nes%d", IB_DEVICE_NAME_MAX); > + nesdev->ibdev.owner = THIS_MODULE; > + > + nesdev->ibdev.node_type = RDMA_NODE_RNIC; > + memset(&nesdev->ibdev.node_guid, 0, > sizeof(nesdev->ibdev.node_guid)); > + memcpy(&nesdev->ibdev.node_guid, nesdev->netdev->dev_addr, 6); > + nesdev->nesadapter->device_cap_flags = > + (IB_DEVICE_ZERO_STAG | > IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); > + > + nesdev->ibdev.uverbs_cmd_mask = > + (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) > | > + (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) > | > + (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | > + (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | > + (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | > + (1ull << IB_USER_VERBS_CMD_REG_MR) | > + (1ull << IB_USER_VERBS_CMD_DEREG_MR) | > + (1ull << > IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | > + (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | > + (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | > + (1ull << IB_USER_VERBS_CMD_CREATE_AH) | > + (1ull << IB_USER_VERBS_CMD_DESTROY_AH) | > + (1ull << > IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) | > + (1ull << IB_USER_VERBS_CMD_CREATE_QP) | > + (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | > + (1ull << IB_USER_VERBS_CMD_POLL_CQ) | > + (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | > + (1ull << IB_USER_VERBS_CMD_POST_SEND) | > + (1ull << IB_USER_VERBS_CMD_POST_RECV); > + > + nesdev->ibdev.phys_port_cnt = 1; > + nesdev->ibdev.dma_device = &nesdev->pcidev->dev; > + nesdev->ibdev.class_dev.dev = &nesdev->pcidev->dev; > + nesdev->ibdev.query_device = nes_query_device; > + nesdev->ibdev.query_port = nes_query_port; > + nesdev->ibdev.modify_port = nes_modify_port; > + nesdev->ibdev.query_pkey = nes_query_pkey; > + nesdev->ibdev.query_gid = nes_query_gid; > + nesdev->ibdev.alloc_ucontext 
= nes_alloc_ucontext; > + nesdev->ibdev.dealloc_ucontext = nes_dealloc_ucontext; > + nesdev->ibdev.mmap = nes_mmap; > + nesdev->ibdev.alloc_pd = nes_alloc_pd; > + nesdev->ibdev.dealloc_pd = nes_dealloc_pd; > + nesdev->ibdev.create_ah = nes_create_ah; > + nesdev->ibdev.destroy_ah = nes_destroy_ah; > + nesdev->ibdev.create_qp = nes_create_qp; > + nesdev->ibdev.modify_qp = nes_modify_qp; > + nesdev->ibdev.query_qp = nes_query_qp; > + nesdev->ibdev.destroy_qp = nes_destroy_qp; > + nesdev->ibdev.create_cq = nes_create_cq; > + nesdev->ibdev.destroy_cq = nes_destroy_cq; > + nesdev->ibdev.poll_cq = nes_poll_cq; > + nesdev->ibdev.get_dma_mr = nes_get_dma_mr; > + nesdev->ibdev.reg_phys_mr = nes_reg_phys_mr; > + nesdev->ibdev.reg_user_mr = nes_reg_user_mr; > + nesdev->ibdev.dereg_mr = nes_dereg_mr; > + > + nesdev->ibdev.alloc_fmr = 0; > + nesdev->ibdev.unmap_fmr = 0; > + nesdev->ibdev.dealloc_fmr = 0; > + nesdev->ibdev.map_phys_fmr = 0; > + > + nesdev->ibdev.attach_mcast = nes_multicast_attach; > + nesdev->ibdev.detach_mcast = nes_multicast_detach; > + nesdev->ibdev.process_mad = nes_process_mad; > + > + nesdev->ibdev.req_notify_cq = nes_req_notify_cq; > + nesdev->ibdev.post_send = nes_post_send; > + nesdev->ibdev.post_recv = nes_post_recv; > + > + nesdev->ibdev.iwcm = kmalloc(sizeof(*nesdev->ibdev.iwcm), > GFP_KERNEL); > + if (nesdev->ibdev.iwcm == NULL) { > + return (-ENOMEM); > + } > + nesdev->ibdev.iwcm->add_ref = nes_add_ref; > + nesdev->ibdev.iwcm->rem_ref = nes_rem_ref; > + nesdev->ibdev.iwcm->get_qp = nes_get_qp; > + nesdev->ibdev.iwcm->connect = nes_connect; > + nesdev->ibdev.iwcm->accept = nes_accept; > + nesdev->ibdev.iwcm->reject = nes_reject; > + nesdev->ibdev.iwcm->create_listen = nes_create_listen; > + nesdev->ibdev.iwcm->destroy_listen = nes_destroy_listen; > + > + dprintk("&nes_dev=0x%p : &nes->ibdev = 0x%p: %s : %u\n", nesdev, > &nesdev->ibdev, > + __FUNCTION__, __LINE__); > + > + ret = ib_register_device(&nesdev->ibdev); > + if (ret) { > + 
dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + > + return ret; > + } > + > + for (i = 0; i < ARRAY_SIZE(nes_class_attributes); ++i) { > + ret = class_device_create_file(&nesdev->ibdev.class_dev, > nes_class_attributes[i]); > + if (ret) { > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, > __LINE__); > + ib_unregister_device(&nesdev->ibdev); > + return ret; > + } > + } > + > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + return 0; > +} > + > + > +/** > + * nes_unregister_device > + * > + * @param nesdev > + */ > +void nes_unregister_device(struct nes_dev *nesdev) > +{ > + dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > + ib_unregister_device(&nesdev->ibdev); > +} > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From tom at opengridcomputing.com Fri Oct 27 08:30:12 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 27 Oct 2006 10:30:12 -0500 Subject: [openib-general] OFED 1.1 Build Issue Message-ID: <1161963012.2748.29.camel@trinity.ogc.int> I've been testing some code against the OFED 1.1 release and noticed that if you build anything that depends on IB (RNFS in this case) into the kernel, that the OFED kit doesn't work correctly. This is because the dependent modules (ib_core, etc...) get sucked into the kernel automagically and will cause the subsequent modprobe of the OFED module to fail. I don't think you can fix this without rebuilding the kernel so it should probably be listed in the OFED_release_notes as a known issue. Providing a mechanism to rebuild the kernel as part of the OFED install would be great too, sorry if it's already there and I missed it. 
Tom From tom at opengridcomputing.com Fri Oct 27 08:44:54 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 27 Oct 2006 10:44:54 -0500 Subject: [openib-general] More OFED 1.1 Message-ID: <1161963894.2748.32.camel@trinity.ogc.int> It just occurred to me that if you rebuild your kernel, it will stomp the OFED install. I didn't see this in the docs but I have to admit to skimming... From swise at opengridcomputing.com Fri Oct 27 08:48:21 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 10:48:21 -0500 Subject: [openib-general] problem with 2.6.19? In-Reply-To: <1161960198.14333.16.camel@stevo-desktop> References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> Message-ID: <1161964101.14333.36.camel@stevo-desktop> Also, if I drop the memory down to 1G (via mem=1G on the boot line), then things work. But I think I'm just disabling the IOMMU. Note that __pa() and dma_map_single() return the same thing: c2_alloc_mqsp_chunk mqsp_chunk va ffff81003c4d2000 dma_addr 3c4d2000 __pa 3c4d2000 c2_alloc_mqsp addr ffff81003c4d201a dma_addr 3c4d201a c2_alloc_mqsp addr ffff81003c4d201c dma_addr 3c4d201c c2_alloc_mqsp addr ffff81003c4d201e dma_addr 3c4d201e c2_alloc_mqsp addr ffff81003c4d2020 dma_addr 3c4d2020 c2_rnic_init rep_vq va ffff81003cb08000 dma 3cb08000 __pa 3cb08000 c2_rnic_init aeq va ffff81003d4e0000 dma 3d4e0000 __pa 3d4e0000 On Fri, 2006-10-27 at 09:43 -0500, Steve Wise wrote: > On Thu, 2006-10-26 at 15:26 -0700, Roland Dreier wrote: > > Steve> The adapter seems to be dma'ing into the wrong memory. The > > Steve> patch below backs the usage of dma_map_single() back to > > Steve> using __pa() for converting kernel virtual addresses (from > > Steve> kmalloc) into bus addresses, and things work ok. > > > > Hmm. It might be interesting to hack the driver to print the result > > of both dma_map_single() and __pa() and see if they're different. > > > > Here's a dump. They are different. 
The __pa()'s are what I expect. I > dunno if the dma_addr's returned from dma_map_single() look good or not > (they certainly don't work :) > > c2_alloc_mqsp_chunk mqsp_chunk va ffff810148b1b000 dma_addr 3a56000 __pa 148b1b000 > c2_alloc_mqsp addr ffff810148b1b01a dma_addr 3a5601a > c2_alloc_mqsp addr ffff810148b1b01c dma_addr 3a5601c > c2_alloc_mqsp addr ffff810148b1b01e dma_addr 3a5601e > c2_alloc_mqsp addr ffff810148b1b020 dma_addr 3a56020 > c2_rnic_init rep_vq va ffff810147e78000 dma 3a57000 __pa 147e78000 > c2_rnic_init aeq va ffff810147d48000 dma 3a5f000 __pa 147d48000 > > > Are you running on a 32-bit (i386) or 64-bit (x86_64) kernel? How > > much RAM do you have? > > 64b/X86_64. 4GB RAM. The CPUs are Dempsey class XEONs - Dual CPU, Dual > core. So with HT on linux sees 8 CPUs. > > > Is the kernel using swiotlb? If so then you > > need to make sure your DMA_{TO,FROM} directions and dma_unmap calls > > are right, since otherwise the DMAed data won't be copied to/from the > > bounce buffer at the right time. > > All these mappings are for the device to DMA into the host memory, and > I'm using DMA_FROM_DEVICE in my calls to dma_map_single(). > > How do I know if the kernel is using swiotlb? > > > > Another thing to do if you're patient would be to use git-bisect and > > figure out exactly which patch made amso1100 stop working. > > > > I added these calls as part of the review for submission into the > kernel, and I originally tested them on dual CPU opteron systems with > 1GB of memory. But maybe they weren't using the IOMMU? Dunno. 
> > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Fri Oct 27 08:53:08 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 10:53:08 -0500 Subject: [openib-general] problem with 2.6.19? In-Reply-To: <1161964101.14333.36.camel@stevo-desktop> References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> Message-ID: <1161964388.14333.38.camel@stevo-desktop> And I do have CONFIG_SWIOTLB set: CONFIG_SWIOTLB=y On Fri, 2006-10-27 at 10:48 -0500, Steve Wise wrote: > Also, if I drop the memory down to 1G (via mem=1G on the boot line), > then things work. But I think I'm just disabling the IOMMU. Note that > __pa() and dma_map_single() return the same thing: > > > c2_alloc_mqsp_chunk mqsp_chunk va ffff81003c4d2000 dma_addr 3c4d2000 __pa 3c4d2000 > c2_alloc_mqsp addr ffff81003c4d201a dma_addr 3c4d201a > c2_alloc_mqsp addr ffff81003c4d201c dma_addr 3c4d201c > c2_alloc_mqsp addr ffff81003c4d201e dma_addr 3c4d201e > c2_alloc_mqsp addr ffff81003c4d2020 dma_addr 3c4d2020 > c2_rnic_init rep_vq va ffff81003cb08000 dma 3cb08000 __pa 3cb08000 > c2_rnic_init aeq va ffff81003d4e0000 dma 3d4e0000 __pa 3d4e0000 > > > > On Fri, 2006-10-27 at 09:43 -0500, Steve Wise wrote: > > On Thu, 2006-10-26 at 15:26 -0700, Roland Dreier wrote: > > > Steve> The adapter seems to be dma'ing into the wrong memory. The > > > Steve> patch below backs the usage of dma_map_single() back to > > > Steve> using __pa() for converting kernel virtual addresses (from > > > Steve> kmalloc) into bus addresses, and things work ok. > > > > > > Hmm. It might be interesting to hack the driver to print the result > > > of both dma_map_single() and __pa() and see if they're different. 
> > > > > > > Here's a dump. They are different. The __pa()'s are what I expect. I > > dunno if the dma_addr's returned from dma_map_single() look good or not > > (they certainly don't work :) > > > > c2_alloc_mqsp_chunk mqsp_chunk va ffff810148b1b000 dma_addr 3a56000 __pa 148b1b000 > > c2_alloc_mqsp addr ffff810148b1b01a dma_addr 3a5601a > > c2_alloc_mqsp addr ffff810148b1b01c dma_addr 3a5601c > > c2_alloc_mqsp addr ffff810148b1b01e dma_addr 3a5601e > > c2_alloc_mqsp addr ffff810148b1b020 dma_addr 3a56020 > > c2_rnic_init rep_vq va ffff810147e78000 dma 3a57000 __pa 147e78000 > > c2_rnic_init aeq va ffff810147d48000 dma 3a5f000 __pa 147d48000 > > > > > Are you running on a 32-bit (i386) or 64-bit (x86_64) kernel? How > > > much RAM do you have? > > > > 64b/X86_64. 4GB RAM. The CPUs are Dempsey class XEONs - Dual CPU, Dual > > core. So with HT on linux sees 8 CPUs. > > > > > Is the kernel using swiotlb? If so then you > > > need to make sure your DMA_{TO,FROM} directions and dma_unmap calls > > > are right, since otherwise the DMAed data won't be copied to/from the > > > bounce buffer at the right time. > > > > All these mappings are for the device to DMA into the host memory, and > > I'm using DMA_FROM_DEVICE in my calls to dma_map_single(). > > > > How do I know if the kernel is using swiotlb? > > > > > > > Another thing to do if you're patient would be to use git-bisect and > > > figure out exactly which patch made amso1100 stop working. > > > > > > > I added these calls as part of the review for submission into the > > kernel, and I originally tested them on dual CPU opteron systems with > > 1GB of memory. But maybe they weren't using the IOMMU? Dunno. 
> > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Fri Oct 27 08:59:23 2006 From: halr at voltaire.com (Hal Rosenstock) Date: Fri, 27 Oct 2006 17:59:23 +0200 Subject: [openib-general] SM Receive Handling References: <000301c6f9ae$1c0b82e0$21606d86@one7> Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F5018943B6@taurus.voltaire.com> Hi Michael, The SM registers several MAD agents (for SM and SA traffic) and in mad.c this causes an agent to be found based on the high 32 bits of the transaction ID if it is a response (unsolicited messages are handled differently). This causes ib_mad_complete_recv to be invoked which handles RMPP, does the response matching, etc. before calling back the receive handler registered. -- Hal ________________________________ From: openib-general-bounces at openib.org on behalf of Michael Arndt Sent: Fri 10/27/2006 5:56 AM To: openib-general at openib.org Subject: [openib-general] SM Receive Handling Hi, I have a question about the way the SM is informed when a SMP is received. If I look at the sources and go from the bottom up I stop at the 'ib_mad_recv_done_handler' (core/mad.c). At this point the SMI is processing the packet and notice if the SMP has to be handled by the SM or SMA. In this case the function jumps to the label 'local' at line 1860 (see code attachment). I would really like if someone can explain the steps are taking between the labels 'local' and 'out'. 
The reason is, that I can't see were the __osm_sm_mad_ctrl_rcv_callback (which is the function the SM register to handle received MADs, right?) is informed, in any way (Message, JobQueue, Interrupt)? Thanks Michael static void ib_mad_recv_done_handler(struct ib_mad_port_private *port_priv, struct ib_wc *wc) { struct ib_mad_qp_info *qp_info; struct ib_mad_private_header *mad_priv_hdr; struct ib_mad_private *recv, *response; struct ib_mad_list_head *mad_list; struct ib_mad_agent_private *mad_agent; response = kmem_cache_alloc(ib_mad_cache, GFP_KERNEL); if (!response) printk(KERN_ERR PFX "ib_mad_recv_done_handler no memory " "for response buffer\n"); mad_list = (struct ib_mad_list_head *)(unsigned long)wc->wr_id; qp_info = mad_list->mad_queue->qp_info; dequeue_mad(mad_list); mad_priv_hdr = container_of(mad_list, struct ib_mad_private_header, mad_list); recv = container_of(mad_priv_hdr, struct ib_mad_private, header); dma_unmap_single(port_priv->device->dma_device, pci_unmap_addr(&recv->header, mapping), sizeof(struct ib_mad_private) - sizeof(struct ib_mad_private_header), DMA_FROM_DEVICE); /* Setup MAD receive work completion from "normal" work completion */ recv->header.wc = *wc; recv->header.recv_wc.wc = &recv->header.wc; recv->header.recv_wc.mad_len = sizeof(struct ib_mad); recv->header.recv_wc.recv_buf.mad = &recv->mad.mad; recv->header.recv_wc.recv_buf.grh = &recv->grh; if (atomic_read(&qp_info->snoop_count)) snoop_recv(qp_info, &recv->header.recv_wc, IB_MAD_SNOOP_RECVS); /* Validate MAD */ if (!validate_mad(&recv->mad.mad, qp_info->qp->qp_num)) goto out; if (recv->mad.mad.mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { if (!smi_handle_dr_smp_recv(&recv->mad.smp, port_priv->device->node_type, port_priv->port_num, port_priv->device->phys_port_cnt)) goto out; if (!smi_check_forward_dr_smp(&recv->mad.smp)) goto local; if (!smi_handle_dr_smp_send(&recv->mad.smp, port_priv->device->node_type, port_priv->port_num)) goto out; if 
(!smi_check_local_smp(&recv->mad.smp, port_priv->device)) goto out; } local: /* Give driver "right of first refusal" on incoming MAD */ if (port_priv->device->process_mad) { int ret; if (!response) { printk(KERN_ERR PFX "No memory for response MAD\n"); /* * Is it better to assume that * it wouldn't be processed ? */ goto out; } ret = port_priv->device->process_mad(port_priv->device, 0, port_priv->port_num, wc, &recv->grh, &recv->mad.mad, &response->mad.mad); if (ret & IB_MAD_RESULT_SUCCESS) { if (ret & IB_MAD_RESULT_CONSUMED) goto out; if (ret & IB_MAD_RESULT_REPLY) { agent_send_response(&response->mad.mad, &recv->grh, wc, port_priv->device, port_priv->port_num, qp_info->qp->qp_num); goto out; } } } mad_agent = find_mad_agent(port_priv, &recv->mad.mad); if (mad_agent) { ib_mad_complete_recv(mad_agent, &recv->header.recv_wc); /* * recv is freed up in error cases in ib_mad_complete_recv * or via recv_handler in ib_mad_complete_recv() */ recv = NULL; } out: /* Post another receive request for this QP */ if (response) { ib_mad_post_receive_mads(qp_info, response); if (recv) kmem_cache_free(ib_mad_cache, recv); } else ib_mad_post_receive_mads(qp_info, recv); } _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From johnt1johnt2 at gmail.com Fri Oct 27 08:59:42 2006 From: johnt1johnt2 at gmail.com (john t) Date: Fri, 27 Oct 2006 21:29:42 +0530 Subject: [openib-general] Building OFED with FC4 kernel sources Message-ID: Hi, I am using FC4, kernel version 2.6.11-1.1369_FC4smp, here under /usr/src/kernels/2.6.11-1.1369_FC4smp-x86_64/include/linux/device.h, struct class_device contains a member "dev_t devt" Whereas when I obtained the corresponding kernel sources " kernel-2.6.11-1.1369_FC4.src.rpm" from http://download.fedora.redhat.com/pub/fedora/linux/core/4/SRPMS, the corresponding "struct 
class_device" in these sources does not contain the member "dev_t devt". As a result my OpenIB (OFED) build is failing. Can someone please point me to the correct kernel sources for FC4? Thanks, John T. From swise at opengridcomputing.com Fri Oct 27 09:44:15 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 11:44:15 -0500 Subject: [openib-general] problem with 2.6.19? In-Reply-To: <1161964388.14333.38.camel@stevo-desktop> References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> Message-ID: <1161967455.14333.45.camel@stevo-desktop> So the issue is my maps are getting set up with bounce bufs. This shouldn't be the case, I think, because my device supports 64b addressing. I'm diggin' into this. On Fri, 2006-10-27 at 10:53 -0500, Steve Wise wrote: > And I do have CONFIG_SWIOTLB set: > > CONFIG_SWIOTLB=y > > On Fri, 2006-10-27 at 10:48 -0500, Steve Wise wrote: > > Also, if I drop the memory down to 1G (via mem=1G on the boot line), > > then things work. But I think I'm just disabling the IOMMU. Note that > > __pa() and dma_map_single() return the same thing: > > > > > > c2_alloc_mqsp_chunk mqsp_chunk va ffff81003c4d2000 dma_addr 3c4d2000 __pa 3c4d2000 > > c2_alloc_mqsp addr ffff81003c4d201a dma_addr 3c4d201a > > c2_alloc_mqsp addr ffff81003c4d201c dma_addr 3c4d201c > > c2_alloc_mqsp addr ffff81003c4d201e dma_addr 3c4d201e > > c2_alloc_mqsp addr ffff81003c4d2020 dma_addr 3c4d2020 > > c2_rnic_init rep_vq va ffff81003cb08000 dma 3cb08000 __pa 3cb08000 > > c2_rnic_init aeq va ffff81003d4e0000 dma 3d4e0000 __pa 3d4e0000 > > > > > > > > On Fri, 2006-10-27 at 09:43 -0500, Steve Wise wrote: > > > On Thu, 2006-10-26 at 15:26 -0700, Roland Dreier wrote: > > > > Steve> The adapter seems to be dma'ing into the wrong memory. The
The
> > > > Steve> patch below backs the usage of dma_map_single() back to
> > > > Steve> using __pa() for converting kernel virtual addresses (from
> > > > Steve> kmalloc) into bus addresses, and things work ok.
> > > >
> > > > Hmm. It might be interesting to hack the driver to print the result
> > > > of both dma_map_single() and __pa() and see if they're different.
> > > >
> > >
> > > Here's a dump. They are different. The __pa()'s are what I expect. I
> > > dunno if the dma_addr's returned from dma_map_single() look good or not
> > > (they certainly don't work :)
> > >
> > > c2_alloc_mqsp_chunk mqsp_chunk va ffff810148b1b000 dma_addr 3a56000 __pa 148b1b000
> > > c2_alloc_mqsp addr ffff810148b1b01a dma_addr 3a5601a
> > > c2_alloc_mqsp addr ffff810148b1b01c dma_addr 3a5601c
> > > c2_alloc_mqsp addr ffff810148b1b01e dma_addr 3a5601e
> > > c2_alloc_mqsp addr ffff810148b1b020 dma_addr 3a56020
> > > c2_rnic_init rep_vq va ffff810147e78000 dma 3a57000 __pa 147e78000
> > > c2_rnic_init aeq va ffff810147d48000 dma 3a5f000 __pa 147d48000
> > >
> > > > Are you running on a 32-bit (i386) or 64-bit (x86_64) kernel? How
> > > > much RAM do you have?
> > >
> > > 64b/X86_64. 4GB RAM. The CPUs are Dempsey class XEONs - Dual CPU, Dual
> > > core. So with HT on linux sees 8 CPUs.
> > >
> > > > Is the kernel using swiotlb? If so then you
> > > > need to make sure your DMA_{TO,FROM} directions and dma_unmap calls
> > > > are right, since otherwise the DMAed data won't be copied to/from the
> > > > bounce buffer at the right time.
> > >
> > > All these mappings are for the device to DMA into the host memory, and
> > > I'm using DMA_FROM_DEVICE in my calls to dma_map_single().
> > >
> > > How do I know if the kernel is using swiotlb?
> > >
> > >
> > > > Another thing to do if you're patient would be to use git-bisect and
> > > > figure out exactly which patch made amso1100 stop working.
> > > > > > > > > > I added these calls as part of the review for submission into the > > > kernel, and I originally tested them on dual CPU opteron systems with > > > 1GB of memory. But maybe they weren't using the IOMMU? Dunno. > > > > > > > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From bgreen at nas.nasa.gov Fri Oct 27 09:46:41 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Fri, 27 Oct 2006 09:46:41 -0700 Subject: [openib-general] openib/OFED-1.1 Gentoo Linux ebuilds Message-ID: <200610271646.k9RGkfbT011475@ece06.nas.nasa.gov> Hello, I have produced and tested a set of ebuilds for installing openib-1.1 on Gentoo Linux. They are based on a pair of tar files I generated from OFED-1.1.tgz. Once these two tar files are placed on 'mirror.gentooscience.org', I will be submitting the ebuilds to the Gentoo Science overlay for use by others. Ideally, these tar files could be downloaded from 'www.openfabrics.org' itself. Would someone be willing to post them there? I have a script to produce them automatically from OFED-1.1.tgz. 
The two files can be found here:

http://www.nas.nasa.gov/~bgreen/openib/openib-userspace-1.1.tgz
http://www.nas.nasa.gov/~bgreen/openib/openib-drivers-1.1.tgz

Inside each of these packages is a script called
'gen_openib_packages.sh', which when run, will automatically download
OFED-1.1.tgz and produce these two tar files.

Thanks,
-bryan

P.S., If you happen to be interested in my own results from a few of the
OSU benchmarks on Opteron and Xeon systems, you can see them here:
http://people.nas.nasa.gov/~bgreen/mpiperf/
I am actually able to achieve almost 1.7G/s between two opteron systems
with osu_bw.

From robert.j.woodruff at intel.com Fri Oct 27 10:41:00 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 27 Oct 2006 10:41:00 -0700
Subject: [openib-general] FW: /etc/rocks-release screws up OFED 1.1.1 build process
Message-ID:

-----Original Message-----
From: Worley, Chris B
Sent: Friday, October 27, 2006 10:18 AM
To: Woodruff, Robert J
Subject: FW: /etc/rocks-release screws up OFED 1.1.1 build process

Woody,

I was unable to post this to the OpenIB list server. Could you post it?
Chris >---------- Forwarded message ---------- >From: Chris Worley >Date: Oct 26, 2006 3:20 PM >Subject: /etc/rocks-release screws up OFED 1.1.1 build process >To: openib-general at openib.org > > >To fix it, make sure the test for rocks-release is at the end, or >might as well delete it altogether, of build_env.sh: > ># Set Distribuition dependency environment >dist_rpm="" >if [ -f /etc/SuSE-release ]; then > dist_rpm=$($RPM -qf /etc/SuSE-release) > DISTRIBUTION="SuSE" >elif [ -f /etc/fedora-release ]; then > dist_rpm=$($RPM -qf /etc/fedora-release) > DISTRIBUTION="fedora" >elif [ -f /etc/redhat-release ]; then > dist_rpm=$($RPM -qf /etc/redhat-release) > DISTRIBUTION="redhat" >elif [ -f /etc/rocks-release ]; then > dist_rpm=$($RPM -qf /etc/rocks-release) > DISTRIBUTION="Rocks" >else > dist_rpm="Unknown" > DISTRIBUTION=$(ls /etc/*-release | head -n 1 | xargs -iXXX >basename XXX -release 2> $NULL) > [ -z "${DISTRIBUTION}" ] && DISTRIBUTION="Unknown" >fi > >The problem is the rpm -qf returns an error: > >[root at c OFED-1.1.1]# rpm -qf /etc/redhat-release >redhat-release-4AS-4.1 >[root at c OFED-1.1.1]# rpm -qf /etc/rocks-release >file /etc/rocks-release is not owned by any package > >Which, during the build, generates the error: > >ERROR: Failed executing "/bin/mv -f >/var/tmp/OFEDRPM/RPMS/x86_64/dapl-1.2.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/dapl-devel-1.2.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/ipoibtools-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/kernel-ib-1.1-2.6.9_34.ELsmp.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/kernel-ib-devel-1.1-2.6.9_34.ELsmp.x86_64. 
rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibcm-0.9.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibcm-devel-0.9.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibcommon-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibcommon-devel-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibmad-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibmad-devel-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibumad-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibumad-devel-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibverbs-1.0.4-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibverbs-devel-1.0.4-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libibverbs-utils-1.0.4-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libipathverbs-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libipathverbs-devel-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libmthca-1.0.3-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libmthca-devel-1.0.3-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libopensm-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libopensm-devel-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libosmcomp-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libosmcomp-devel-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libosmvendor-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libosmvendor-devel-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/librdmacm-0.9.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/librdmacm-devel-0.9.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/librdmacm-utils-0.9.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/libsdp-1.1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/mstflint-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/openib-diags-1.1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/opensm-2.0.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/perftest-1.0-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/srptools-0.0.4-0.x86_64.rpm >/var/tmp/OFEDRPM/RPMS/x86_64/tvflash-0.9.0-0.x86_64.rpm >/export/tools/OFED-1.1.1/RPMS/file /etc/rocks-release is not owned by >any package" From rdreier at 
cisco.com Fri Oct 27 10:56:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 27 Oct 2006 10:56:45 -0700
Subject: [openib-general] [PATCH 1/5] NetEffect 10Gb RNIC Userspace Library: userspace config generation
In-Reply-To: <1161959235.2748.1.camel@trinity.ogc.int> (Tom Tucker's message of "Fri, 27 Oct 2006 09:27:15 -0500")
References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECA@venom2>
	<1161959235.2748.1.camel@trinity.ogc.int>
Message-ID:

 > I don't think the userspace stuff belongs on netdev. Someone please
 > correct me if I'm wrong.

Yeah, it's not a bad thing to get wider review, but your userspace
library is pretty much your business. If you screw it up it doesn't
hurt anyone else, so I'm happy to let you write it however you want.

 - R.

From rdreier at cisco.com Fri Oct 27 11:02:48 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 27 Oct 2006 11:02:48 -0700
Subject: [openib-general] problem with 2.6.19?
In-Reply-To: <1161967455.14333.45.camel@stevo-desktop> (Steve Wise's message of "Fri, 27 Oct 2006 11:44:15 -0500")
References: <1161901218.4280.55.camel@stevo-desktop>
	<1161960198.14333.16.camel@stevo-desktop>
	<1161964101.14333.36.camel@stevo-desktop>
	<1161964388.14333.38.camel@stevo-desktop>
	<1161967455.14333.45.camel@stevo-desktop>
Message-ID:

 > So the issue is my maps are getting setup with bounce bufs. This
 > shouldn't be the case, I think, because my device supports 64b
 > addressing. I'm diggin' into this.

Yes, it's a little strange that you're still bouncing with the
pci_set_dma_mask(DMA_64BIT_MASK). But the bounce buffering should
work -- the fact that it doesn't means you have a bug in how you're
using dma_map_single(), because dma_unmap_single() will copy things
back from the bounce buffer.

So you have two issues:
 - why is the bounce buffering happening?
 - what's wrong with your dma mapping calls? (Because there are other
   situations with an IOMMU where you have to get things right)

 - R.
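
To make the point about bounce buffers concrete, here is a toy userspace
model (not kernel code) of why a buffer that is mapped through a bounce
buffer looks stale to the CPU until it is unmapped. The struct and the
model_* function names are hypothetical stand-ins invented for this
sketch; they only mimic the copy-back behaviour described above, and the
real swiotlb code is considerably more involved.

```c
/*
 * Toy userspace model of a swiotlb-style bounce buffer.
 * Nothing here is kernel API; every name is hypothetical.
 */
#include <stdlib.h>
#include <string.h>

struct model_mapping {
	unsigned char *cpu_buf;	/* buffer the driver allocated and reads */
	unsigned char *bounce;	/* buffer the "device" actually writes to */
	size_t len;
};

/* Stand-in for dma_map_single(..., DMA_FROM_DEVICE): set up a bounce buffer. */
static int model_map_from_device(struct model_mapping *m,
				 unsigned char *cpu_buf, size_t len)
{
	m->bounce = calloc(1, len);
	if (!m->bounce)
		return -1;
	m->cpu_buf = cpu_buf;
	m->len = len;
	return 0;
}

/* The "device" only ever sees the bounce buffer, never cpu_buf. */
static void model_device_write(struct model_mapping *m, size_t off,
			       const void *data, size_t n)
{
	if (off <= m->len && n <= m->len - off)
		memcpy(m->bounce + off, data, n);
}

/* Stand-in for dma_unmap_single(): copy the data back, drop the bounce buffer. */
static void model_unmap_from_device(struct model_mapping *m)
{
	memcpy(m->cpu_buf, m->bounce, m->len);
	free(m->bounce);
	m->bounce = NULL;
}
```

In this model, a queue that is mapped once and never unmapped never
sees the device's writes in cpu_buf, which is consistent with the
symptom reported in this thread. Memory that stays mapped while both
sides touch it is the case the coherent allocators are meant for.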
From shemminger at osdl.org Fri Oct 27 11:01:04 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Fri, 27 Oct 2006 11:01:04 -0700 Subject: [openib-general] [PATCH 1/5] NetEffect 10Gb RNIC Userspace Library: userspace config generation In-Reply-To: References: <5E701717F2B2ED4EA60F87C8AA57B7CC064C9ECA@venom2> <1161959235.2748.1.camel@trinity.ogc.int> Message-ID: <20061027110104.116a53bc@freekitty> On Fri, 27 Oct 2006 10:56:45 -0700 Roland Dreier wrote: > > I don't think the userspace stuff belongs on netdev. Someone please > > correct me if I'm wrong. > > Yeah, it's not a bad thing to get wider review, but your userspace > library is pretty much your business. If you screw it up it doesn't > hurt anyone else, so I'm happy to let you write it however you want. > > - R. > I prefer a pointer to the project download source. Seeing the userspace stuff helps answer questions where the administration process is confusing (or could/should be done differently). -- Stephen Hemminger From swise at opengridcomputing.com Fri Oct 27 11:10:01 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 13:10:01 -0500 Subject: [openib-general] problem with 2.6.19? In-Reply-To: References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> <1161967455.14333.45.camel@stevo-desktop> Message-ID: <1161972601.14333.50.camel@stevo-desktop> On Fri, 2006-10-27 at 11:02 -0700, Roland Dreier wrote: > > So the issue is my maps are getting setup with bounce bufs. This > > shouldn't be the case, I think, because my device supports 64b > > addressing. I'm diggin' into this. > > Yes, it's a little strange that you're still bouncing with the > pci_set_dma_mask(DMA_64BIT_MASK). 
But the bounce buffering should
> work -- the fact that it doesn't means you have a bug in how you're
> using dma_map_single(), because dma_unmap_single() will copy things
> back from the bounce buffer.
>
> So you have two issues:
> - why is the bounce buffering happening?
> - what's wrong with your dma mapping calls? (Because there are other
>   situations with an IOMMU where you have to get things right)
>
> - R.

I must be misusing dma_map_single(). What I'm doing is allocating a
verb message reply queue for the adapter to DMA verb replies into. It
never gets unmapped. I kmalloc() it, then map it. I could use
dma_alloc_coherent() or something, and maybe that's what I need to do?

You're saying I must unmap it before the data is valid (cuz of the
bounce buffering). If that's true, then how in sam hill does user mode
RDMA work since the user's memory isn't unmapped before the user looks
like memory that is the target of RDMA???? The uverbs code calls
dma_map_sg() which is roughly the same as dma_map_single, eh?

Slightly confused...

Steve.

From rdreier at cisco.com Fri Oct 27 11:15:40 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 27 Oct 2006 11:15:40 -0700
Subject: [openib-general] problem with 2.6.19?
In-Reply-To: <1161972601.14333.50.camel@stevo-desktop> (Steve Wise's message of "Fri, 27 Oct 2006 13:10:01 -0500")
References: <1161901218.4280.55.camel@stevo-desktop>
	<1161960198.14333.16.camel@stevo-desktop>
	<1161964101.14333.36.camel@stevo-desktop>
	<1161964388.14333.38.camel@stevo-desktop>
	<1161967455.14333.45.camel@stevo-desktop>
	<1161972601.14333.50.camel@stevo-desktop>
Message-ID:

 > I must be misusing dma_map_single(). What I'm doing is allocating a
 > verb message reply queue for the adapter to DMA verb replies into. It
 > never gets unmapped. I kmalloc() it, then map it. I could use
 > dma_alloc_coherent() or something, and maybe that's what I need to do?

Yeah, if you want to leave something mapped and have the device DMA
into it, and the CPU look into the buffer too, then you need
consistent/coherent memory -- either pci_alloc_consistent() or
dma_alloc_coherent(). The dma_ variant is slightly better because you
can pass in a GFP_ mask rather than having the kernel pick GFP_ATOMIC
for you.

 > You're saying I must unmap it before the data is valid (cuz of the
 > bounce buffering). If that's true, then how in sam hill does user mode
 > RDMA work since the user's memory isn't unmapped before the user looks
 > like memory that is the target of RDMA???? The uverbs code calls
 > dma_map_sg() which is roughly the same as dma_map_single, eh?

It's a good point. We're kind of counting on the IOMMU situation not
being too wacky, and the device being able to DMA to arbitrary
addresses. So swiotlb won't work in this case actually -- but we
assume any RDMA device can do 64-bit DMA so it doesn't hurt us in
practice. But yes, DMA to userspace is slightly risky and won't work
everywhere.

 - R.

From swise at opengridcomputing.com Fri Oct 27 11:32:05 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 27 Oct 2006 13:32:05 -0500
Subject: [openib-general] problem with 2.6.19?
In-Reply-To:
References: <1161901218.4280.55.camel@stevo-desktop>
	<1161960198.14333.16.camel@stevo-desktop>
	<1161964101.14333.36.camel@stevo-desktop>
	<1161964388.14333.38.camel@stevo-desktop>
	<1161967455.14333.45.camel@stevo-desktop>
	<1161972601.14333.50.camel@stevo-desktop>
Message-ID: <1161973925.14333.61.camel@stevo-desktop>

I found the bug. The ammasso driver was passing in the pointer
c2dev->ibdev.dma_device to dma_map_single(). However the driver hadn't
yet registered with the ib core, so that field is zero. The dma map
code uses bounce buffers if a map is done with a NULL device ptr passed
in...

The simple fix is to pass in the correct pointer, &c2dev->pcidev->dev.

But I guess I should still change this all to use
dma_alloc_coherent()...

Roland, Can we get this fix into 2.6.19? Steve. On Fri, 2006-10-27 at 11:15 -0700, Roland Dreier wrote: > > I must be misusing dma_map_single(). What I'm doing is allocating a > > verb message reply queue for the adapter to DMA verb replies into. It > > never gets unmapped. I kmalloc() it, then map it. I could use > > dma_alloc_coherent() or something, and maybe that's what I need to do? > > Yeah, if you want to leave something mapped and have the device DMA > into it, and the CPU look into the buffer too, then you need > consistent/coherent memory -- either pci_alloc_consistent() or > dma_alloc_coherent(). The dma_ variant is slightly better because you > can pass in a GFP_ mask rather than having the kernel pick GFP_ATOMIC > for you. > > > You're saying I must unmap it before the data is valid (cuz of the > > bounce buffering). If that's true, then how in sam hill does user mode > > RDMA work since the user's memory isn't unmapped before the user looks > > like memory that is the target of RDMA???? The uverbs code calls > > dma_map_sg() which is roughly the same as dma_map_single, eh? > > It's a good point. We're kind of counting on the IOMMU situation not > being too wacky, and the device being able to DMA to arbitrary > addresses. So swiotlb won't work in this case actually -- but we > assume any RDMA device can do 64-bit DMA so it doesn't hurt us in > practice. But yes, DMA to userspace is slightly risky and won't work > everywhere. > > - R. From psoltero at cs.unm.edu Fri Oct 27 11:49:16 2006 From: psoltero at cs.unm.edu (Philip T. Soltero) Date: Fri, 27 Oct 2006 12:49:16 -0600 Subject: [openib-general] Building OFED with FC4 kernel sources In-Reply-To: References: Message-ID: <454254AC.60303@cs.unm.edu> John T., I'm using kernel 2.6.17-1.2141_FC4 and am getting a clean build. 
The source rpm can be found here: http://download.fedora.redhat.com/pub/fedora/linux/core/updates/4/SRPMS/ Philip john t wrote: > Hi, > > I am using FC4, kernel version 2.6.11-1.1369_FC4smp, here under > /usr/src/kernels/2.6.11-1.1369_FC4smp-x86_64/include/linux/device.h, > > struct class_device contains a member "dev_t devt" > > Whereas when I obtained the corresponding kernel sources > "kernel-2.6.11-1.1369_FC4.src.rpm" from > http://download.fedora.redhat.com/pub/fedora/linux/core/4/SRPMS, the > corresponding "struct class_device" in these sources do not contain the > member "dev_t devt". As a result my OpenIB (OFED) build is failing. > > Can some one please point me to the correct kernel sources for FC4? > > Thanks, > John T. > > > ------------------------------------------------------------------------ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From psoltero at cs.unm.edu Fri Oct 27 11:53:00 2006 From: psoltero at cs.unm.edu (Philip T. Soltero) Date: Fri, 27 Oct 2006 12:53:00 -0600 Subject: [openib-general] Building OFED with FC4 kernel sources In-Reply-To: <454254AC.60303@cs.unm.edu> References: <454254AC.60303@cs.unm.edu> Message-ID: <4542558C.3020201@cs.unm.edu> Sorry read that "kernel 2.6.17-2142_FC4" =) Philip T. Soltero wrote: > John T., > > I'm using kernel 2.6.17-1.2141_FC4 and am getting a clean build. 
The > source rpm can be found here: > http://download.fedora.redhat.com/pub/fedora/linux/core/updates/4/SRPMS/ > > Philip > > > john t wrote: >> Hi, >> >> I am using FC4, kernel version 2.6.11-1.1369_FC4smp, here under >> /usr/src/kernels/2.6.11-1.1369_FC4smp-x86_64/include/linux/device.h, >> >> struct class_device contains a member "dev_t devt" >> >> Whereas when I obtained the corresponding kernel sources >> "kernel-2.6.11-1.1369_FC4.src.rpm" from >> http://download.fedora.redhat.com/pub/fedora/linux/core/4/SRPMS, the >> corresponding "struct class_device" in these sources do not contain the >> member "dev_t devt". As a result my OpenIB (OFED) build is failing. >> >> Can some one please point me to the correct kernel sources for FC4? >> >> Thanks, >> John T. >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Fri Oct 27 12:23:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 27 Oct 2006 12:23:01 -0700 Subject: [openib-general] problem with 2.6.19? 
In-Reply-To: <1161973925.14333.61.camel@stevo-desktop> (Steve Wise's message of "Fri, 27 Oct 2006 13:32:05 -0500") References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> <1161967455.14333.45.camel@stevo-desktop> <1161972601.14333.50.camel@stevo-desktop> <1161973925.14333.61.camel@stevo-desktop> Message-ID: > But I guess I should still change this all to use > dma_alloc_coherent()... Yes, please fix at least this part the right way (even if we can't do anything about userspace). > Roland, Can we get this fix into 2.6.19? Yes, there's plenty of time before 2.6.19-final, and this is clearly a bugfix. - R. From swise at opengridcomputing.com Fri Oct 27 13:58:19 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 15:58:19 -0500 Subject: [openib-general] [PATCH 2.6.19-rc3 1/2] amso1100 - Use dma_alloc_coherent instead of kmalloc/dma_map_single. Message-ID: <20061027205819.8511.85102.stgit@dell3.ogc.int> The Ammasso driver needs to use dma_alloc_coherent() for allocating memory that will be used by the HW for dma. 
Signed-off-by: Steve Wise --- drivers/infiniband/hw/amso1100/c2_alloc.c | 13 +++---- drivers/infiniband/hw/amso1100/c2_cq.c | 14 ++------ drivers/infiniband/hw/amso1100/c2_rnic.c | 52 ++++++++++++----------------- 3 files changed, 31 insertions(+), 48 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c b/drivers/infiniband/hw/amso1100/c2_alloc.c index 028a60b..1d30ef7 100644 --- a/drivers/infiniband/hw/amso1100/c2_alloc.c +++ b/drivers/infiniband/hw/amso1100/c2_alloc.c @@ -42,13 +42,14 @@ static int c2_alloc_mqsp_chunk(struct c2 { int i; struct sp_chunk *new_head; + dma_addr_t dma_addr; - new_head = (struct sp_chunk *) __get_free_page(gfp_mask); + new_head = dma_alloc_coherent(&c2dev->pcidev->dev, PAGE_SIZE, + &dma_addr, gfp_mask); if (new_head == NULL) return -ENOMEM; - new_head->dma_addr = dma_map_single(c2dev->ibdev.dma_device, new_head, - PAGE_SIZE, DMA_FROM_DEVICE); + new_head->dma_addr = dma_addr; pci_unmap_addr_set(new_head, mapping, new_head->dma_addr); new_head->next = NULL; @@ -80,10 +81,8 @@ void c2_free_mqsp_pool(struct c2_dev *c2 while (root) { next = root->next; - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(root, mapping), PAGE_SIZE, - DMA_FROM_DEVICE); - __free_page((struct page *) root); + dma_free_coherent(&c2dev->pcidev->dev, PAGE_SIZE, root, + pci_unmap_addr(root, mapping)); root = next; } } diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index 9d7bcc5..9b7af81 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -246,11 +246,8 @@ int c2_arm_cq(struct ib_cq *ibcq, enum i static void c2_free_cq_buf(struct c2_dev *c2dev, struct c2_mq *mq) { - - dma_unmap_single(c2dev->ibdev.dma_device, pci_unmap_addr(mq, mapping), - mq->q_size * mq->msg_size, DMA_FROM_DEVICE); - free_pages((unsigned long) mq->msg_pool.host, - get_order(mq->q_size * mq->msg_size)); + dma_free_coherent(&c2dev->pcidev->dev, mq->q_size * mq->msg_size, + 
mq->msg_pool.host, pci_unmap_addr(mq, mapping)); } static int c2_alloc_cq_buf(struct c2_dev *c2dev, struct c2_mq *mq, int q_size, @@ -258,8 +255,8 @@ static int c2_alloc_cq_buf(struct c2_dev { unsigned long pool_start; - pool_start = __get_free_pages(GFP_KERNEL, - get_order(q_size * msg_size)); + pool_start = dma_alloc_coherent(&c2dev->pcidev->dev, q_size * msg_size, + &mq->host_dma, GFP_KERNEL); if (!pool_start) return -ENOMEM; @@ -271,9 +268,6 @@ static int c2_alloc_cq_buf(struct c2_dev NULL, /* peer (currently unknown) */ C2_MQ_HOST_TARGET); - mq->host_dma = dma_map_single(c2dev->ibdev.dma_device, - (void *)pool_start, - q_size * msg_size, DMA_FROM_DEVICE); pci_unmap_addr_set(mq, mapping, mq->host_dma); return 0; diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 30409e1..6d82464 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -517,14 +517,12 @@ int c2_rnic_init(struct c2_dev *c2dev) /* Initialize the Verbs Reply Queue */ qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_QSIZE)); msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_MSGSIZE)); - q1_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + q1_pages = dma_alloc_coherent(&c2dev->pcidev->dev, qsize * msgsize, + &c2dev->rep_vq.host_dma, GFP_KERNEL); if (!q1_pages) { err = -ENOMEM; goto bail1; } - c2dev->rep_vq.host_dma = dma_map_single(c2dev->ibdev.dma_device, - (void *)q1_pages, qsize * msgsize, - DMA_FROM_DEVICE); pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma); pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages, (unsigned long long) c2dev->rep_vq.host_dma); @@ -540,14 +538,12 @@ int c2_rnic_init(struct c2_dev *c2dev) /* Initialize the Asynchronus Event Queue */ qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_QSIZE)); msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_MSGSIZE)); - q2_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + q2_pages = 
dma_alloc_coherent(&c2dev->pcidev->dev, qsize * msgsize, + &c2dev->aeq.host_dma, GFP_KERNEL); if (!q2_pages) { err = -ENOMEM; goto bail2; } - c2dev->aeq.host_dma = dma_map_single(c2dev->ibdev.dma_device, - (void *)q2_pages, qsize * msgsize, - DMA_FROM_DEVICE); pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma); pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages, (unsigned long long) c2dev->rep_vq.host_dma); @@ -597,17 +593,13 @@ int c2_rnic_init(struct c2_dev *c2dev) bail4: vq_term(c2dev); bail3: - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->aeq, mapping), - c2dev->aeq.q_size * c2dev->aeq.msg_size, - DMA_FROM_DEVICE); - kfree(q2_pages); + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->aeq.q_size * c2dev->aeq.msg_size, + q2_pages, pci_unmap_addr(&c2dev->aeq, mapping)); bail2: - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->rep_vq, mapping), - c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, - DMA_FROM_DEVICE); - kfree(q1_pages); + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, + q1_pages, pci_unmap_addr(&c2dev->rep_vq, mapping)); bail1: c2_free_mqsp_pool(c2dev, c2dev->kern_mqsp_pool); bail0: @@ -640,19 +632,17 @@ void c2_rnic_term(struct c2_dev *c2dev) /* Free the verbs request allocator */ vq_term(c2dev); - /* Unmap and free the asynchronus event queue */ - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->aeq, mapping), - c2dev->aeq.q_size * c2dev->aeq.msg_size, - DMA_FROM_DEVICE); - kfree(c2dev->aeq.msg_pool.host); - - /* Unmap and free the verbs reply queue */ - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->rep_vq, mapping), - c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, - DMA_FROM_DEVICE); - kfree(c2dev->rep_vq.msg_pool.host); + /* Free the asynchronus event queue */ + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->aeq.q_size * c2dev->aeq.msg_size, + c2dev->aeq.msg_pool.host, + pci_unmap_addr(&c2dev->aeq, mapping)); + + 
/* Free the verbs reply queue */ + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, + c2dev->rep_vq.msg_pool.host, + pci_unmap_addr(&c2dev->rep_vq, mapping)); /* Free the MQ shared pointer pool */ c2_free_mqsp_pool(c2dev, c2dev->kern_mqsp_pool); From swise at opengridcomputing.com Fri Oct 27 13:58:39 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 15:58:39 -0500 Subject: [openib-general] [PATCH 2.6.19-rc3 2/2] amso1100 - Fix incorrect pr_debug(). In-Reply-To: <20061027205819.8511.85102.stgit@dell3.ogc.int> References: <20061027205819.8511.85102.stgit@dell3.ogc.int> Message-ID: <20061027205839.8511.84399.stgit@dell3.ogc.int> pr_debug() was printing the wrong stuff. Signed-off-by: Steve Wise --- drivers/infiniband/hw/amso1100/c2_rnic.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 6d82464..e1d31fb 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -545,8 +545,8 @@ int c2_rnic_init(struct c2_dev *c2dev) goto bail2; } pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma); - pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages, - (unsigned long long) c2dev->rep_vq.host_dma); + pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q2_pages, + (unsigned long long) c2dev->aeq.host_dma); c2_mq_rep_init(&c2dev->aeq, 2, qsize, From rdreier at cisco.com Fri Oct 27 13:58:25 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 27 Oct 2006 13:58:25 -0700 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <453F831C.4010204@dev.mellanox.co.il> (Tziporet Koren's message of "Wed, 25 Oct 2006 17:30:36 +0200") References: <453F831C.4010204@dev.mellanox.co.il> Message-ID: > I want to suggest that you will create releases to the libraries you own To make this simpler, is there any way we can give maintainers 
the ability to put library releases somewhere on the new server so that they show up on the downloads page automatically? Right now it is somewhat cumbersome to create library releases, since the poor sysadmins have to manually add tarballs to the downloads page. - R. From somenath at veritas.com Thu Oct 26 14:25:04 2006 From: somenath at veritas.com (somenath) Date: Thu, 26 Oct 2006 14:25:04 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <453F9A12.4020500@ichips.intel.com> References: <453F9A12.4020500@ichips.intel.com> Message-ID: <454127B0.8080109@veritas.com> Sean Hefty wrote: > Tang, Changqing wrote: > >> Is there a way (not limited to cm) to know that path 1 is back again and >> reload it as the new alternate path ? If path 1 is down, we can not set >> it as alternate path, right ? >> >> If we can not bring a path back on fly, the usage of APM is limited. > > > Yes - the usage is currently limited, but that is being worked on. > Support is needed in both the ib_cm and ib_sa. The ib_sa needs SA > informinfo/notice support, so users can receive fabric event > notifications. And the ib_cm needs to know when to switch to the > alternate path. Venkatesh has a couple of patches that assist with > both of these. > > - Sean Sean, will there be a new API for SA event notification? today we already get this IB_EVENT_PATH_MIG (as defined below), will "path1 is back again" event be delivered the same way? thanks, som. 
enum ib_event_type { IB_EVENT_CQ_ERR, IB_EVENT_QP_FATAL, IB_EVENT_QP_REQ_ERR, IB_EVENT_QP_ACCESS_ERR, IB_EVENT_COMM_EST, IB_EVENT_SQ_DRAINED, IB_EVENT_PATH_MIG, IB_EVENT_PATH_MIG_ERR, IB_EVENT_DEVICE_FATAL, IB_EVENT_PORT_ACTIVE, IB_EVENT_PORT_ERR, IB_EVENT_LID_CHANGE, IB_EVENT_PKEY_CHANGE, IB_EVENT_SM_CHANGE, IB_EVENT_SRQ_ERR, IB_EVENT_SRQ_LIMIT_REACHED, IB_EVENT_QP_LAST_WQE_REACHED }; From swise at opengridcomputing.com Fri Oct 27 15:28:37 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 17:28:37 -0500 Subject: [openib-general] [PATCH 2.6.19-rc3 v2 2/2] amso1100 - Fix incorrect pr_debug(). In-Reply-To: <20061027222835.10329.14982.stgit@dell3.ogc.int> References: <20061027222835.10329.14982.stgit@dell3.ogc.int> Message-ID: <20061027222837.10329.71614.stgit@dell3.ogc.int> pr_debug() was printing the wrong stuff. Signed-off-by: Steve Wise --- drivers/infiniband/hw/amso1100/c2_rnic.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 6d82464..e1d31fb 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -545,8 +545,8 @@ int c2_rnic_init(struct c2_dev *c2dev) goto bail2; } pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma); - pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages, - (unsigned long long) c2dev->rep_vq.host_dma); + pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q2_pages, + (unsigned long long) c2dev->aeq.host_dma); c2_mq_rep_init(&c2dev->aeq, 2, qsize, From swise at opengridcomputing.com Fri Oct 27 15:28:35 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 27 Oct 2006 17:28:35 -0500 Subject: [openib-general] [PATCH 2.6.19-rc3 v2 1/2] amso1100 - Use dma_alloc_coherent instead of kmalloc/dma_map_single. Message-ID: <20061027222835.10329.14982.stgit@dell3.ogc.int> v2 of this patch fixes a compiler warning I missed... Steve. 
----- The Ammasso driver needs to use dma_alloc_coherent() for allocating memory that will be used by the HW for dma. Signed-off-by: Steve Wise --- drivers/infiniband/hw/amso1100/c2_alloc.c | 13 +++---- drivers/infiniband/hw/amso1100/c2_cq.c | 18 +++------- drivers/infiniband/hw/amso1100/c2_rnic.c | 52 ++++++++++++----------------- 3 files changed, 33 insertions(+), 50 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c b/drivers/infiniband/hw/amso1100/c2_alloc.c index 028a60b..1d30ef7 100644 --- a/drivers/infiniband/hw/amso1100/c2_alloc.c +++ b/drivers/infiniband/hw/amso1100/c2_alloc.c @@ -42,13 +42,14 @@ static int c2_alloc_mqsp_chunk(struct c2 { int i; struct sp_chunk *new_head; + dma_addr_t dma_addr; - new_head = (struct sp_chunk *) __get_free_page(gfp_mask); + new_head = dma_alloc_coherent(&c2dev->pcidev->dev, PAGE_SIZE, + &dma_addr, gfp_mask); if (new_head == NULL) return -ENOMEM; - new_head->dma_addr = dma_map_single(c2dev->ibdev.dma_device, new_head, - PAGE_SIZE, DMA_FROM_DEVICE); + new_head->dma_addr = dma_addr; pci_unmap_addr_set(new_head, mapping, new_head->dma_addr); new_head->next = NULL; @@ -80,10 +81,8 @@ void c2_free_mqsp_pool(struct c2_dev *c2 while (root) { next = root->next; - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(root, mapping), PAGE_SIZE, - DMA_FROM_DEVICE); - __free_page((struct page *) root); + dma_free_coherent(&c2dev->pcidev->dev, PAGE_SIZE, root, + pci_unmap_addr(root, mapping)); root = next; } } diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index 9d7bcc5..05c9154 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -246,20 +246,17 @@ int c2_arm_cq(struct ib_cq *ibcq, enum i static void c2_free_cq_buf(struct c2_dev *c2dev, struct c2_mq *mq) { - - dma_unmap_single(c2dev->ibdev.dma_device, pci_unmap_addr(mq, mapping), - mq->q_size * mq->msg_size, DMA_FROM_DEVICE); - free_pages((unsigned long) mq->msg_pool.host, - 
get_order(mq->q_size * mq->msg_size)); + dma_free_coherent(&c2dev->pcidev->dev, mq->q_size * mq->msg_size, + mq->msg_pool.host, pci_unmap_addr(mq, mapping)); } static int c2_alloc_cq_buf(struct c2_dev *c2dev, struct c2_mq *mq, int q_size, int msg_size) { - unsigned long pool_start; + u8 *pool_start; - pool_start = __get_free_pages(GFP_KERNEL, - get_order(q_size * msg_size)); + pool_start = dma_alloc_coherent(&c2dev->pcidev->dev, q_size * msg_size, + &mq->host_dma, GFP_KERNEL); if (!pool_start) return -ENOMEM; @@ -267,13 +264,10 @@ static int c2_alloc_cq_buf(struct c2_dev 0, /* index (currently unknown) */ q_size, msg_size, - (u8 *) pool_start, + pool_start, NULL, /* peer (currently unknown) */ C2_MQ_HOST_TARGET); - mq->host_dma = dma_map_single(c2dev->ibdev.dma_device, - (void *)pool_start, - q_size * msg_size, DMA_FROM_DEVICE); pci_unmap_addr_set(mq, mapping, mq->host_dma); return 0; diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 30409e1..6d82464 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -517,14 +517,12 @@ int c2_rnic_init(struct c2_dev *c2dev) /* Initialize the Verbs Reply Queue */ qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_QSIZE)); msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_MSGSIZE)); - q1_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + q1_pages = dma_alloc_coherent(&c2dev->pcidev->dev, qsize * msgsize, + &c2dev->rep_vq.host_dma, GFP_KERNEL); if (!q1_pages) { err = -ENOMEM; goto bail1; } - c2dev->rep_vq.host_dma = dma_map_single(c2dev->ibdev.dma_device, - (void *)q1_pages, qsize * msgsize, - DMA_FROM_DEVICE); pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma); pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages, (unsigned long long) c2dev->rep_vq.host_dma); @@ -540,14 +538,12 @@ int c2_rnic_init(struct c2_dev *c2dev) /* Initialize the Asynchronus Event Queue */ qsize = be32_to_cpu(readl(mmio_regs + 
C2_REGS_Q2_QSIZE)); msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_MSGSIZE)); - q2_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + q2_pages = dma_alloc_coherent(&c2dev->pcidev->dev, qsize * msgsize, + &c2dev->aeq.host_dma, GFP_KERNEL); if (!q2_pages) { err = -ENOMEM; goto bail2; } - c2dev->aeq.host_dma = dma_map_single(c2dev->ibdev.dma_device, - (void *)q2_pages, qsize * msgsize, - DMA_FROM_DEVICE); pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma); pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages, (unsigned long long) c2dev->rep_vq.host_dma); @@ -597,17 +593,13 @@ int c2_rnic_init(struct c2_dev *c2dev) bail4: vq_term(c2dev); bail3: - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->aeq, mapping), - c2dev->aeq.q_size * c2dev->aeq.msg_size, - DMA_FROM_DEVICE); - kfree(q2_pages); + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->aeq.q_size * c2dev->aeq.msg_size, + q2_pages, pci_unmap_addr(&c2dev->aeq, mapping)); bail2: - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->rep_vq, mapping), - c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, - DMA_FROM_DEVICE); - kfree(q1_pages); + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, + q1_pages, pci_unmap_addr(&c2dev->rep_vq, mapping)); bail1: c2_free_mqsp_pool(c2dev, c2dev->kern_mqsp_pool); bail0: @@ -640,19 +632,17 @@ void c2_rnic_term(struct c2_dev *c2dev) /* Free the verbs request allocator */ vq_term(c2dev); - /* Unmap and free the asynchronus event queue */ - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->aeq, mapping), - c2dev->aeq.q_size * c2dev->aeq.msg_size, - DMA_FROM_DEVICE); - kfree(c2dev->aeq.msg_pool.host); - - /* Unmap and free the verbs reply queue */ - dma_unmap_single(c2dev->ibdev.dma_device, - pci_unmap_addr(&c2dev->rep_vq, mapping), - c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, - DMA_FROM_DEVICE); - kfree(c2dev->rep_vq.msg_pool.host); + /* Free the asynchronus event queue */ + 
dma_free_coherent(&c2dev->pcidev->dev, + c2dev->aeq.q_size * c2dev->aeq.msg_size, + c2dev->aeq.msg_pool.host, + pci_unmap_addr(&c2dev->aeq, mapping)); + + /* Free the verbs reply queue */ + dma_free_coherent(&c2dev->pcidev->dev, + c2dev->rep_vq.q_size * c2dev->rep_vq.msg_size, + c2dev->rep_vq.msg_pool.host, + pci_unmap_addr(&c2dev->rep_vq, mapping)); /* Free the MQ shared pointer pool */ c2_free_mqsp_pool(c2dev, c2dev->kern_mqsp_pool); From bgreen at nas.nasa.gov Fri Oct 27 15:52:04 2006 From: bgreen at nas.nasa.gov (Bryan Green) Date: Fri, 27 Oct 2006 15:52:04 -0700 Subject: [openib-general] openib/OFED-1.1 Gentoo Linux ebuilds In-Reply-To: Your message of "Fri, 27 Oct 2006 09:46:41 PDT." <200610271646.k9RGkfbT011475@ece06.nas.nasa.gov> Message-ID: <200610272252.k9RMq4Hr021906@ece06.nas.nasa.gov> OpenIB-1.1 packages have been added to the Gentoo Linux Science Overlay. If you are a Gentoo user: $ emerge layman $ layman -a science $ emerge -vp openib http://www.gentooscience.org/ -bryan From venkatesh.babu at 3leafnetworks.com Fri Oct 27 16:51:16 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Fri, 27 Oct 2006 16:51:16 -0700 Subject: [openib-general] APM support in openib stack In-Reply-To: <454127B0.8080109@veritas.com> References: <453F9A12.4020500@ichips.intel.com> <454127B0.8080109@veritas.com> Message-ID: <45429B74.8090607@3leafnetworks.com> I don't think there is any event which says "path1 is back again". It is the application which needs to load the alternate path. The HW just sends an event IB_EVENT_PORT_ACTIVE when the port comes up. Upon receipt of this event the application has to see if there exists a path from this port to the remote node and then load this alternate path by sending the APR message. PS: In the Gen1 implementation there was an event called IB_PATH_MIG_ARMED which was generated by HW/FW after the application loaded the alternate path.
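A rough sketch of the application-side flow described above, written as a small stand-alone state machine. Everything here is hypothetical and for illustration only: the apm_* names, the event enum, and the path_exists callback are not part of the openib stack. In a real consumer the "load alternate path" step would involve the CM exchange mentioned above; with userspace verbs it would roughly correspond to ibv_modify_qp() with IBV_QP_ALT_PATH | IBV_QP_PATH_MIG_STATE and the migration state set to IBV_MIG_REARM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real event types; illustration only. */
enum apm_event { APM_EV_PORT_ACTIVE, APM_EV_PORT_ERR, APM_EV_PATH_MIG };
enum apm_state { APM_NO_ALT_PATH, APM_ALT_PATH_LOADED };

struct apm_conn {
	int primary_port;	/* local port currently carrying traffic */
	int alt_port;		/* local port of the loaded alternate path, or -1 */
	enum apm_state state;
	/* Application-supplied check: can the remote node be reached from
	 * this local port?  (Assumed to be backed by an SA path query.) */
	bool (*path_exists)(int local_port);
};

/* Example callback that treats every port as having a path to the peer. */
static bool reachable_on_any_port(int local_port)
{
	(void) local_port;
	return true;
}

/* Returns true if a new alternate path was loaded because of this event. */
static bool apm_handle_event(struct apm_conn *c, enum apm_event ev, int port)
{
	switch (ev) {
	case APM_EV_PORT_ACTIVE:
		/* Nothing to do if an alternate is already loaded, or if the
		 * port that came up is the one we are already using. */
		if (c->state == APM_ALT_PATH_LOADED || port == c->primary_port)
			return false;
		if (!c->path_exists || !c->path_exists(port))
			return false;
		/* This is where the alternate path would really be loaded. */
		c->alt_port = port;
		c->state = APM_ALT_PATH_LOADED;
		return true;
	case APM_EV_PATH_MIG:
		/* HW migrated: the old alternate is now the primary path and
		 * no alternate is loaded any more. */
		if (c->state == APM_ALT_PATH_LOADED)
			c->primary_port = c->alt_port;
		c->alt_port = -1;
		c->state = APM_NO_ALT_PATH;
		return false;
	case APM_EV_PORT_ERR:
		/* Losing the port behind the alternate path disarms it. */
		if (c->state == APM_ALT_PATH_LOADED && port == c->alt_port) {
			c->alt_port = -1;
			c->state = APM_NO_ALT_PATH;
		}
		return false;
	}
	return false;
}
```

The point of the sketch is only that nothing reloads the alternate path automatically: after a migration the connection stays without an alternate until the application reacts to a later port-active event.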
SA event notification just calls back registered handlers when an IB_EVENT_PORT_ACTIVE event occurs on any node in the subnet, or on a specific node, according to the registration parameters. VBabu somenath wrote: > Sean, > > will there be a new API for SA event notification? > today we already get this IB_EVENT_PATH_MIG (as defined below), will > "path1 is back again" event > be delivered the same way? > > thanks, som. > > enum ib_event_type { > IB_EVENT_CQ_ERR, > IB_EVENT_QP_FATAL, > IB_EVENT_QP_REQ_ERR, > IB_EVENT_QP_ACCESS_ERR, > IB_EVENT_COMM_EST, > IB_EVENT_SQ_DRAINED, > IB_EVENT_PATH_MIG, > IB_EVENT_PATH_MIG_ERR, > IB_EVENT_DEVICE_FATAL, > IB_EVENT_PORT_ACTIVE, > IB_EVENT_PORT_ERR, > IB_EVENT_LID_CHANGE, > IB_EVENT_PKEY_CHANGE, > IB_EVENT_SM_CHANGE, > IB_EVENT_SRQ_ERR, > IB_EVENT_SRQ_LIMIT_REACHED, > IB_EVENT_QP_LAST_WQE_REACHED > }; From rdreier at cisco.com Fri Oct 27 16:33:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 27 Oct 2006 16:33:12 -0700 Subject: [openib-general] [PATCH 2.6.19-rc3 v2 2/2] amso1100 - Fix incorrect pr_debug(). In-Reply-To: <20061027222837.10329.71614.stgit@dell3.ogc.int> (Steve Wise's message of "Fri, 27 Oct 2006 17:28:37 -0500") References: <20061027222835.10329.14982.stgit@dell3.ogc.int> <20061027222837.10329.71614.stgit@dell3.ogc.int> Message-ID: Applied, thanks. From rdreier at cisco.com Fri Oct 27 16:35:06 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 27 Oct 2006 16:35:06 -0700 Subject: [openib-general] [PATCH 2.6.19-rc3 v2 1/2] amso1100 - Use dma_alloc_coherent instead of kmalloc/dma_map_single. In-Reply-To: <20061027222835.10329.14982.stgit@dell3.ogc.int> (Steve Wise's message of "Fri, 27 Oct 2006 17:28:35 -0500") References: <20061027222835.10329.14982.stgit@dell3.ogc.int> Message-ID: tsk, tsk: fatal: 7 lines add trailing whitespaces. applied to for-2.6.19 anyway, thanks.
From rdreier at cisco.com Fri Oct 27 16:42:23 2006 From: rdreier at cisco.com (Roland Dreier (rdreier)) Date: Fri, 27 Oct 2006 16:42:23 -0700 Subject: [openib-general] [PATCH] libibverbs-1.0: fix static linking Message-ID: It turns out that static linking of libibverbs never really worked, which makes me wonder whether people who insisted building mthca.a every actually tried it. Anyway, here's a patch that tries to fix things up, although only one device driver can be linked in at a time because everyone exports the same driver entry point. Comments / test results appreciated. I'll check this in now, and if I don't get any bug reports then I'll put out the latest libibverbs-1.0 tree as libibverbs-1.0.4 around Tuesday or Wednesday of next week. (I have a more complicated plan for libibverbs 1.1 that I'm still coding up) Thanks, Roland --- libibverbs-1.0/ChangeLog (revision 9973) +++ libibverbs-1.0/ChangeLog (working copy) @@ -1,3 +1,17 @@ +2006-10-27 Roland Dreier + + * src/init.c: Revise initialization order to fix static linking. + Using dlopen() on a device-specific driver from a statically + linked copy of libibverbs will crash, because the driver will + bring in dynamic copies of libibverbs and libdl that clash with + the copies already linked statically. + + To fix this, we change the way we search for drivers: first we + find all uverbs devices and try the driver (if any) that is + linked in directly. If all devices are handled by that driver, + then we don't proceed any further. If not, then we try dynamic + loading of drivers and match them against any remaining devices. 
+ 2006-10-17 Roland Dreier * include/infiniband/arch.h: Update i386 and x86_64 memory barrier --- libibverbs-1.0/src/ibverbs.h (revision 9973) +++ libibverbs-1.0/src/ibverbs.h (working copy) @@ -60,6 +60,12 @@ #define PFX "libibverbs: " +struct ibv_sysfs_dev { + struct sysfs_class_device *verbs_dev; + struct ibv_sysfs_dev *next; + int have_driver; +}; + struct ibv_driver { ibv_driver_init_func init_func; struct ibv_driver *next; --- libibverbs-1.0/src/init.c (revision 9973) +++ libibverbs-1.0/src/init.c (working copy) @@ -52,11 +52,52 @@ HIDDEN int abi_ver; -static char default_path[] = DRIVER_PATH; +static const char default_path[] = DRIVER_PATH; static const char *user_path; +static struct ibv_sysfs_dev *sysfs_dev_list; static struct ibv_driver *driver_list; +static void find_sysfs_devs(void) +{ + struct sysfs_class *cls; + struct dlist *verbs_dev_list; + struct sysfs_class_device *verbs_dev; + struct ibv_sysfs_dev *dev; + + cls = sysfs_open_class("infiniband_verbs"); + if (!cls) { + fprintf(stderr, PFX "Fatal: couldn't open sysfs class 'infiniband_verbs'.\n"); + return; + } + + verbs_dev_list = sysfs_get_class_devices(cls); + if (!verbs_dev_list) { + fprintf(stderr, PFX "Fatal: no infiniband class devices found.\n"); + return; + } + + dlist_for_each_data(verbs_dev_list, verbs_dev, struct sysfs_class_device) { + dev = malloc(sizeof *dev); + if (!dev) { + fprintf(stderr, PFX "Warning: couldn't allocate device for %s\n", + verbs_dev->name); + continue; + } + + dev->verbs_dev = verbs_dev; + dev->next = sysfs_dev_list; + dev->have_driver = 0; + sysfs_dev_list = dev; + } +} + +__attribute__((weak)) +struct ibv_device *openib_driver_init(struct sysfs_class_device *dev) +{ + return NULL; +} + static void load_driver(char *so_path) { void *dlhandle; @@ -79,7 +120,7 @@ static void load_driver(char *so_path) driver = malloc(sizeof *driver); if (!driver) { - fprintf(stderr, PFX "Fatal: couldn't allocate driver for %s\n", so_path); + fprintf(stderr, PFX "Warning: couldn't 
allocate driver for %s\n", so_path); dlclose(dlhandle); return; } @@ -89,7 +130,7 @@ static void load_driver(char *so_path) driver_list = driver; } -static void find_drivers(char *dir) +static void find_drivers(const char *dir) { size_t len = strlen(dir); glob_t so_glob; @@ -101,9 +142,9 @@ static void find_drivers(char *dir) return; while (len && dir[len - 1] == '/') - dir[--len] = '\0'; + --len; - asprintf(&pat, "%s/*.so", dir); + asprintf(&pat, "%.*s/*.so", (int) len, dir); ret = glob(pat, 0, NULL, &so_glob); free(pat); @@ -120,10 +161,10 @@ static void find_drivers(char *dir) globfree(&so_glob); } -static struct ibv_device *init_drivers(struct sysfs_class_device *verbs_dev) +static struct ibv_device *try_driver(ibv_driver_init_func init_func, + struct sysfs_class_device *verbs_dev) { struct sysfs_class_device *ib_dev; - struct ibv_driver *driver; struct ibv_device *dev; char ibdev_name[64]; @@ -141,24 +182,14 @@ static struct ibv_device *init_drivers(s return NULL; } - for (driver = driver_list; driver; driver = driver->next) { - dev = driver->init_func(verbs_dev); - if (dev) { - dev->dev = verbs_dev; - dev->ibdev = ib_dev; - dev->driver = driver; - - return dev; - } + dev = init_func(verbs_dev); + if (dev) { + dev->dev = verbs_dev; + dev->ibdev = ib_dev; + dev->driver = NULL; } - fprintf(stderr, PFX "Warning: no userspace device-specific driver found for %s\n" - " driver search path: ", verbs_dev->name); - if (user_path) - fprintf(stderr, "%s:", user_path); - fprintf(stderr, "%s\n", default_path); - - return NULL; + return dev; } static int check_abi_version(void) @@ -191,26 +222,87 @@ static int check_abi_version(void) return 0; } +static void add_device(struct ibv_device *dev, + struct ibv_device ***dev_list, + int *num_devices, + int *list_size) +{ + struct ibv_device **new_list; + + if (*list_size <= *num_devices) { + *list_size = *list_size ? 
*list_size * 2 : 1; + new_list = realloc(*dev_list, *list_size * sizeof (struct ibv_device *)); + if (!new_list) + return; + *dev_list = new_list; + } + + (*dev_list)[(*num_devices)++] = dev; +} + HIDDEN int ibverbs_init(struct ibv_device ***list) { char *wr_path, *dir; - struct sysfs_class *cls; - struct dlist *verbs_dev_list; - struct sysfs_class_device *verbs_dev; + struct ibv_sysfs_dev *sysfs_dev, *next_dev; struct ibv_device *device; - struct ibv_device **new_list; + struct ibv_driver *driver; int num_devices = 0; int list_size = 0; + int no_driver = 0; + int statically_linked = 0; *list = NULL; + if (check_abi_version()) + return 0; + if (ibv_init_mem_map()) return 0; + find_sysfs_devs(); + + /* + * First check if a driver statically linked in can support + * all the devices. This is needed to avoid dlopen() in the + * all-static case (which will break because we end up with + * both a static and dynamic copy of libdl). + */ + for (sysfs_dev = sysfs_dev_list; sysfs_dev; sysfs_dev = sysfs_dev->next) { + device = try_driver(openib_driver_init, sysfs_dev->verbs_dev); + if (device) { + add_device(device, list, &num_devices, &list_size); + sysfs_dev->have_driver = 1; + } else + ++no_driver; + } + + if (!no_driver) + goto out; + + /* + * Check if we can dlopen() ourselves. If this fails, + * libibverbs is probably statically linked into the + * executable, and we should just give up, since trying to + * dlopen() a driver module will fail spectacularly (loading a + * driver .so will bring in dynamic copies of libibverbs and + * libdl to go along with the static copies the executable + * has, which quickly leads to a crash).
+ */ + { + void *hand = dlopen(NULL, RTLD_NOW); + if (!hand) { + fprintf(stderr, PFX "Warning: dlopen(NULL) failed, " + "assuming static linking.\n"); + statically_linked = 1; + goto out; + } + dlclose(hand); + } + find_drivers(default_path); /* - * Only follow use path passed in through the calling user's + * Only use path passed in through the calling user's * environment if we're not running SUID. */ if (getuid() == geteuid()) { @@ -222,42 +314,37 @@ HIDDEN int ibverbs_init(struct ibv_devic } } - /* - * Now check if a driver is statically linked. Since we push - * drivers onto our driver list, the last driver we find will - * be the first one we try. - */ - load_driver(NULL); - - cls = sysfs_open_class("infiniband_verbs"); - if (!cls) { - fprintf(stderr, PFX "Fatal: couldn't open sysfs class 'infiniband_verbs'.\n"); - return 0; - } - - if (check_abi_version()) - return 0; - - verbs_dev_list = sysfs_get_class_devices(cls); - if (!verbs_dev_list) { - fprintf(stderr, PFX "Fatal: no infiniband class devices found.\n"); - return 0; + for (sysfs_dev = sysfs_dev_list; sysfs_dev; sysfs_dev = sysfs_dev->next) { + if (sysfs_dev->have_driver) + continue; + for (driver = driver_list; driver; driver = driver->next) { + device = try_driver(driver->init_func, sysfs_dev->verbs_dev); + if (device) { + add_device(device, list, &num_devices, &list_size); + sysfs_dev->have_driver = 1; + } + } } - dlist_for_each_data(verbs_dev_list, verbs_dev, struct sysfs_class_device) { - device = init_drivers(verbs_dev); - if (device) { - if (list_size <= num_devices) { - list_size = list_size ? list_size * 2 : 1; - new_list = realloc(*list, list_size * sizeof (struct ibv_device *)); - if (!new_list) - goto out; - *list = new_list; +out: + for (sysfs_dev = sysfs_dev_list, next_dev = sysfs_dev ? sysfs_dev->next : NULL; + sysfs_dev; + sysfs_dev = next_dev, next_dev = sysfs_dev ?
sysfs_dev->next : NULL) { + if (!sysfs_dev->have_driver) { + fprintf(stderr, PFX "Warning: no userspace device-specific " + " driver found for %s\n", sysfs_dev->verbs_dev->name); + if (statically_linked) + fprintf(stderr, " When linking libibverbs statically, " + "driver must be statically linked too.\n"); + else { + fprintf(stderr, " driver search path: "); + if (user_path) + fprintf(stderr, "%s:", user_path); + fprintf(stderr, "%s\n", default_path); } - (*list)[num_devices++] = device; } + free(sysfs_dev); } -out: return num_devices; } --- libibverbs-1.0/README (revision 9973) +++ libibverbs-1.0/README (working copy) @@ -60,6 +60,23 @@ via the file /etc/security/limits.conf. necessary if you are logging in via OpenSSH and your sshd is configured to use privilege separation. +Static linking +-------------- + +In almost all cases it is better to dynamically link libibverbs into +an application. However, if you are forced to use static linking for +libibverbs, then you will also have to link a device-specific +userspace driver (such as libmthca, libipathverbs, libehca, etc) +statically into your application. This is because of limitations on +dynamically loading new modules into a static executable. + +In particular, a static application can only be linked against a +single device-specific driver, which means that the application will +only work with a single type of device. This limitation will be +removed in future libibverbs releases, but this will require a change +to the libibverbs ABI, so it cannot be done as part of the libibverbs +1.0 release series. 
+ Valgrind support ---------------- From or.gerlitz at gmail.com Fri Oct 27 22:05:07 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Sat, 28 Oct 2006 07:05:07 +0200 Subject: [openib-general] OFED 1.1 Build Issue In-Reply-To: <1161963012.2748.29.camel@trinity.ogc.int> References: <1161963012.2748.29.camel@trinity.ogc.int> Message-ID: <15ddcffd0610272205n347c0f32k9c5e054b6611b795@mail.gmail.com> On 10/27/06, Tom Tucker wrote: > > I've been testing some code against the OFED 1.1 release and noticed > that if you build anything that depends on IB (RNFS in this case) into > the kernel, that the OFED kit doesn't work correctly. This is because > the dependent modules (ib_core, etc...) get sucked into the kernel > automagically and will cause the subsequent modprobe of the OFED module > to fail. We have also noted this failure to modeprobe 3rd party IB SW on top of the kernel OFED modules. I will ask the individuals working on that matter to send an RFC with the suggested solution so it can be reviewed and integrated into OFED. What's RNFS? Or. From muli at il.ibm.com Sat Oct 28 03:20:57 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Sat, 28 Oct 2006 12:20:57 +0200 Subject: [openib-general] problem with 2.6.19? In-Reply-To: References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> <1161967455.14333.45.camel@stevo-desktop> <1161972601.14333.50.camel@stevo-desktop> Message-ID: <20061028102057.GG4868@rhun.haifa.ibm.com> On Fri, Oct 27, 2006 at 11:15:40AM -0700, Roland Dreier wrote: > > I must be misusing dma_map_single(). What I'm doing is allocating a > > verb message reply queue for the adapter to DMA verb replies into. It > > never gets unmapped. I kmalloc() it, then map it. I could use > > dma_alloc_coherent() or something, and maybe that's what I need to do? 
> > Yeah, if you want to leave something mapped and have the device DMA > into it, and the CPU look into the buffer too, then you need > consistent/coherent memory -- either pci_alloc_consistent() or > dma_alloc_coherent(). The dma_ variant is slightly better because you > can pass in a GFP_ mask rather than having the kernel pick GFP_ATOMIC > for you. You can also use the sync_{single|sg}_for_{device|cpu} calls to manually sync the buffers (this will cause a memcpy for swiotlb). > > You're saying I must unmap it before the data is valid (cuz of the > > bounce buffering). No, sync_xxx should work as well to make the data valid. > > If that's true, then how in sam hill does user mode > > RDMA work since the user's memory isn't unmapped before the user looks > > like memory that is the target of RDMA???? The uverbs code calls > > dma_map_sg() which is roughly the same as dma_map_single, eh? > > It's a good point. We're kind of counting on the IOMMU situation not > being too wacky, Could you elaborate, what are the "requirements" for this to work with an arbitrary IOMMU (e.g., Calgary) than requires a mapping to exist before a DMA operation can go through? Cheers, Muli From sashak at voltaire.com Sat Oct 28 09:37:06 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 18:37:06 +0200 Subject: [openib-general] New server svn up In-Reply-To: <20061027120110.GB31235@mellanox.co.il> References: <1161826839.26066.99.camel@localhost> <20061026021635.GA14818@sashak.voltaire.com> <1161885128.31851.16.camel@localhost> <20061026211655.GH11425@sashak.voltaire.com> <20061027120110.GB31235@mellanox.co.il> Message-ID: <20061028163706.GB11470@sashak.voltaire.com> On 14:01 Fri 27 Oct , Michael S. Tsirkin wrote: > Quoting r. Sasha Khapyorsky : > > Subject: Re: [openib-general] New server svn up > > > > On 13:51 Thu 26 Oct , Roland Dreier wrote: > > > > That's up to the developers. 
I suggest folks try out the new server > > > > and move over to using git/svn on it as soon as possible. We can > > > > figure out how to clean up or remove the svn user space tree during the > > > > summit as SC06. > > > > > > How does one use git on the new server? > > > > To put your tree there you need user account. > > > > Then to make 'tree' publically available you can place it under > > ~rdreier/scm/ and this will be pullable as > > git://staging.openfabrics.org/~rdreier/tree , or under /pub/scm/ , then > > this will be available as git://staging.openfabrics.org/tree . > > > > Sasha > > Adding stuff under /pub/scm/ seems to require root account on the server. You will need to ask sysadmin to create directory (or symbolic link) there, but I guess it is needed only once. > Why do we need 2 places? /pub/scm will be useful for keeping "default" (or "official") git trees, (e.g symbolic links to maintainer's tree under ~maintainer/scm/). (Also as side effect, I suspect (not sure) that this helps to use simple gitweb configuration with single $projectroot). > Let's have everyone keep stuff under ~user/scm. I agree with this, but it is orthogonal to "/pub/scm". Sasha From swise at opengridcomputing.com Sat Oct 28 12:02:40 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 28 Oct 2006 14:02:40 -0500 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> Message-ID: <1162062160.29140.29.camel@stevo-desktop> Sean/Roland, FYI: I'm running with: linus's 2.6.19-rc3 git tree + Sean's 7 kernel patches that include ucma + Sean's librdmacm patch + my 2 amso kernel fixes + the libamso library. I'm successfully running user mode rping over the Ammasso rnic on this setup. :-) Steve. 
On Wed, 2006-10-25 at 13:49 -0700, Sean Hefty wrote: > Updates the librdmacm to work with ABI version 3, which is the proposed > kernel changes for inclusion in 2.6.20. > From sashak at voltaire.com Sat Oct 28 12:57:27 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 21:57:27 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: net to host conversion for printing Message-ID: <20061028195727.GA11988@sashak.voltaire.com> This converts guid value to host byte order before printing. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_mtree.c | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c index 65e9911..bd23192 100644 --- a/osm/opensm/osm_mtree.c +++ b/osm/opensm/osm_mtree.c @@ -123,8 +123,8 @@ __osm_mtree_dump( if (p_mtn == NULL) return; - printf("GUID:0x%016" PRIx64 " max_children:%d\n", - p_mtn->p_sw->p_node->node_info.node_guid, + printf("GUID:0x%016" PRIx64 " max_children:%u\n", + cl_ntoh64(p_mtn->p_sw->p_node->node_info.node_guid), p_mtn->max_children ); if ( p_mtn->child_array != NULL ) { @@ -135,5 +135,4 @@ __osm_mtree_dump( __osm_mtree_dump(p_mtn->child_array[i]); } } - } -- 1.4.3.2.g4bf7 From sashak at voltaire.com Sat Oct 28 13:00:53 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:00:53 +0200 Subject: [openib-general] [PATCH] opensm: fix node_desc.description as string usages Message-ID: <20061028200053.GC11988@sashak.voltaire.com> node_desc.description buffer is received from the network and should not be NULL-terminated. In such cases using it as regular string in functions like printf() leads to segmentation faults. This patch fixes such usages. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_drop_mgr.c | 26 +++++++++++++++++--------- osm/opensm/osm_helper.c | 7 +++++-- osm/opensm/osm_state_mgr.c | 23 ++++++++++++++++++----- 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c index 5ed320e..a35933d 100644 --- a/osm/opensm/osm_drop_mgr.c +++ b/osm/opensm/osm_drop_mgr.c @@ -145,7 +145,6 @@ __osm_drop_mgr_remove_port( ib_gid_t port_gid; ib_mad_notice_attr_t notice; ib_api_status_t status; - char* p_node_desc; OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_remove_port ); @@ -306,8 +305,9 @@ __osm_drop_mgr_remove_port( p_mcm = (osm_mcm_info_t*)cl_qlist_remove_head( &p_port->mcm_list ); } - /* initialize the p_node_desc */ - p_node_desc = p_port->p_node ? (char*)(p_port->p_node->node_desc.description) : "UNKNOWN"; + /* initialize the p_node - may need to get node_desc later */ + p_node = p_port->p_node; + osm_port_delete( &p_port ); /* issue a notice - trap 65 */ @@ -341,12 +341,20 @@ __osm_drop_mgr_remove_port( ib_get_err_str( status ) ); goto Exit; } - osm_log( p_mgr->p_log, OSM_LOG_INFO, - "Removed port with GUID:0x%016" PRIx64 - " LID range [0x%X,0x%X] of node:%s\n", - cl_ntoh64( port_gid.unicast.interface_id ), - min_lid_ho, max_lid_ho, p_node_desc ); - + if (osm_log_is_active( p_mgr->p_log, OSM_LOG_INFO )) + { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; + if (p_node) + { + memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + } + osm_log( p_mgr->p_log, OSM_LOG_INFO, + "Removed port with GUID:0x%016" PRIx64 + " LID range [0x%X,0x%X] of node:%s\n", + cl_ntoh64( port_gid.unicast.interface_id ), + min_lid_ho, max_lid_ho, p_node ?
desc : "UNKNOWN" ); + } Exit: OSM_LOG_EXIT( p_mgr->p_log ); } diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c index b06b2f2..100892f 100644 --- a/osm/opensm/osm_helper.c +++ b/osm/opensm/osm_helper.c @@ -1039,6 +1039,10 @@ osm_dump_node_record( if( osm_log_is_active( p_log, log_level ) ) { + char desc[sizeof(p_nr->node_desc.description) + 1]; + memcpy(desc, p_nr->node_desc.description, + sizeof(p_nr->node_desc.description)); + desc[sizeof(desc) - 1] = '\0'; osm_log( p_log, log_level, "Node Record dump:\n" "\t\t\t\tRID\n" @@ -1074,9 +1078,8 @@ osm_dump_node_record( cl_ntoh32( p_ni->revision ), ib_node_info_get_local_port_num( p_ni ), cl_ntoh32( ib_node_info_get_vendor_id( p_ni )), - p_nr->node_desc.description + desc ); - } } diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 9c159df..c1e6d01 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -1072,6 +1072,7 @@ static void osm_topology_file_create( IN osm_state_mgr_t * const p_mgr ) { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; const osm_node_t *p_node; char *file_name; FILE *rc; @@ -1136,6 +1137,10 @@ osm_topology_file_create( p_default_physp = p_physp; } + memcpy(desc, p_node->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + fprintf( rc, "{ %s%s Ports:%02X" " SystemGUID:%016" PRIx64 " NodeGUID:%016" PRIx64 @@ -1158,7 +1163,7 @@ osm_topology_file_create( ( &p_node->node_info ) ), cl_ntoh32( p_node->node_info.device_id ), cl_ntoh32( p_node->node_info.revision ), - p_node->node_desc.description, + desc, cl_ntoh16( p_default_physp->port_info.base_lid ), cPort ); @@ -1173,6 +1178,9 @@ osm_topology_file_create( p_default_physp = p_rphysp; } + memcpy(desc, p_nbnode->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; fprintf( rc, "{ %s%s Ports:%02X" " SystemGUID:%016" PRIx64 @@ -1196,7 +1204,7 @@ osm_topology_file_create( ( &p_nbnode->node_info ) ), cl_ntoh32( 
p_nbnode->node_info.device_id ), cl_ntoh32( p_nbnode->node_info.revision ), - p_nbnode->node_desc.description, + desc, cl_ntoh16( p_default_physp->port_info.base_lid ), p_rphysp->port_num ); @@ -1645,6 +1653,7 @@ static void __osm_state_mgr_report_new_ports( IN osm_state_mgr_t * const p_mgr ) { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; osm_port_t *p_port; ib_gid_t port_gid; ib_mad_notice_attr_t notice; @@ -1693,14 +1702,18 @@ __osm_state_mgr_report_new_ports( ib_get_err_str( status ) ); } osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); + if (p_port->p_node) + { + memcpy(desc, p_port->p_node->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + } osm_log( p_mgr->p_log, OSM_LOG_INFO, "Discovered new port with GUID:0x%016" PRIx64 " LID range [0x%X,0x%X] of node:%s\n", cl_ntoh64( port_gid.unicast.interface_id ), min_lid_ho, max_lid_ho, - ( p_port->p_node ? - ( char * )( p_port->p_node->node_desc.description ) : - "UNKNOWN" ) ); + p_port->p_node ? desc : "UNKNOWN" ); p_port = ( osm_port_t -- 1.4.3.2.g4bf7 From sashak at voltaire.com Sat Oct 28 13:03:07 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:03:07 +0200 Subject: [openib-general] [PATCH] osmtest: fix node_desc.description as string usages Message-ID: <20061028200307.GE11988@sashak.voltaire.com> node_desc.description buffer is received from the network and should not be NULL-terminated. In such cases using it as regular string in functions like printf() leads to segmentation faults. This patch fixes such usages. 
Signed-off-by: Sasha Khapyorsky --- osm/osmtest/osmtest.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c index b4d9498..6444b9d 100644 --- a/osm/osmtest/osmtest.c +++ b/osm/osmtest/osmtest.c @@ -1984,11 +1984,15 @@ osmtest_write_node_info( IN osmtest_t * IN FILE * fh, IN const ib_node_record_t * const p_rec ) { + char desc[IB_NODE_DESCRIPTION_SIZE]; int result; cl_status_t status = IB_SUCCESS; OSM_LOG_ENTER( &p_osmt->log, osmtest_write_node_info ); + memcpy(desc, p_rec->node_desc.description, IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + result = fprintf( fh, "DEFINE_NODE\n" "lid 0x%X\n" @@ -2021,7 +2025,7 @@ osmtest_write_node_info( IN osmtest_t * ib_node_info_get_local_port_num( &p_rec->node_info ), cl_ntoh32( ib_node_info_get_vendor_id ( &p_rec->node_info ) ), - p_rec->node_desc.description ); + desc ); if( result < 0 ) { -- 1.4.3.2.g4bf7 From sashak at voltaire.com Sat Oct 28 13:04:25 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:04:25 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages Message-ID: <20061028200425.GF11988@sashak.voltaire.com> node_desc.description buffer is received from the network and should not be NULL-terminated. In such cases using it as regular string in functions like strcmp() or printf() leads to segmentation faults. This patch fixes such usages. 
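A minimal sketch of the bounded comparison this message describes, for readers who want it in isolation. NODE_DESC_SIZE (assumed to be 64, standing in for sizeof(node_desc.description)) and node_desc_matches() are hypothetical names, not part of saquery.

```c
#include <string.h>

/* Assumption: stand-in for the size of node_desc.description. */
#define NODE_DESC_SIZE 64

/*
 * Hypothetical helper: returns nonzero when name matches the
 * description field. strncmp() reads at most NODE_DESC_SIZE bytes,
 * so it never runs past the end of a field that has no '\0'.
 */
static int node_desc_matches(const char *name,
			     const unsigned char description[NODE_DESC_SIZE])
{
	return strncmp(name, (const char *)description, NODE_DESC_SIZE) == 0;
}
```

One caveat of this form: a name longer than NODE_DESC_SIZE bytes whose first NODE_DESC_SIZE bytes equal a full-length description also compares equal, because the comparison stops at the bound.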
Signed-off-by: Sasha Khapyorsky --- diags/src/saquery.c | 22 ++++++++++++++++------ 1 files changed, 16 insertions(+), 6 deletions(-) diff --git a/diags/src/saquery.c b/diags/src/saquery.c index 5b4a85e..f5b23fd 100644 --- a/diags/src/saquery.c +++ b/diags/src/saquery.c @@ -90,17 +90,21 @@ static void print_node_desc(ib_node_record_t *node_record) { ib_node_info_t *p_ni = &(node_record->node_info); + ib_node_desc_t *p_nd = &(node_record->node_desc); if (p_ni->node_type == IB_NODE_TYPE_CA) { + char desc[sizeof(p_nd->description) + 1]; + memcpy(desc, p_nd->description, sizeof(p_nd->description)); + desc[sizeof(desc) - 1] = '\0'; printf("%6d \"%s\"\n", - cl_ntoh16(node_record->lid), - node_record->node_desc.description); + cl_ntoh16(node_record->lid), desc); } } void print_node_record(ib_node_record_t *node_record) { + char desc[sizeof(node_record->node_desc.description) + 1]; ib_node_info_t *p_ni = NULL; p_ni = &(node_record->node_info); @@ -117,6 +121,10 @@ print_node_record(ib_node_record_t *node break; } + memcpy(desc, node_record->node_desc.description, + sizeof(node_record->node_desc.description)); + desc[sizeof(desc) - 1] = '\0'; + printf("NodeRecord dump:\n" "\t\tlid.....................0x%X\n" "\t\treserved................0x%X\n" @@ -148,7 +156,7 @@ print_node_record(ib_node_record_t *node cl_ntoh32( p_ni->revision ), ib_node_info_get_local_port_num( p_ni ), cl_ntoh32( ib_node_info_get_vendor_id( p_ni )), - node_record->node_desc.description + desc ); } @@ -448,8 +456,9 @@ print_node_records(osm_bind_handle_t bin print_node_desc(node_record); } else { if (!requested_name || - (strcmp(requested_name, - (char *)node_record->node_desc.description) == 0)) { + (strncmp(requested_name, + (char *)node_record->node_desc.description, + sizeof(node_record->node_desc.description)) == 0)) { print_node_record(node_record); if (node_print_desc == UNIQUE_LID_ONLY) { return_mad(); @@ -481,7 +490,8 @@ get_lid_from_name(osm_bind_handle_t bind for (i = 0; i < result.result_cnt; 
i++) { node_record = osmv_get_query_node_rec(result.p_result_madw, i); p_ni = &(node_record->node_info); - if (name && strcmp(name, node_record->node_desc.description) == 0) { + if (name && strncmp(name, (char *)node_record->node_desc.description, + sizeof(node_record->node_desc.description)) == 0) { *lid = cl_ntoh16(node_record->lid); break; } -- 1.4.3.2.g4bf7 From sashak at voltaire.com Sat Oct 28 13:06:26 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:06:26 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: indentation fixes Message-ID: <20061028200626.GG11988@sashak.voltaire.com> Some trivial indentation fixes. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_mcm_port.h | 18 ++++++------ osm/include/opensm/osm_mtree.h | 11 +++---- osm/include/opensm/osm_multicast.h | 57 ++++++++++++++++++----------------- 3 files changed, 43 insertions(+), 43 deletions(-) diff --git a/osm/include/opensm/osm_mcm_port.h b/osm/include/opensm/osm_mcm_port.h index 206c248..1c68247 100644 --- a/osm/include/opensm/osm_mcm_port.h +++ b/osm/include/opensm/osm_mcm_port.h @@ -78,11 +78,10 @@ BEGIN_C_DECLS */ typedef struct _osm_mcm_port { - cl_map_item_t map_item; - ib_gid_t port_gid; - uint8_t scope_state; - boolean_t proxy_join; - + cl_map_item_t map_item; + ib_gid_t port_gid; + uint8_t scope_state; + boolean_t proxy_join; } osm_mcm_port_t; /* * FIELDS @@ -95,10 +94,11 @@ typedef struct _osm_mcm_port * scope_state * ??? * -* proxy_join -* If FALSE - Join was performed by the endport identified by PortGID -* If TRUE - Join was performed on behalf of the endport identified -* by PortGID by another port within the same partition +* proxy_join +* If FALSE - Join was performed by the endport identified +* by PortGID. 
If TRUE - Join was performed on behalf of +* the endport identified by PortGID by another port within +* the same partition * * SEE ALSO * MCM Port Object diff --git a/osm/include/opensm/osm_mtree.h b/osm/include/opensm/osm_mtree.h index 013112d..10270df 100644 --- a/osm/include/opensm/osm_mtree.h +++ b/osm/include/opensm/osm_mtree.h @@ -110,12 +110,11 @@ #define OSM_MTREE_LEAF ((void*)-1) */ typedef struct _osm_mtree_node { - cl_map_item_t map_item; - osm_switch_t *p_sw; - uint8_t max_children; - struct _osm_mtree_node *p_up; - struct _osm_mtree_node *child_array[1]; - + cl_map_item_t map_item; + osm_switch_t *p_sw; + uint8_t max_children; + struct _osm_mtree_node *p_up; + struct _osm_mtree_node *child_array[1]; } osm_mtree_node_t; /* * FIELDS diff --git a/osm/include/opensm/osm_multicast.h b/osm/include/opensm/osm_multicast.h index 44b0bb1..56970ff 100644 --- a/osm/include/opensm/osm_multicast.h +++ b/osm/include/opensm/osm_multicast.h @@ -162,15 +162,15 @@ typedef struct osm_mcast_mgr_ctxt */ typedef struct _osm_mgrp { - cl_map_item_t map_item; - ib_net16_t mlid; - osm_mtree_node_t *p_root; - cl_qmap_t mcm_port_tbl; - ib_member_rec_t mcmember_rec; - boolean_t well_known; - boolean_t to_be_deleted; - uint32_t last_change_id; - uint32_t last_tree_id; + cl_map_item_t map_item; + ib_net16_t mlid; + osm_mtree_node_t *p_root; + cl_qmap_t mcm_port_tbl; + ib_member_rec_t mcmember_rec; + boolean_t well_known; + boolean_t to_be_deleted; + uint32_t last_change_id; + uint32_t last_tree_id; } osm_mgrp_t; /* * FIELDS @@ -178,7 +178,8 @@ typedef struct _osm_mgrp * Map Item for qmap linkage. Must be first element!! * * mlid -* The network ordered LID of this Multicast Group (must be >= 0xC000). +* The network ordered LID of this Multicast Group (must be +* >= 0xC000). * * p_root * Pointer to the root "tree node" in the single spanning tree @@ -186,29 +187,29 @@ typedef struct _osm_mgrp * switches. Member ports are not represented in the tree. 
* * mcm_port_tbl -* Table (sorted by port GUID) of osm_mcm_port_t objects representing -* the member ports of this multicast group. +* Table (sorted by port GUID) of osm_mcm_port_t objects +* representing the member ports of this multicast group. * * mcmember_rec * Hold the parameters of the Multicast Group. * * well_known -* Indicates that this is the wellknow multicast group which is created -* during the initialization of SM/SA and will be present even if -* there are no ports for this group -* -* to_be_deleted -* Since groups are deleted only after re-route we need to track the -* fact the group is about to be deleted so we can track the fact a -* new join is actually a create request. -* -* last_change_id -* a counter for the number of changes applied to the group. -* this counter shuold be incremented on any modification to the group: -* joining or leaving of ports. -* -* last_tree_id -* the last change id used for building the current tree. +* Indicates that this is the wellknow multicast group which +* is created during the initialization of SM/SA and will be +* present even if there are no ports for this group +* +* to_be_deleted +* Since groups are deleted only after re-route we need to +* track the fact the group is about to be deleted so we can +* track the fact a new join is actually a create request. +* +* last_change_id +* a counter for the number of changes applied to the group. +* This counter shuold be incremented on any modification +* to the group: joining or leaving of ports. +* +* last_tree_id +* the last change id used for building the current tree. 
* * SEE ALSO *********/ -- 1.4.3.2.g4bf7 From muli at il.ibm.com Sat Oct 28 13:09:34 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Sat, 28 Oct 2006 22:09:34 +0200 Subject: [openib-general] [PATCH] osmtest: fix node_desc.description as string usages In-Reply-To: <20061028200307.GE11988@sashak.voltaire.com> References: <20061028200307.GE11988@sashak.voltaire.com> Message-ID: <20061028200934.GH4868@rhun.haifa.ibm.com> On Sat, Oct 28, 2006 at 10:03:07PM +0200, Sasha Khapyorsky wrote: > > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like printf() leads to segmentation faults. This patch fixes > such usages. > > Signed-off-by: Sasha Khapyorsky > --- > osm/osmtest/osmtest.c | 6 +++++- > 1 files changed, 5 insertions(+), 1 deletions(-) > > diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c > index b4d9498..6444b9d 100644 > --- a/osm/osmtest/osmtest.c > +++ b/osm/osmtest/osmtest.c > @@ -1984,11 +1984,15 @@ osmtest_write_node_info( IN osmtest_t * > IN FILE * fh, > IN const ib_node_record_t * const p_rec ) > { > + char desc[IB_NODE_DESCRIPTION_SIZE]; > int result; > cl_status_t status = IB_SUCCESS; > > OSM_LOG_ENTER( &p_osmt->log, osmtest_write_node_info ); > > + memcpy(desc, p_rec->node_desc.description, IB_NODE_DESCRIPTION_SIZE); > + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; You mean IB_NODE_DESCRIPTION_SIZE - 1 here. Cheers, Muli From rdreier at cisco.com Sat Oct 28 13:10:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 28 Oct 2006 13:10:54 -0700 Subject: [openib-general] problem with 2.6.19? 
References: <1161901218.4280.55.camel@stevo-desktop> <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> <1161967455.14333.45.camel@stevo-desktop> <1161972601.14333.50.camel@stevo-desktop> <20061028102057.GG4868@rhun.haifa.ibm.com> Message-ID: > You can also use the sync_{single|sg}_for_{device|cpu} calls to manually sync > the buffers (this will cause a memcpy for swiotlb). True, although in this case it's much simpler just to use coherent memory. Also, the dma_sync_xxx calls don't _always_ work -- for example if both the CPU and the device need to touch different parts of a cacheline at the same time on a non-coherent architecture. > Could you elaborate, what are the "requirements" for this to work with > an arbitrary IOMMU (e.g., Calgary) than requires a mapping to exist > before a DMA operation can go through? Calgary should be fine, since pseries works Ok. I guess the requirement for userspace RDMA to work is that no further action is required after the dma_map_sg() for both the device and the CPU to touch the region. Because userspace has no way for calling dma_sync_xxx or anything like that. - R. From sashak at voltaire.com Sat Oct 28 13:17:50 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:17:50 +0200 Subject: [openib-general] [PATCH] opensm: osm_ucast_mgr.c: fix node_desc.description as string usages [was: [PATCH 2/5] opensm: ucast_mgr dumper unification] In-Reply-To: <116129014671-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> <116129014671-git-send-email-sashak@voltaire.com> Message-ID: <20061028201750.GH11988@sashak.voltaire.com> On 22:35 Thu 19 Oct , Sasha Khapyorsky wrote: > This unifies ucsat_mgr dumper. Main goal is to provide infrastructure > for different dump file generation using the same routines. 
> > Signed-off-by: Sasha Khapyorsky > --- > osm/opensm/osm_ucast_mgr.c | 104 +++++++++++++++++++++++--------------------- > 1 files changed, 55 insertions(+), 49 deletions(-) > And there is incremental patch already: node_desc.description buffer is received from the network and should not be NULL-terminated. In such cases using it as regular string in functions like printf() leads to segmentation faults. This patch fixes such usages. This was in new lft dumper too. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_ucast_mgr.c | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index c0c1738..d008d91 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -355,6 +355,7 @@ ucast_mgr_dump_lid_matrix(cl_map_item_t static void ucast_mgr_dump_lfts(cl_map_item_t *p_map_item, void *cxt) { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; osm_switch_t* p_sw = (osm_switch_t *)p_map_item; osm_ucast_mgr_t* p_mgr = ((struct ucast_mgr_dump_context *)cxt)->p_mgr; FILE *file = ((struct ucast_mgr_dump_context *)cxt)->file; @@ -364,11 +365,12 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map uint16_t lid; uint8_t port; + memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; fprintf(file, "Unicast lids [0x0-0x%x] of switch Lid %u guid 0x%016" PRIx64 " (\'%s\'):\n", max_lid, osm_node_get_base_lid(p_node, 0), - cl_ntoh64(osm_node_get_node_guid(p_node)), - p_node->node_desc.description); + cl_ntoh64(osm_node_get_node_guid(p_node)), desc); for (lid = 0; lid <= max_lid; lid++) { osm_port_t *p_port; port = osm_switch_get_port_by_lid(p_sw, lid); @@ -381,10 +383,12 @@ ucast_mgr_dump_lfts(cl_map_item_t *p_map p_port = cl_ptr_vector_get(&p_mgr->p_subn->port_lid_tbl, lid); if (p_port) { p_node = osm_port_get_parent_node(p_port); + memcpy(desc, p_node->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; 
fprintf(file, "%s portguid 0x016%" PRIx64 ": \'%s\'", ib_get_node_type_str(osm_node_get_type(p_node)), - cl_ntoh64(osm_port_get_guid(p_port)), - p_node->node_desc.description); + cl_ntoh64(osm_port_get_guid(p_port)), desc); } else fprintf(file, "unknown node and type"); -- 1.4.3.2.g4bf7 From muli at il.ibm.com Sat Oct 28 13:12:11 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Sat, 28 Oct 2006 22:12:11 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061028200425.GF11988@sashak.voltaire.com> References: <20061028200425.GF11988@sashak.voltaire.com> Message-ID: <20061028201210.GI4868@rhun.haifa.ibm.com> On Sat, Oct 28, 2006 at 10:04:25PM +0200, Sasha Khapyorsky wrote: > > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like strcmp() or printf() leads to segmentation faults. > This patch fixes such usages. > > Signed-off-by: Sasha Khapyorsky > --- > diags/src/saquery.c | 22 ++++++++++++++++------ > 1 files changed, 16 insertions(+), 6 deletions(-) > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > index 5b4a85e..f5b23fd 100644 > --- a/diags/src/saquery.c > +++ b/diags/src/saquery.c > @@ -90,17 +90,21 @@ static void > print_node_desc(ib_node_record_t *node_record) > { > ib_node_info_t *p_ni = &(node_record->node_info); > + ib_node_desc_t *p_nd = &(node_record->node_desc); > if (p_ni->node_type == IB_NODE_TYPE_CA) > { > + char desc[sizeof(p_nd->description) + 1]; > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > + desc[sizeof(desc) - 1] = '\0'; No need for the -1 here - desc is (sizeof(p_nd->description) + 1), so the terminating NULL should be at index sizeof(). 
> void > print_node_record(ib_node_record_t *node_record) > { > + char desc[sizeof(node_record->node_desc.description) + 1]; > ib_node_info_t *p_ni = NULL; > p_ni = &(node_record->node_info); > > @@ -117,6 +121,10 @@ print_node_record(ib_node_record_t *node > break; > } > > + memcpy(desc, node_record->node_desc.description, > + sizeof(node_record->node_desc.description)); > + desc[sizeof(desc) - 1] = '\0'; Same thing. Cheers, Muli From sashak at voltaire.com Sat Oct 28 13:21:14 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:21:14 +0200 Subject: [openib-general] [PATCH] osmtest: fix node_desc.description as string usages In-Reply-To: <20061028200934.GH4868@rhun.haifa.ibm.com> References: <20061028200307.GE11988@sashak.voltaire.com> <20061028200934.GH4868@rhun.haifa.ibm.com> Message-ID: <20061028202114.GI11988@sashak.voltaire.com> On 22:09 Sat 28 Oct , Muli Ben-Yehuda wrote: > On Sat, Oct 28, 2006 at 10:03:07PM +0200, Sasha Khapyorsky wrote: > > > > node_desc.description buffer is received from the network and should > > not be NULL-terminated. In such cases using it as regular string in > > functions like printf() leads to segmentation faults. This patch fixes > > such usages. 
> > > > Signed-off-by: Sasha Khapyorsky > > --- > > osm/osmtest/osmtest.c | 6 +++++- > > 1 files changed, 5 insertions(+), 1 deletions(-) > > > > diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c > > index b4d9498..6444b9d 100644 > > --- a/osm/osmtest/osmtest.c > > +++ b/osm/osmtest/osmtest.c > > @@ -1984,11 +1984,15 @@ osmtest_write_node_info( IN osmtest_t * > > IN FILE * fh, > > IN const ib_node_record_t * const p_rec ) > > { > > + char desc[IB_NODE_DESCRIPTION_SIZE]; > > int result; > > cl_status_t status = IB_SUCCESS; > > > > OSM_LOG_ENTER( &p_osmt->log, osmtest_write_node_info ); > > > > + memcpy(desc, p_rec->node_desc.description, IB_NODE_DESCRIPTION_SIZE); > > + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; > > You mean IB_NODE_DESCRIPTION_SIZE - 1 here. No, but I meant IB_NODE_DESCRIPTION_SIZE + 1 above. Thanks for catching. Sasha From sashak at voltaire.com Sat Oct 28 13:25:32 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:25:32 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061028201210.GI4868@rhun.haifa.ibm.com> References: <20061028200425.GF11988@sashak.voltaire.com> <20061028201210.GI4868@rhun.haifa.ibm.com> Message-ID: <20061028202532.GJ11988@sashak.voltaire.com> On 22:12 Sat 28 Oct , Muli Ben-Yehuda wrote: > On Sat, Oct 28, 2006 at 10:04:25PM +0200, Sasha Khapyorsky wrote: > > > > node_desc.description buffer is received from the network and should > > not be NULL-terminated. In such cases using it as regular string in > > functions like strcmp() or printf() leads to segmentation faults. > > This patch fixes such usages. 
> > > > Signed-off-by: Sasha Khapyorsky > > --- > > diags/src/saquery.c | 22 ++++++++++++++++------ > > 1 files changed, 16 insertions(+), 6 deletions(-) > > > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > > index 5b4a85e..f5b23fd 100644 > > --- a/diags/src/saquery.c > > +++ b/diags/src/saquery.c > > @@ -90,17 +90,21 @@ static void > > print_node_desc(ib_node_record_t *node_record) > > { > > ib_node_info_t *p_ni = &(node_record->node_info); > > + ib_node_desc_t *p_nd = &(node_record->node_desc); > > if (p_ni->node_type == IB_NODE_TYPE_CA) > > { > > + char desc[sizeof(p_nd->description) + 1]; > > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > > + desc[sizeof(desc) - 1] = '\0'; > > No need for the -1 here - desc is (sizeof(p_nd->description) + 1), so > the terminating NULL should be at index sizeof(). At index sizeof(p_nd->description) - yes, but not at sizeof(desc) as it is used here (this one has extra byte). Sasha From sashak at voltaire.com Sat Oct 28 13:33:40 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:33:40 +0200 Subject: [openib-general] [PATCH v2] osmtest: fix node_desc.description as string usages In-Reply-To: <20061028200307.GE11988@sashak.voltaire.com> References: <20061028200307.GE11988@sashak.voltaire.com> Message-ID: <20061028203340.GK11988@sashak.voltaire.com> node_desc.description buffer is received from the network and should not be NULL-terminated. In such cases using it as regular string in functions like printf() leads to segmentation faults. This patch fixes such usages. 
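To make the indexing discussed in this thread concrete, here is a small standalone sketch of the copy-and-terminate pattern. node_desc_t, NODE_DESC_SIZE (assumed to be 64) and node_desc_to_str() are stand-ins for ib_node_desc_t, IB_NODE_DESCRIPTION_SIZE and the inline code in the patch; they are assumptions for illustration only.

```c
#include <string.h>

/* Assumption: stand-in for IB_NODE_DESCRIPTION_SIZE. */
#define NODE_DESC_SIZE 64

/* Hypothetical stand-in for ib_node_desc_t: a fixed-size byte array
 * that is not guaranteed to contain a terminating '\0'. */
typedef struct {
	unsigned char description[NODE_DESC_SIZE];
} node_desc_t;

/*
 * Copy the description into dst and terminate it.
 * dst must have room for NODE_DESC_SIZE + 1 bytes: the copied field
 * occupies indexes 0 .. NODE_DESC_SIZE - 1 and the terminator goes at
 * index NODE_DESC_SIZE.
 */
static void node_desc_to_str(const node_desc_t *nd, char *dst)
{
	memcpy(dst, nd->description, sizeof(nd->description));
	dst[sizeof(nd->description)] = '\0';
}
```

If the destination is declared as char desc[sizeof(nd->description) + 1], then sizeof(desc) - 1 and sizeof(nd->description) name the same index, so both spellings seen in these patches terminate the buffer at its last byte. A destination declared without the extra byte would be written one past its end.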
Signed-off-by: Sasha Khapyorsky --- osm/osmtest/osmtest.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c index b4d9498..6444b9d 100644 --- a/osm/osmtest/osmtest.c +++ b/osm/osmtest/osmtest.c @@ -1984,11 +1984,15 @@ osmtest_write_node_info( IN osmtest_t * IN FILE * fh, IN const ib_node_record_t * const p_rec ) { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; int result; cl_status_t status = IB_SUCCESS; OSM_LOG_ENTER( &p_osmt->log, osmtest_write_node_info ); + memcpy(desc, p_rec->node_desc.description, IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + result = fprintf( fh, "DEFINE_NODE\n" "lid 0x%X\n" @@ -2021,7 +2025,7 @@ osmtest_write_node_info( IN osmtest_t * ib_node_info_get_local_port_num( &p_rec->node_info ), cl_ntoh32( ib_node_info_get_vendor_id ( &p_rec->node_info ) ), - p_rec->node_desc.description ); + desc ); if( result < 0 ) { -- 1.4.3.2.g4bf7 From halr at voltaire.com Sat Oct 28 13:23:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Oct 2006 16:23:43 -0400 Subject: [openib-general] [PATCH] opensm: autogen.sh: tools version verification fixes In-Reply-To: <20061023214011.GA6311@sashak.voltaire.com> References: <20061023214011.GA6311@sashak.voltaire.com> Message-ID: <1162066990.22403.326681.camel@hal.voltaire.com> On Mon, 2006-10-23 at 17:40, Sasha Khapyorsky wrote: > This fixes couple of things related to tools version verifications in > autogen.sh. Originally autogen.sh was claiming that automake-1.10 is > older that automake-1.6.3 and was failing with zero exit status, so: > > - regular expression fix - proper version string separation > - numeric camparison for extracted version elements > - non-zero exit status when old tools are detected > - slightly improved condition statements > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. 
-- Hal From sashak at voltaire.com Sat Oct 28 13:41:03 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 28 Oct 2006 22:41:03 +0200 Subject: [openib-general] [PATCH v2] opensm: fix node_desc.description as string usages In-Reply-To: <20061028200053.GC11988@sashak.voltaire.com> References: <20061028200053.GC11988@sashak.voltaire.com> Message-ID: <20061028204103.GL11988@sashak.voltaire.com> Hmm, in one place there was the same copy-paste error as in the osmtest patch. Resend this one too... Sasha node_desc.description buffer is received from the network and should not be NULL-terminated. In such cases using it as regular string in functions like printf() leads to segmentation faults. This patch fixes such usages. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_drop_mgr.c | 26 +++++++++++++++++--------- osm/opensm/osm_helper.c | 7 +++++-- osm/opensm/osm_state_mgr.c | 23 ++++++++++++++++++----- 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c index 5ed320e..a35933d 100644 --- a/osm/opensm/osm_drop_mgr.c +++ b/osm/opensm/osm_drop_mgr.c @@ -145,7 +145,6 @@ __osm_drop_mgr_remove_port( ib_gid_t port_gid; ib_mad_notice_attr_t notice; ib_api_status_t status; - char* p_node_desc; OSM_LOG_ENTER( p_mgr->p_log, __osm_drop_mgr_remove_port ); @@ -306,8 +305,9 @@ __osm_drop_mgr_remove_port( p_mcm = (osm_mcm_info_t*)cl_qlist_remove_head( &p_port->mcm_list ); } - /* initialize the p_node_desc */ - p_node_desc = p_port->p_node ? 
(char*)(p_port->p_node->node_desc.description) : "UNKNOWN"; + /* initialize the p_node - may need to get node_desc later */ + p_node = p_port->p_node; + osm_port_delete( &p_port ); /* issue a notice - trap 65 */ @@ -341,12 +341,20 @@ __osm_drop_mgr_remove_port( ib_get_err_str( status ) ); goto Exit; } - osm_log( p_mgr->p_log, OSM_LOG_INFO, - "Removed port with GUID:0x%016" PRIx64 - " LID range [0x%X,0x%X] of node:%s\n", - cl_ntoh64( port_gid.unicast.interface_id ), - min_lid_ho, max_lid_ho, p_node_desc ); - + if (osm_log_is_active( p_mgr->p_log, OSM_LOG_INFO )) + { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; + if (p_node) + { + memcpy(desc, p_node->node_desc.description, IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + } + osm_log( p_mgr->p_log, OSM_LOG_INFO, + "Removed port with GUID:0x%016" PRIx64 + " LID range [0x%X,0x%X] of node:%s\n", + cl_ntoh64( port_gid.unicast.interface_id ), + min_lid_ho, max_lid_ho, p_node ? desc : "UNKNOWN" ); + } Exit: OSM_LOG_EXIT( p_mgr->p_log ); } diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c index b06b2f2..100892f 100644 --- a/osm/opensm/osm_helper.c +++ b/osm/opensm/osm_helper.c @@ -1039,6 +1039,10 @@ osm_dump_node_record( if( osm_log_is_active( p_log, log_level ) ) { + char desc[sizeof(p_nr->node_desc.description) + 1]; + memcpy(desc, p_nr->node_desc.description, + sizeof(p_nr->node_desc.description)); + desc[sizeof(desc) - 1] = '\0'; osm_log( p_log, log_level, "Node Record dump:\n" "\t\t\t\tRID\n" @@ -1074,9 +1078,8 @@ osm_dump_node_record( cl_ntoh32( p_ni->revision ), ib_node_info_get_local_port_num( p_ni ), cl_ntoh32( ib_node_info_get_vendor_id( p_ni )), - p_nr->node_desc.description + desc ); - } } diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 9c159df..c1e6d01 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -1072,6 +1072,7 @@ static void osm_topology_file_create( IN osm_state_mgr_t * const p_mgr ) { + char 
desc[IB_NODE_DESCRIPTION_SIZE + 1]; const osm_node_t *p_node; char *file_name; FILE *rc; @@ -1136,6 +1137,10 @@ osm_topology_file_create( p_default_physp = p_physp; } + memcpy(desc, p_node->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + fprintf( rc, "{ %s%s Ports:%02X" " SystemGUID:%016" PRIx64 " NodeGUID:%016" PRIx64 @@ -1158,7 +1163,7 @@ osm_topology_file_create( ( &p_node->node_info ) ), cl_ntoh32( p_node->node_info.device_id ), cl_ntoh32( p_node->node_info.revision ), - p_node->node_desc.description, + desc, cl_ntoh16( p_default_physp->port_info.base_lid ), cPort ); @@ -1173,6 +1178,9 @@ osm_topology_file_create( p_default_physp = p_rphysp; } + memcpy(desc, p_nbnode->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; fprintf( rc, "{ %s%s Ports:%02X" " SystemGUID:%016" PRIx64 @@ -1196,7 +1204,7 @@ osm_topology_file_create( ( &p_nbnode->node_info ) ), cl_ntoh32( p_nbnode->node_info.device_id ), cl_ntoh32( p_nbnode->node_info.revision ), - p_nbnode->node_desc.description, + desc, cl_ntoh16( p_default_physp->port_info.base_lid ), p_rphysp->port_num ); @@ -1645,6 +1653,7 @@ static void __osm_state_mgr_report_new_ports( IN osm_state_mgr_t * const p_mgr ) { + char desc[IB_NODE_DESCRIPTION_SIZE + 1]; osm_port_t *p_port; ib_gid_t port_gid; ib_mad_notice_attr_t notice; @@ -1693,14 +1702,18 @@ __osm_state_mgr_report_new_ports( ib_get_err_str( status ) ); } osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); + if (p_port->p_node) + { + memcpy(desc, p_port->p_node->node_desc.description, + IB_NODE_DESCRIPTION_SIZE); + desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; + } osm_log( p_mgr->p_log, OSM_LOG_INFO, "Discovered new port with GUID:0x%016" PRIx64 " LID range [0x%X,0x%X] of node:%s\n", cl_ntoh64( port_gid.unicast.interface_id ), min_lid_ho, max_lid_ho, - ( p_port->p_node ? - ( char * )( p_port->p_node->node_desc.description ) : - "UNKNOWN" ) ); + p_port->p_node ? 
desc : "UNKNOWN" ); p_port = ( osm_port_t -- 1.4.3.2.g4bf7 From halr at voltaire.com Sat Oct 28 13:28:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Oct 2006 16:28:03 -0400 Subject: [openib-general] [PATCH] diags: fix compilation warning with gcc-4.1.1 In-Reply-To: <20061023215303.GB6311@sashak.voltaire.com> References: <20061023215303.GB6311@sashak.voltaire.com> Message-ID: <1162067220.22403.326779.camel@hal.voltaire.com> On Mon, 2006-10-23 at 17:53, Sasha Khapyorsky wrote: > This fixes 'differ in signedness pointer' compilation warnings with > gcc-4.1.1 . > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From halr at voltaire.com Sat Oct 28 13:41:32 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Oct 2006 16:41:32 -0400 Subject: [openib-general] [PATCH] osm: Trivial fix in osmtest In-Reply-To: <453E37EF.8070900@dev.mellanox.co.il> References: <453E37EF.8070900@dev.mellanox.co.il> Message-ID: <1162068056.22403.327206.camel@hal.voltaire.com> On Tue, 2006-10-24 at 11:57, Yevgeny Kliteynik wrote: > Fixing signed/unsigned data types problem (discovered on Windows) > > -- > Yevgeny > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From muli at il.ibm.com Sat Oct 28 14:33:02 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Sat, 28 Oct 2006 23:33:02 +0200 Subject: [openib-general] problem with 2.6.19? In-Reply-To: References: <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> <1161967455.14333.45.camel@stevo-desktop> <1161972601.14333.50.camel@stevo-desktop> <20061028102057.GG4868@rhun.haifa.ibm.com> Message-ID: <20061028213302.GK4868@rhun.haifa.ibm.com> On Sat, Oct 28, 2006 at 01:10:54PM -0700, Roland Dreier wrote: > I guess the requirement for userspace RDMA to work is that no > further action is required after the dma_map_sg() for both the > device and the CPU to touch the region. 
Because userspace has no > way for calling dma_sync_xxx or anything like that. Ok, if there's no requirement on the address (i.e., the DMA address does not need to be the same as the machine physical address) then Calgary should be fine. By the way, by userspace DMA do you mean DMA to userspace buffers, or DMA initiated by userspace? I'm assuming the former, but if it's the latter, how are the addresses returned from dma_map_sg communicated to userspace? Cheers, Muli From muli at il.ibm.com Sat Oct 28 14:33:57 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Sat, 28 Oct 2006 23:33:57 +0200 Subject: [openib-general] [PATCH v2] osmtest: fix node_desc.description as string usages In-Reply-To: <20061028203340.GK11988@sashak.voltaire.com> References: <20061028200307.GE11988@sashak.voltaire.com> <20061028203340.GK11988@sashak.voltaire.com> Message-ID: <20061028213357.GL4868@rhun.haifa.ibm.com> On Sat, Oct 28, 2006 at 10:33:40PM +0200, Sasha Khapyorsky wrote: > > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like printf() leads to segmentation faults. This patch fixes > such usages. > > Signed-off-by: Sasha Khapyorsky Looks good, Acked-by: Muli Ben-Yehuda Cheers, Muli From sashak at voltaire.com Sat Oct 28 17:09:04 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 29 Oct 2006 02:09:04 +0200 Subject: [openib-general] [PATCH TRIVIAL] opensm: remove non-referenced include/opensm/osm_mcmember.h Message-ID: <20061029000904.GA13537@sashak.voltaire.com> This removes non-referenced include/opensm/osm_mcmember.h header file. 
Signed-off-by: Sasha Khapyorsky --- osm/include/Makefile.am | 1 - osm/include/opensm/osm_mcmember.h | 160 ------------------------------------- 2 files changed, 0 insertions(+), 161 deletions(-) diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index 3dca624..c7048d1 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -19,7 +19,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_sm_state_mgr.h \ $(srcdir)/opensm/osm_state_mgr.h \ $(srcdir)/opensm/osm_rand_fwd_tbl.h \ - $(srcdir)/opensm/osm_mcmember.h \ $(srcdir)/opensm/osm_sa_vlarb_record.h \ $(srcdir)/opensm/osm_madw.h \ $(srcdir)/opensm/osm_sa_sminfo_record_ctrl.h \ diff --git a/osm/include/opensm/osm_mcmember.h b/osm/include/opensm/osm_mcmember.h deleted file mode 100644 index 3bf3baa..0000000 --- a/osm/include/opensm/osm_mcmember.h +++ /dev/null @@ -1,160 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - -/* - * Abstract: - * Declaration of osm_mcmember_t. - * This object represents an IBA mcmember. - * This object is part of the OpenSM family of objects. - * - * Environment: - * Linux User Mode - * - * $Revision: 1.4 $ - */ - -#ifndef _OSM_MCMEMBER_H_ -#define _OSM_MCMEMBER_H_ - -#include -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* OpenSM/MCMember -* NAME -* MCMember -* -* DESCRIPTION -* The MCMember object encapsulates the information needed by the -* OpenSM to manage mcmembers. The OpenSM allocates one MCMember object -* per mcmember in the IBA subnet. -* -* The MCMember object is not thread safe, thus callers must provide -* serialization. -* -* This object should be treated as opaque and should be -* manipulated only through the provided functions. -* -* AUTHOR -* Ranjit Pandit, Intel -* -*********/ -/****s* OpenSM: MCMember/osm_mcmember_t -* NAME -* osm_mcmember_t -* -* DESCRIPTION -* MCMember structure. -* -* This object should be treated as opaque and should -* be manipulated only through the provided functions. -* -* SYNOPSIS -*/ -typedef struct _osm_mcmember -{ - cl_map_item_t map_item; - ib_member_rec_t mcmember_rec; - cl_qlist_t mcmember_port_list; - -} osm_mcmember_t; - -/* -* FIELDS -* map_item -* Linkage structure for cl_qmap. MUST BE FIRST MEMBER! 
-* -* mcmember_rec -* The IBA defined MCMemberRecord data for this mcmember. -* -* member_port_list -* List of Port specific information for each port that in -* member of a MultiCast Group. -* -* SEE ALSO -* MCMember object -*********/ - -/****s* OpenSM: MCMember/osm_mcmember_t -* NAME -* osm_mcmember_t -* -* DESCRIPTION -* MCMember structure. -* -* This object should be treated as opaque and should -* be manipulated only through the provided functions. -* -* SYNOPSIS -*/ -typedef struct _osm_mcmember_port -{ - cl_list_item_t list_item; - ib_gid_t port_gid; - uint8_t scope_state; - -} osm_mcmember_port_t; - -/* -* FIELDS -* list_item -* DESCRIPTION??? -* -* port_gid -* DESCRIPTION??? -* -* scope_state -* description??? -* -* SEE ALSO -* MCMember object -*********/ - -END_C_DECLS - -#endif /* _OSM_MCMEMBER_H_ */ -- 1.4.3.2.g4bf7 From ogerlitz at voltaire.com Sun Oct 29 02:07:45 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 29 Oct 2006 12:07:45 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <4540CA0E.9020807@voltaire.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> Message-ID: <45447D71.40405@voltaire.com> Or Gerlitz wrote: > Sean Hefty wrote: >> Updates the librdmacm to work with ABI version 3, which is the proposed >> kernel changes for inclusion in 2.6.20. >> Test programs are also updated. > OK, Sean, i have one system up and running, with kernel based on > Roland's git plus patches 1-7 and user space based on the svn with the > librdmacm patch. Will clone this config on Sunday such that i can > actually run mckey and see it working. Thanks a lot for putting > everything together... rping works fine in this environment, mckey not yet. From the switch logs i see that both processes/ports (sender, receiver) joined the same/correct mgid and from prints i have added to mckey i see that both use the same mdlid. 
however, the receiver does not get completions from its cq. I will further investigate this and let you know. Or. From tziporet at dev.mellanox.co.il Sun Oct 29 02:44:02 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 29 Oct 2006 12:44:02 +0200 Subject: [openib-general] APM support in openib stack In-Reply-To: <454132E8.3080008@3leafnetworks.com> References: <454132E8.3080008@3leafnetworks.com> Message-ID: <454485F2.3010204@dev.mellanox.co.il> Venkatesh Babu wrote: > Any comments on the issue described in the following email ? > > It doesn't look like a firmware problem. I had got the APM working on > the same Mellanox HCA cards with IBGD 1.8.2 stack. With OFED 1.0 stack I > am getting the following problem. I guess it is some problem in > initializing the timers to the firmware. > > VBabu > > > Can you try OFED 1.1? Thanks Tziporet From tziporet at dev.mellanox.co.il Sun Oct 29 02:52:08 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 29 Oct 2006 12:52:08 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: References: <453F831C.4010204@dev.mellanox.co.il> Message-ID: <454487D8.5080103@dev.mellanox.co.il> Roland Dreier wrote: > > I want to suggest that you will create releases to the libraries you own > > To make this simpler, is there any way we can give maintainers the > ability to put library releases somewhere on the new server so that > they show up on the downloads page automatically? Right now it is > somewhat cumbersome to create library releases, since the poor > sysadmins have to manually add tarballs to the downloads page. > > - R. > Hi Matt, I think Roland's suggestion is very good. Since Hal and Sean also agreed to create a release of their libraries it can serve all. 
Thanks, Tziporet From monis at voltaire.com Sun Oct 29 05:26:26 2006 From: monis at voltaire.com (Moni Shoua) Date: Sun, 29 Oct 2006 15:26:26 +0200 Subject: [openib-general] OFED 1.1 Build Issue Message-ID: We managed to avoid rebuilding the kernel to solve this issue. Before building any IB-dependent modules (out of OFED) it is required to update Module.symvers. The new values for the symbol CRCs can be taken from the modules themselves (nm IB_MODULE | grep __crc_). When Module.symvers is up-to-date, there should be no problem building and installing the IB-dependent modules. The solution step-by-step:
1. The procedure should run after installing the kernel-ib-devel RPM. It is possible to run it in the %pre section of the spec file.
2. For each IB module (ko) which is listed in $(rpm -ql kernel-ib):
2.1 take out the __crc_ symbols
2.2 extract the symbol name and its CRC value (simple parsing)
2.3 add it to (or replace the existing entry in) Module.symvers (usually under /lib/modules/$(uname -r)/build/ or /lib/modules/$(uname -r)/source/)
3. Save the diff of the current Module.symvers from the original (for future restore).
4. When the kernel-ib-devel RPM is uninstalled, use the patch from (3) to restore Module.symvers. (This can be done in the %postun section of the spec file.)
I'd be glad to get comments about this. -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Tom Tucker Sent: Friday, October 27, 2006 5:30 PM To: openib-general Subject: [openib-general] OFED 1.1 Build Issue I've been testing some code against the OFED 1.1 release and noticed that if you build anything that depends on IB (RNFS in this case) into the kernel, that the OFED kit doesn't work correctly. This is because the dependent modules (ib_core, etc...) get sucked into the kernel automagically and will cause the subsequent modprobe of the OFED module to fail.
I don't think you can fix this without rebuilding the kernel so it should probably be listed in the OFED_release_notes as a known issue. Providing a mechanism to rebuild the kernel as part of the OFED install would be great too, sorry if it's already there and I missed it. Tom _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Sun Oct 29 06:25:02 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 29 Oct 2006 16:25:02 +0200 Subject: [openib-general] [PATCH/RFC 2/2] IPoIB: Convert to NAPI In-Reply-To: References: <20061019202144.GC2674@mellanox.co.il> Message-ID: <20061029142502.GC2704@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH/RFC 2/2] IPoIB: Convert to NAPI > > Convert the IP-over-InfiniBand network device driver over to using > NAPI to handle all completions (both receive and send). > > Signed-off-by: Roland Dreier OK, I finally found some time to test this, and the results look good. I tested with iperf 2.0.2:
server: iperf -s -w 256K
client: iperf -w 256K -P
The system has dual-CPU Xeon 3GHz with HT. Message size was left at default value (8K):
Results:
# of sockets :   1   2   4   8  16
NONAPI BW    : 276 297 304 306 303
NAPI BW      : 302 311 349 408 367
-- MST From dotanb at dev.mellanox.co.il Sun Oct 29 07:35:14 2006 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Sun, 29 Oct 2006 17:35:14 +0200 Subject: [openib-general] APM support in openib stack In-Reply-To: <45429B74.8090607@3leafnetworks.com> References: <453F9A12.4020500@ichips.intel.com> <454127B0.8080109@veritas.com> <45429B74.8090607@3leafnetworks.com> Message-ID: <4544CA32.2070008@dev.mellanox.co.il> Hi. Venkatesh Babu wrote: > I don't think there is any event which says "path1 is back again". It > is the application which needs to load the alternate path.
The HW just > sends an event IB_EVENT_PORT_ACTIVE when port comes up. Upon receipt of > this event the application has to see if there exists a path from > this port to the remote node and then load this alternate path by > sending the APR message. > PS: In Gen1 implementation there was an event called IB_PATH_MIG_ARMED > which was generated by HW/FW after loading the alternate path by the > application. > > SA event notification is to just callback registered handlers when > IB_EVENT_PORT_ACTIVE event occurs on any node in the subnet or on a > specific node according to the registration parameters. > > VBabu > > somenath wrote: > > >> Sean, >> >> will there be a new API for SA event notification? >> today we already get this IB_EVENT_PATH_MIG (as defined below), will >> "path1 is back again" event >> be delivered the same way? >> >> thanks, som. >> >> enum ib_event_type { >> IB_EVENT_CQ_ERR, >> IB_EVENT_QP_FATAL, >> IB_EVENT_QP_REQ_ERR, >> IB_EVENT_QP_ACCESS_ERR, >> IB_EVENT_COMM_EST, >> IB_EVENT_SQ_DRAINED, >> IB_EVENT_PATH_MIG, >> IB_EVENT_PATH_MIG_ERR, >> IB_EVENT_DEVICE_FATAL, >> IB_EVENT_PORT_ACTIVE, >> IB_EVENT_PORT_ERR, >> IB_EVENT_LID_CHANGE, >> IB_EVENT_PKEY_CHANGE, >> IB_EVENT_SM_CHANGE, >> IB_EVENT_SRQ_ERR, >> IB_EVENT_SRQ_LIMIT_REACHED, >> IB_EVENT_QP_LAST_WQE_REACHED >> }; >> I checked the code of the file cm.c (of OFED 1.1) and the attribute alt_timeout is not mentioned anywhere in this code. I believe that the value of this attribute is set to zero, which means that the QP will wait an infinite time for the answer (that will never come). Venkatesh, can you check this issue by querying the QP attributes after the path was migrated? I think that you will find that the value of the timeout attribute is zero.
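As a side note on why a zero timeout matters, here is a minimal sketch of the arithmetic. It assumes the usual InfiniBand encoding, in which the timeout field is a 5-bit exponent, the wait is 4.096 usec * 2^timeout, and 0 disables the timer. The helper name ib_timeout_to_usec is hypothetical; it is not part of cm.c or of any verbs library.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): convert the 5-bit InfiniBand
 * timeout encoding into microseconds.
 * Returns a negative value when the field is 0, i.e. the timer is
 * disabled and the QP waits indefinitely for the response.
 */
static double ib_timeout_to_usec(uint8_t timeout)
{
	assert(timeout <= 0x1f);	/* the field is only 5 bits wide */

	if (timeout == 0)
		return -1.0;		/* special case: wait forever */

	/* 4.096 usec * 2^timeout */
	return 4.096 * (double)(1ULL << timeout);
}
```

With this encoding a value of 14 gives roughly 67 ms, while 0 means the QP never gives up waiting, which would be consistent with the behaviour described above.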
Sean, I am not familiar with the cm.c code, but I believe that the following patch will solve this issue: Index: last_stable/drivers/infiniband/core/cm.c =================================================================== --- last_stable.orig/drivers/infiniband/core/cm.c 2006-10-29 16:58:08.000000000 +0200 +++ last_stable/drivers/infiniband/core/cm.c 2006-10-29 17:31:57.000000000 +0200 @@ -3221,6 +3221,7 @@ static int cm_init_qp_rtr_attr(struct cm if (cm_id_priv->alt_av.ah_attr.dlid) { *qp_attr_mask |= IB_QP_ALT_PATH; qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; + qp_attr->alt_timeout = cm_id_priv->alt_av.packet_life_time; qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; } ret = 0; thanks Dotan From mlleinin at hpcn.ca.sandia.gov Sun Oct 29 10:58:39 2006 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Sun, 29 Oct 2006 10:58:39 -0800 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <454487D8.5080103@dev.mellanox.co.il> References: <453F831C.4010204@dev.mellanox.co.il> <454487D8.5080103@dev.mellanox.co.il> Message-ID: <1162148320.494.196.camel@localhost> On Sun, 2006-10-29 at 12:52 +0200, Tziporet Koren wrote: > Roland Dreier wrote: > > > I want to suggest that you will create releases to the libraries you own > > > > To make this simpler, is there any way we can give maintainers the > > ability to put library releases somewhere on the new server so that > > they show up on the downloads page automatically? Right now it is > > somewhat cumbersome to create library releases, since the poor > > sysadmins have to manually add tarballs to the downloads page. > > > > - R. > > > Hi Matt, > I think Roland's suggestion is very good. > Since Hal and Sean also agreed to create a release of their libraries it > can serve all. > I agree with Roland. Sandia will not be running the webpages or wiki on the new server. I think the market folks will run the webpages and the developers can run the wiki.
Any preferences for what wiki to use? Trac (http://trac.edgewall.org/) was one suggestion. - Matt From sashak at voltaire.com Sun Oct 29 12:06:55 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 29 Oct 2006 22:06:55 +0200 Subject: [openib-general] [PATCH] opensm: osm_prtn.c: strip pkey value unconditionally Message-ID: <20061029200655.GD15774@sashak.voltaire.com> Then new partition creation is requested strip unconditionally membership bit before == 0 verification (when autogenerated pkey value is requested). Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_prtn.c | 10 ++-------- 1 files changed, 2 insertions(+), 8 deletions(-) diff --git a/osm/opensm/osm_prtn.c b/osm/opensm/osm_prtn.c index dbed3bc..ae0f6e0 100644 --- a/osm/opensm/osm_prtn.c +++ b/osm/opensm/osm_prtn.c @@ -268,17 +268,11 @@ osm_prtn_t *osm_prtn_make_new(osm_log_t { osm_prtn_t *p = NULL, *p_check; + pkey &= cl_hton16((uint16_t)~0x8000); + if (pkey == 0 && !(pkey = __generate_pkey(p_subn))) return NULL; - if (cl_ntoh16(pkey) & 0x8000) { - pkey &= cl_hton16((uint16_t)~0x8000); - osm_log(p_log, OSM_LOG_VERBOSE, - "osm_prtn_make_new: pkey stripped for" - " partition \'%s\' (0x%04x)\n", - name, cl_ntoh16(pkey)); - } - p = osm_prtn_new(name, pkey); if (!p) { osm_log(p_log, OSM_LOG_ERROR, -- 1.4.3.2.g4bf7 From mst at mellanox.co.il Sun Oct 29 12:12:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 29 Oct 2006 22:12:08 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <1162148320.494.196.camel@localhost> References: <453F831C.4010204@dev.mellanox.co.il> <454487D8.5080103@dev.mellanox.co.il> <1162148320.494.196.camel@localhost> Message-ID: <20061029201208.GB3346@mellanox.co.il> Quoting r. 
Matt Leininger : > Subject: Re: creating releases for the libraries you own > > On Sun, 2006-10-29 at 12:52 +0200, Tziporet Koren wrote: > > Roland Dreier wrote: > > > > I want to suggest that you will create releases to the libraries you own > > > > > > To make this simpler, is there any way we can give maintainers the > > > ability to put library releases somewhere on the new server so that > > > they show up on the downloads page automatically? Right now it is > > > somewhat cumbersome to create library releases, since the poor > > > sysadmins have to manually add tarballs to the downloads page. > > > > > > - R. > > > > > Hi Matt, > > I think Roland's suggestion is very good. > > Since Hal and Sean also agreed to create a release of their libraries it > > can serve all. > > > I agree with Roland. Sandia will not be running the webpages or wiki > on the new server. I think the market folks will run the webpages and > the developers can run the wiki. This becomes a problem when we want to e.g. upload a release or some document. Developers need a way to put stuff on the web too. How about allowing the web server to serve static pages from /pub/html and ~/html? Then anyone with an account can easily upload files by scp, and post a link. > Any preferences for what wiki to use? > Trac (http://trac.edgewall.org/) was one suggestion. For documentation, I'd like very much to have a wiki that supports WYSIWYG editing. The openib wiki is not WYSIWYG and I find it very painful; trac has this limitation too. There are lots of wikis that support WYSIWYG, e.g. MoinMoin is written in Python and is, I think, popular: http://moinmoin.wikiwikiweb.de/MoinMoinFeatures Please consider this feature: we won't get quality documentation in wiki unless writing it is easy.
I just ran the WYSIWYG requirement through wikimatrix and got: http://www.wikimatrix.org/wizard.php?d%5Bpl%5D%5B%5D=&d%5Bfoss%5D=free&d%5Bstorage%5D=&d%5Bflag%5D=1&d%5Bwysiwyg%5D=yes&d%5Bhistory%5D=&d%5Bgo%5D=1 "Installable software with WYSIWYG editing and is Free and Open Source. The following 20 Wikis match your criteria: Corendal Wiki, Giki, IkeWiki, JSPWiki, KeheiWiki, MediaWiki, Midgard Wiki, MoinMoin, MoniWiki, Oddmuse, Perspective, PhpWiki, PmWiki, PukiWiki, SnipSnap, TiddlyWiki, TWiki, WackoWiki, XWiki and Zwiki" -- MST From n_shiraz2001 at rediffmail.com Mon Oct 30 00:16:10 2006 From: n_shiraz2001 at rediffmail.com (shiraz n) Date: 30 Oct 2006 08:16:10 -0000 Subject: [openib-general] problem while running chears runs Message-ID: <20061030081610.20412.qmail@webmail87.rediffmail.com> Hi guys, I am getting the following error while running Chears runs on my cluster. Can any of you tell me why this is happening? Thanks and Regards, Shiraz
[node021:12] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node010:23] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node028:5] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node023:10] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node009:24] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node022:11] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node031:2] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
[node030:3] Fatal Error: Unexpected disconnect received from [node003:30] at line 1698 in file vapiutil.c
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From halr at voltaire.com Mon Oct 30 03:19:11 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 06:19:11 -0500 Subject: [openib-general] [PATCH TRIVIAL] opensm: remove non-referenced include/opensm/osm_mcmember.h In-Reply-To: <20061029000904.GA13537@sashak.voltaire.com> References: <20061029000904.GA13537@sashak.voltaire.com> Message-ID: <1162207141.15895.61766.camel@hal.voltaire.com> On Sat, 2006-10-28 at 20:09, Sasha Khapyorsky wrote: > This removes non-referenced include/opensm/osm_mcmember.h header file. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 30 03:27:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 06:27:03 -0500 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061028200425.GF11988@sashak.voltaire.com> References: <20061028200425.GF11988@sashak.voltaire.com> Message-ID: <1162207615.15895.62084.camel@hal.voltaire.com> On Sat, 2006-10-28 at 16:04, Sasha Khapyorsky wrote: > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like strcmp() or printf() leads to segmentation faults. > This patch fixes such usages. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 30 03:39:37 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 06:39:37 -0500 Subject: [openib-general] [PATCH] opensm: osm_prtn.c: strip pkey value unconditionally In-Reply-To: <20061029200655.GD15774@sashak.voltaire.com> References: <20061029200655.GD15774@sashak.voltaire.com> Message-ID: <1162208338.15895.62538.camel@hal.voltaire.com> On Sun, 2006-10-29 at 15:06, Sasha Khapyorsky wrote: > Then new partition creation is requested strip unconditionally > membership bit before == 0 verification (when autogenerated pkey value > is requested). > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. 
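For readers following the pkey change, the ordering that the patch establishes can be sketched in isolation. This is an illustration only, not the OpenSM code: htons()/ntohs() stand in for cl_hton16()/cl_ntoh16(), and the helper names are made up for the example. The point is that a request carrying only the membership bit (0x8000) has to count as "no pkey given", so the bit must be cleared before the comparison with zero.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htons(), ntohs(): stand-ins for cl_hton16()/cl_ntoh16() */

#define PKEY_MEMBERSHIP_BIT 0x8000u

/* Hypothetical helper: clear the membership bit of a network-order pkey. */
static uint16_t pkey_strip_membership(uint16_t pkey_be)
{
	return (uint16_t)(pkey_be & htons((uint16_t)~PKEY_MEMBERSHIP_BIT));
}

/*
 * Hypothetical helper: nonzero when the caller effectively passed no pkey
 * and a value has to be autogenerated. Stripping happens first, so that
 * 0x8000 is treated the same as 0x0000.
 */
static int pkey_needs_autogen(uint16_t pkey_be)
{
	return pkey_strip_membership(pkey_be) == 0;
}

/* Usage example; the assertions do not depend on host byte order. */
static void pkey_example(void)
{
	assert(ntohs(pkey_strip_membership(htons(0xffff))) == 0x7fff);
	assert(pkey_needs_autogen(htons(0x8000)));
	assert(!pkey_needs_autogen(htons(0x8001)));
}
```

If the zero check ran before the strip, as in the code being replaced, a pkey of 0x8000 would skip autogeneration and then be stripped to 0.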
-- Hal From mst at mellanox.co.il Mon Oct 30 03:44:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 13:44:54 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061028200425.GF11988@sashak.voltaire.com> References: <20061028200425.GF11988@sashak.voltaire.com> Message-ID: <20061030114453.GA396@mellanox.co.il> Quoting r. Sasha Khapyorsky : > Subject: [PATCH] diags/saquery: fix node_desc.description as string usages > > > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like strcmp() or printf() leads to segmentation faults. > This patch fixes such usages. > > Signed-off-by: Sasha Khapyorsky > --- > diags/src/saquery.c | 22 ++++++++++++++++------ > 1 files changed, 16 insertions(+), 6 deletions(-) > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > index 5b4a85e..f5b23fd 100644 > --- a/diags/src/saquery.c > +++ b/diags/src/saquery.c > @@ -90,17 +90,21 @@ static void > print_node_desc(ib_node_record_t *node_record) > { > ib_node_info_t *p_ni = &(node_record->node_info); > + ib_node_desc_t *p_nd = &(node_record->node_desc); > if (p_ni->node_type == IB_NODE_TYPE_CA) > { > + char desc[sizeof(p_nd->description) + 1]; > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > + desc[sizeof(desc) - 1] = '\0'; > printf("%6d \"%s\"\n", > - cl_ntoh16(node_record->lid), > - node_record->node_desc.description); > + cl_ntoh16(node_record->lid), desc); > } > } Would it not be simpler, and cleaner, to limit the string width in printf: printf("%6d \"%.*s\"\n", cl_ntoh16(node_record->lid), (int)sizeof(node_record->node_desc.description), node_record->node_desc.description); -- MST From halr at voltaire.com Mon Oct 30 03:44:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 06:44:47 -0500 Subject: [openib-general] [PATCH TRIVIAL] opensm: net to host conversion for printing In-Reply-To:
<20061028195727.GA11988@sashak.voltaire.com> References: <20061028195727.GA11988@sashak.voltaire.com> Message-ID: <1162208677.15895.62765.camel@hal.voltaire.com> On Sat, 2006-10-28 at 15:57, Sasha Khapyorsky wrote: > This converts guid value to host byte order before printing. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 30 03:53:34 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 06:53:34 -0500 Subject: [openib-general] [PATCH TRIVIAL] opensm: indentation fixes In-Reply-To: <20061028200626.GG11988@sashak.voltaire.com> References: <20061028200626.GG11988@sashak.voltaire.com> Message-ID: <1162209205.15895.63134.camel@hal.voltaire.com> On Sat, 2006-10-28 at 16:06, Sasha Khapyorsky wrote: > Some trivial indentation fixes. > > Signed-off-by: Sasha Khapyorsky > --- Thanks. Applied. -- Hal From halr at voltaire.com Mon Oct 30 04:16:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 07:16:31 -0500 Subject: [openib-general] [PATCH v2] opensm: fix node_desc.description as string usages In-Reply-To: <20061028204103.GL11988@sashak.voltaire.com> References: <20061028200053.GC11988@sashak.voltaire.com> <20061028204103.GL11988@sashak.voltaire.com> Message-ID: <1162210576.15895.64038.camel@hal.voltaire.com> On Sat, 2006-10-28 at 16:41, Sasha Khapyorsky wrote: > Hmm, in one place there was the same copy-paste error as in the osmtest > patch. Resend this one too... > > Sasha > > > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like printf() leads to segmentation faults. This patch fixes > such usages. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. 
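The two ways of handling an unterminated description buffer that came up in this thread can be compared side by side. The following is a sketch under stated assumptions, not the applied patch: NODE_DESC_SIZE is assumed to be 64 bytes, and both function names are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NODE_DESC_SIZE 64	/* assumed size of the NodeDescription field */

/* Option 1: copy into a buffer one byte larger and terminate it. */
static void node_desc_to_cstr(const unsigned char desc[NODE_DESC_SIZE],
			      char out[NODE_DESC_SIZE + 1])
{
	memcpy(out, desc, NODE_DESC_SIZE);
	out[NODE_DESC_SIZE] = '\0';
}

/*
 * Option 2: let the precision bound the read. With "%.*s" at most
 * NODE_DESC_SIZE bytes are examined, so no terminator is required.
 * The precision argument must be an int, hence the cast.
 */
static int node_desc_format(char *buf, size_t len, unsigned int lid,
			    const unsigned char desc[NODE_DESC_SIZE])
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%6u \"%.*s\"", lid,
			(int)NODE_DESC_SIZE, (const char *)desc);
}
```

Option 1 is convenient when the string is also passed to functions such as strcmp(); option 2 avoids the extra copy when the buffer is only printed.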
-- Hal From halr at voltaire.com Mon Oct 30 04:22:58 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 07:22:58 -0500 Subject: [openib-general] [PATCH v2] osmtest: fix node_desc.description as string usages In-Reply-To: <20061028203340.GK11988@sashak.voltaire.com> References: <20061028200307.GE11988@sashak.voltaire.com> <20061028203340.GK11988@sashak.voltaire.com> Message-ID: <1162210937.15895.64266.camel@hal.voltaire.com> On Sat, 2006-10-28 at 16:33, Sasha Khapyorsky wrote: > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like printf() leads to segmentation faults. This patch fixes > such usages. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From ogerlitz at voltaire.com Mon Oct 30 05:06:45 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 30 Oct 2006 15:06:45 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <45447D71.40405@voltaire.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> Message-ID: <4545F8E5.2000003@voltaire.com> Or Gerlitz wrote: > Or Gerlitz wrote: >> OK, Sean, i have one system up and running, with kernel based on >> Roland's git plus patches 1-7 and user space based on the svn with the >> librdmacm patch. Will clone this config on Sunday such that i can >> actually run mckey and see it working. Thanks a lot for putting >> everything together... > > rping works fine in this environment, mckey not yet. From the switch > logs i see that both processes/ports (sender, receiver) joined the > same/correct mgid and from prints i have added to mckey i see that both > use the same mdlid. however, the receiver does not get completions from > its cq. I will further investigate this and let you know. 
Sean, One of the systems kernel is actually 2.6.19-rc3 and patches 1-7 (ie not roland's tree) and i see there some issues also with ip multicast over ipoib. I will move to use the same kernel config (roland's tree and patches 1-7), then test ipoib and only then mckey, will let you know. Will you have the chance to test ipoib multicast and mckey over this config at your environment? Or. From moshek at voltaire.com Mon Oct 30 05:59:34 2006 From: moshek at voltaire.com (Moshe Kazir) Date: Mon, 30 Oct 2006 15:59:34 +0200 Subject: [openib-general] mstflint error on ppc64 Message-ID: Hi Michael, The output of mstflint is changed on ppc64 as result of byte ordering issues. If you take a HCA that was burned using x86_64 or Mellanox manufacturing and perform mstflint -d ... q on ppc64 you'll find that the value of PSID VSD and Board Id was changed. I tried to look at the code to find the error, but then I saw that vsd is defined twice in the code according to it's usage (char[205], or unsigned int[52] ) Can you please look and help ? Best regards, Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Oct 30 06:07:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 09:07:49 -0500 Subject: [openib-general] [PATCH 1/5] opensm: build_lid_matrices() routing engine method In-Reply-To: <11612901362947-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> <11612901362947-git-send-email-sashak@voltaire.com> Message-ID: <1162217244.15895.68427.camel@hal.voltaire.com> On Thu, 2006-10-19 at 16:35, Sasha Khapyorsky wrote: > This adds new method named build_lid_matrices() to OpenSM routing engine > structure. 
When defined this method will be used by ucast_mgr_process() > for switch min hop tables (aka lid matrices) preparation. In case of > failure default lid matrix creation algorithm will be used. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From jlentini at netapp.com Mon Oct 30 06:25:52 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 30 Oct 2006 09:25:52 -0500 (EST) Subject: [openib-general] OFED 1.1 Build Issue In-Reply-To: <15ddcffd0610272205n347c0f32k9c5e054b6611b795@mail.gmail.com> References: <1161963012.2748.29.camel@trinity.ogc.int> <15ddcffd0610272205n347c0f32k9c5e054b6611b795@mail.gmail.com> Message-ID: On Sat, 28 Oct 2006, Or Gerlitz wrote: > What's RNFS? NFS-RDMA. From mst at mellanox.co.il Mon Oct 30 06:31:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 16:31:52 +0200 Subject: [openib-general] [PATCH] IB/mthca: fix MAD extended header format Message-ID: <20061030143152.GB1941@mellanox.co.il> Several fields in an incoming MAD extended info header were passed at incorrect offsets (mostly off by 4 bytes). As a result, the HCA will fail to generate traps in which this info is needed (e.g. traps which include the GRH of the incoming packet), in violation of the IB spec. Signed-off-by: Michael S. Tsirkin --- Roland, the offsets look wrong. Am I missing something? Please review vs the spec, and if correct consider for 2.6.19.
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 99a94d7..768df72 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -1820,11 +1820,11 @@ int mthca_MAD_IFC(struct mthca_dev *dev, #define MAD_IFC_BOX_SIZE 0x400 #define MAD_IFC_MY_QPN_OFFSET 0x100 -#define MAD_IFC_RQPN_OFFSET 0x104 -#define MAD_IFC_SL_OFFSET 0x108 -#define MAD_IFC_G_PATH_OFFSET 0x109 -#define MAD_IFC_RLID_OFFSET 0x10a -#define MAD_IFC_PKEY_OFFSET 0x10e +#define MAD_IFC_RQPN_OFFSET 0x108 +#define MAD_IFC_SL_OFFSET 0x10c +#define MAD_IFC_G_PATH_OFFSET 0x10d +#define MAD_IFC_RLID_OFFSET 0x10e +#define MAD_IFC_PKEY_OFFSET 0x112 #define MAD_IFC_GRH_OFFSET 0x140 inmailbox = mthca_alloc_mailbox(dev, GFP_KERNEL); @@ -1862,7 +1862,7 @@ #define MAD_IFC_GRH_OFFSET 0x140 val = in_wc->dlid_path_bits | (in_wc->wc_flags & IB_WC_GRH ? 0x80 : 0); - MTHCA_PUT(inbox, val, MAD_IFC_GRH_OFFSET); + MTHCA_PUT(inbox, val, MAD_IFC_G_PATH_OFFSET); MTHCA_PUT(inbox, in_wc->slid, MAD_IFC_RLID_OFFSET); MTHCA_PUT(inbox, in_wc->pkey_index, MAD_IFC_PKEY_OFFSET); @@ -1870,7 +1870,7 @@ #define MAD_IFC_GRH_OFFSET 0x140 if (in_grh) memcpy(inbox + MAD_IFC_GRH_OFFSET, in_grh, 40); - op_modifier |= 0x10; + op_modifier |= 0x4; in_modifier |= in_wc->slid << 16; } -- MST From halr at voltaire.com Mon Oct 30 06:49:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 09:49:39 -0500 Subject: [openib-general] [PATCH 2/5] opensm: ucast_mgr dumper unification In-Reply-To: <116129014671-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> <116129014671-git-send-email-sashak@voltaire.com> Message-ID: <1162219773.15895.70143.camel@hal.voltaire.com> On Thu, 2006-10-19 at 16:35, Sasha Khapyorsky wrote: > This unifies ucsat_mgr dumper. Main goal is to provide infrastructure > for different dump file generation using the same routines. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. 
-- Hal From halr at voltaire.com Mon Oct 30 06:55:29 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 09:55:29 -0500 Subject: [openib-general] [PATCH 3/5] opensm: lid matrix dump In-Reply-To: <1161290156640-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> <1161290156640-git-send-email-sashak@voltaire.com> Message-ID: <1162220122.15895.70326.camel@hal.voltaire.com> On Thu, 2006-10-19 at 16:35, Sasha Khapyorsky wrote: > This adds dumping switches lid matrices to the file > 'opensm-lid-matrix.dump'. Like other routing related dumps this code > will be activated when OSM_LOG_ROUTING logging flag is set. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From tziporet at mellanox.co.il Mon Oct 30 07:05:40 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 30 Oct 2006 17:05:40 +0200 Subject: [openib-general] [openfabrics-ewg] staging.openfabrics.org now functional Message-ID: <6C2C79E72C305246B504CBA17B5500C93E35E3@mtlexch01.mtl.com> Hi Johann, When accessing this page there is a login form What is this used for? I also see it's not https? 
Tziporet -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Johann George Sent: Thursday, October 26, 2006 3:28 AM To: Open Fabrics; openib-general at openib.org Subject: [openfabrics-ewg] staging.openfabrics.org now functional You can now reference the new OpenFabrics server using staging.openfabrics.org Johann _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From johann.george at qlogic.com Mon Oct 30 07:22:04 2006 From: johann.george at qlogic.com (Johann George) Date: Mon, 30 Oct 2006 07:22:04 -0800 Subject: [openib-general] [openfabrics-ewg] staging.openfabrics.org now functional In-Reply-To: <6C2C79E72C305246B504CBA17B5500C93E35E3@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C93E35E3@mtlexch01.mtl.com> Message-ID: <20061030152204.GA3514@cuprite.pathscale.com> Hello Tziporet, > When accessing this page there is a login form What is this used for? The package drupal was installed as an experiment to serve up web pages. I am barely familiar with it but gather that it allows users to login and maintain portions of the site. There is more information on it at http://drupal.org. We plan to have a conversation with the Marketing Working Group to see if this package or some other one best suits their needs. Johann From tziporet at dev.mellanox.co.il Mon Oct 30 07:25:09 2006 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 30 Oct 2006 17:25:09 +0200 Subject: [openib-general] convention for git directories on the new git OFA server Message-ID: <45461955.70803@dev.mellanox.co.il> Hi all, I suggest we use the following convention for all our git trees in the OFA server: 1. Development trees: each one will place the development tree under his/her home: ~/scm/topic.git (e.g. ~mst/scm/sdp.git) 2. 
Stable/release trees: these trees will be located under /pub: /pub/scm/.git (e.g. /pub/scm/ofed-1.1.git) 3. All trees (development and stable) should be exposed via git web interface. I think this methodology is similar to the way git trees are handled in Linux, and its also a simple convention that will make it easy to understand the type of each tree Tziporet From rdreier at cisco.com Mon Oct 30 07:42:19 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 07:42:19 -0800 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <20061029201208.GB3346@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 29 Oct 2006 22:12:08 +0200") References: <453F831C.4010204@dev.mellanox.co.il> <454487D8.5080103@dev.mellanox.co.il> <1162148320.494.196.camel@localhost> <20061029201208.GB3346@mellanox.co.il> Message-ID: > How about allowing web server serve static pages from > /pub/html and ~/html? Then anyone with an account > can easily upload files by scp, and post a link. I think that solves the wrong part of the problem. It's trivial for me to find places to host stuff. The issue is having a place for "official" releases that can be updated directly by the developers making releases -- in particular, I would like to have release links show up on http://openfabrics.org/downloads automatically. One way to handle this would be to point the downloads page to a wiki page and have developers edit it as the make releases. > For documentation, I'd like very much to have wiki that supports WYSIWYG > editing. openib wiki is not WYSIWYG and I find it very painful, trac has this > liitation too. There are lots of wikis that support WYSIWYG, e.g. MoinMoin is > written in Python and is I think popular: > http://moinmoin.wikiwikiweb.de/MoinMoinFeatures I don't think WYSIWYG is particularly important. For example wikipedia uses the non-WYSIWYG mediawiki and they seem to do just fine. 
I'm not sure how well Trac will work for us, but those sorts of feature (bug tracking integration, etc) are much more interesting to me. - R. From rdreier at cisco.com Mon Oct 30 07:44:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 07:44:46 -0800 Subject: [openib-general] [PATCH] IB/mthca: fix MAD extended header format In-Reply-To: <20061030143152.GB1941@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 30 Oct 2006 16:31:52 +0200") References: <20061030143152.GB1941@mellanox.co.il> Message-ID: > Roland, the offsets look wrong. Am I missing something? > Please review vs the spec, and if correct consider for 2.6.19. I just applied your original patch without checking it carefully :) Yes, the offsets in your patch look like they match the PRM now. However: > +#define MAD_IFC_RLID_OFFSET 0x10e I don't see anything about RLID in either the latest Tavor or Arbel specs -- they say the RLID goes in the input modifier (as the current mthca code also does). So is putting the RLID into the mailbox needed? Or should we follow the docs? - R. From rdreier at cisco.com Mon Oct 30 07:47:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 07:47:36 -0800 Subject: [openib-general] [openfabrics-ewg] convention for git directories on the new git OFA server In-Reply-To: <45461955.70803@dev.mellanox.co.il> (Tziporet Koren's message of "Mon, 30 Oct 2006 17:25:09 +0200") References: <45461955.70803@dev.mellanox.co.il> Message-ID: > 1. Development trees: each one will place the development tree under > his/her home: > ~/scm/topic.git (e.g. ~mst/scm/sdp.git) > 2. Stable/release trees: these trees will be located under /pub: > /pub/scm/.git (e.g. /pub/scm/ofed-1.1.git) > 3. All trees (development and stable) should be exposed via git web > interface. 
> > I think this methodology is similar to the way git trees are > handled in Linux, and its also a simple convention that will make > it easy to understand the type of each tree If you mean the way git trees are handled on kernel.org, that's not true. There is only one class of git tree -- kernel.org happens to use /pub/scm but ~/scm/ is perfectly fine. But I don't see a reason to have two places to put git trees -- let's just pick one. I think trying to make a distinction between stable and development git trees is a mistake, since a single tree can contain many branches, some of which might be "stable" and some of which might be "development". In fact I think stability is a matter of degree so it may not even be possible to agree on whether a given branch is stable or not. - R. From rdreier at cisco.com Mon Oct 30 07:51:49 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 07:51:49 -0800 Subject: [openib-general] [PATCH/RFC 2/2] IPoIB: Convert to NAPI In-Reply-To: <20061029142502.GC2704@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 29 Oct 2006 16:25:02 +0200") References: <20061019202144.GC2674@mellanox.co.il> <20061029142502.GC2704@mellanox.co.il> Message-ID: > #of sockets : 1 2 4 8 16 > NONAPI BW :276 297 304 306 303 > NAPI BW :302 311 349 408 367 OK, the NAPI numbers are > than the NONAPI numbers, which I guess is good :) What are the units? - R. From rdreier at cisco.com Mon Oct 30 07:52:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 07:52:42 -0800 Subject: [openib-general] problem with 2.6.19? 
In-Reply-To: <20061028213302.GK4868@rhun.haifa.ibm.com> (Muli Ben-Yehuda's message of "Sat, 28 Oct 2006 23:33:02 +0200") References: <1161960198.14333.16.camel@stevo-desktop> <1161964101.14333.36.camel@stevo-desktop> <1161964388.14333.38.camel@stevo-desktop> <1161967455.14333.45.camel@stevo-desktop> <1161972601.14333.50.camel@stevo-desktop> <20061028102057.GG4868@rhun.haifa.ibm.com> <20061028213302.GK4868@rhun.haifa.ibm.com> Message-ID: > By the way, by userspace DMA do you mean DMA to userspace buffers, or > DMA initiated by userspace? I'm assuming the former, but if it's the > latter, how are the addresses returned from dma_map_sg communicated to > userspace? Actually, both. Userspace deals in virtual addresses (it asks the kernel to register memory in advance, and the kernel loads the IB device with the virtual -> bus mapping). - R. From halr at voltaire.com Mon Oct 30 07:58:19 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 10:58:19 -0500 Subject: [openib-general] [PATCH 5/5] opensm: dump_lfts.sh compatible dumper for OpenSM In-Reply-To: <11612901763767-git-send-email-sashak@voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> <11612901763767-git-send-email-sashak@voltaire.com> Message-ID: <1162223800.15895.72751.camel@hal.voltaire.com> On Thu, 2006-10-19 at 16:35, Sasha Khapyorsky wrote: > This is bonus - switch forwarding tables dump compatible with output > produced by dump_lfts.sh and which can be used as input for unicast > forwarding tables loader (with -R 'file' -U ). The dump file > name is 'opensm-lfts.sh' and will be generate if OSM_LOG_ROUTING > logging flag is set. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From mst at mellanox.co.il Mon Oct 30 08:03:46 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 30 Oct 2006 18:03:46 +0200 Subject: [openib-general] [PATCH] IB/mthca: fix MAD extended header format In-Reply-To: References: Message-ID: <20061030160346.GH1941@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] IB/mthca: fix MAD extended header format > > > Roland, the offsets look wrong. Am I missing something? > > Please review vs the spec, and if correct consider for 2.6.19. > > I just applied your original patch without checking it carefully :) > Yes, the offsets in your patch look like they match the PRM now. > However: > > > +#define MAD_IFC_RLID_OFFSET 0x10e > > I don't see anything about RLID in either the latest Tavor or Arbel > specs -- they say the RLID goes in the input modifier (as the current > mthca code also does). So is putting the RLID into the mailbox > needed? Or should we follow the docs? Old PRM versions (1.01) used to have rlid there - that's why it was in the original code and I just kept it. It kind of looked safest to keep it around, just in case there's some old firmware that wants it there. Most likely we can drop this - worst case, the user will need to upgrade the firmware. You decide. -- MST From mst at mellanox.co.il Mon Oct 30 08:05:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 18:05:01 +0200 Subject: [openib-general] [PATCH/RFC 2/2] IPoIB: Convert to NAPI In-Reply-To: References: Message-ID: <20061030160501.GI1941@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH/RFC 2/2] IPoIB: Convert to NAPI > > > #of sockets : 1 2 4 8 16 > > NONAPI BW :276 297 304 306 303 > > NAPI BW :302 311 349 408 367 > > OK, the NAPI numbers are > than the NONAPI numbers, which I guess is > good :) What are the units? Oh, didn't I say? :) Megabyte/sec.
-- MST From halr at voltaire.com Mon Oct 30 08:14:20 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 11:14:20 -0500 Subject: [openib-general] [PATCH] opensm: osm_ucast_mgr.c: fix node_desc.description as string usages [was: [PATCH 2/5] opensm: ucast_mgr dumper unification] In-Reply-To: <20061028201750.GH11988@sashak.voltaire.com> References: <11612901253393-git-send-email-sashak@voltaire.com> <116129014671-git-send-email-sashak@voltaire.com> <20061028201750.GH11988@sashak.voltaire.com> Message-ID: <1162224838.15895.73296.camel@hal.voltaire.com> On Sat, 2006-10-28 at 16:17, Sasha Khapyorsky wrote: > On 22:35 Thu 19 Oct , Sasha Khapyorsky wrote: > > This unifies ucsat_mgr dumper. Main goal is to provide infrastructure > > for different dump file generation using the same routines. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > osm/opensm/osm_ucast_mgr.c | 104 +++++++++++++++++++++++--------------------- > > 1 files changed, 55 insertions(+), 49 deletions(-) > > > > And there is incremental patch already: > > > node_desc.description buffer is received from the network and should > not be NULL-terminated. In such cases using it as regular string in > functions like printf() leads to segmentation faults. This patch fixes > such usages. This was in new lft dumper too. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied. -- Hal From mst at mellanox.co.il Mon Oct 30 08:20:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 18:20:28 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: References: Message-ID: <20061030162028.GJ1941@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: creating releases for the libraries you own > > > How about allowing web server serve static pages from > > /pub/html and ~/html? Then anyone with an account > > can easily upload files by scp, and post a link. > > I think that solves the wrong part of the problem. This depends I guess. 
Mainly, the idea was to avoid the need to put binaries under svn as we were forced to with the old server. > It's trivial for me to find places to host stuff. As a user, I feel more confident that I'm getting the right package if it comes from the right URL. > The issue is having a place for "official" releases that can be updated > directly by the developers making releases -- in particular, I would like to > have release links show up on http://openfabrics.org/downloads automatically. I guess release links are also important. But I want mirroring etc to work which requires that uploaded packages are organized in some sane hierarchy. I guess what I want is a directory per package where I can go and see the latest release packages kind of like http://www.kernel.org/pub/software/ > One way to handle this would be to point the downloads page to a wiki > page and have developers edit it as the make releases. Right, but consider that longterm we will need to add checksums and possibly sign packages, etc. Doing all this manually through wiki just to publish an RC will be a hassle, and error prone. -- MST From rdreier at cisco.com Mon Oct 30 08:43:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 08:43:55 -0800 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <20061030162028.GJ1941@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 30 Oct 2006 18:20:28 +0200") References: <20061030162028.GJ1941@mellanox.co.il> Message-ID: > Right, but consider that longterm we will need to add checksums > and possibly sign packages, etc. Doing all this manually > through wiki just to publish an RC will be a hassle, and error prone. Yes, I agree. That's why I said that just having ~user/html or whatever isn't very interesting. Ideally we would have a really automatic way of publishing releases. However as a short-term hack we could convert the downloads page to a wiki page. - R. 
From mst at mellanox.co.il Mon Oct 30 08:57:21 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 18:57:21 +0200 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: References: <45461955.70803@dev.mellanox.co.il> Message-ID: <20061030165721.GL1941@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: convention for git directories on the new git OFA server > > > 1. Development trees: each one will place the development tree under > > his/her home: > > ~/scm/topic.git (e.g. ~mst/scm/sdp.git) > > 2. Stable/release trees: these trees will be located under /pub: > > /pub/scm/.git (e.g. /pub/scm/ofed-1.1.git) > > 3. All trees (development and stable) should be exposed via git web > > interface. > > > > I think this methodology is similar to the way git trees are > > handled in Linux, and its also a simple convention that will make > > it easy to understand the type of each tree > > If you mean the way git trees are handled on kernel.org, that's not > true. There is only one class of git tree -- kernel.org happens to > use /pub/scm but ~/scm/ is perfectly fine. Yes, but: Linus's tree is under git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git E.g. stable tree is under git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.18.y.gi So I do see the analogy: development trees are under developer's name, stable trees are under stable/version. No? -- MST From rdreier at cisco.com Mon Oct 30 09:07:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 09:07:35 -0800 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: <20061030165721.GL1941@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 30 Oct 2006 18:57:21 +0200") References: <45461955.70803@dev.mellanox.co.il> <20061030165721.GL1941@mellanox.co.il> Message-ID: > Yes, but: > > Linus's tree is under > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git > E.g. stable tree is under > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.18.y.gi > > So I do see the analogy: development trees are under developer's name, > stable trees are under stable/version. No? No, that's just an artifact of how the stable kernel tree is maintained. 'stable' is really just a pseudo-user that exists so both Greg KH and Chris Wright can share the same trees. - R. From mst at mellanox.co.il Mon Oct 30 09:08:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 19:08:32 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: References: Message-ID: <20061030170832.GM1941@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: creating releases for the libraries you own > > > Right, but consider that longterm we will need to add checksums > > and possibly sign packages, etc. Doing all this manually > > through wiki just to publish an RC will be a hassle, and error prone. > > Yes, I agree. That's why I said that just having ~user/html or > whatever isn't very interesting. Ideally we would have a really > automatic way of publishing releases. However as a short-term hack we > could convert the downloads page to a wiki page. Well, with ~user/html I can easily cook up a perl script to generate a MD5 checksums or sign stuff and just stick them in the same directory as original files. And while that's not as pretty, I think that'll do short-term. -- MST From rdreier at cisco.com Mon Oct 30 09:09:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 09:09:55 -0800 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <20061030170832.GM1941@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 30 Oct 2006 19:08:32 +0200") References: <20061030170832.GM1941@mellanox.co.il> Message-ID: > Well, with ~user/html I can easily cook up a perl script to generate a MD5 > checksums or sign stuff and just stick them in the same directory as original files. But you can do that with any old hosting, can't you? Or am I missing something? I just don't see much point in yet another place to stick files on the web, unless it's tied into an "official" openfabrics download page. - R. From mshefty at ichips.intel.com Mon Oct 30 09:24:05 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 30 Oct 2006 09:24:05 -0800 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <4545F8E5.2000003@voltaire.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> <4545F8E5.2000003@voltaire.com> Message-ID: <45463535.2050302@ichips.intel.com> Or Gerlitz wrote: > One of the systems kernel is actually 2.6.19-rc3 and patches 1-7 (ie not > roland's tree) and i see there some issues also with ip multicast over > ipoib. I will move to use the same kernel config (roland's tree and > patches 1-7), then test ipoib and only then mckey, will let you know. > > Will you have the chance to test ipoib multicast and mckey over this > config at your environment? I can update the kernel on my test systems and run this again. - Sean From mst at mellanox.co.il Mon Oct 30 09:34:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 19:34:16 +0200 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: References: Message-ID: <20061030173416.GN1941@mellanox.co.il> Quoting r. 
Roland Dreier : > Subject: Re: convention for git directories on the new git OFA server > > > Yes, but: > > > > Linus's tree is under > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git > > E.g. stable tree is under > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.18.y.gi > > > > So I do see the analogy: development trees are under developer's name, > > stable trees are under stable/version. No? > > No, that's just an artifact of how the stable kernel tree is > maintained. 'stable' is really just a pseudo-user that exists so both > Greg KH and Chris Wright can share the same trees. OK, good idea. Same for OFED I guess - we can have e.g. ~ofed/ if several people need to share the same ofed trees. So it seems it is best to stick with the ~user/ convention - this allows creating trees without admin permission. Makes sense? -- MST From mst at mellanox.co.il Mon Oct 30 09:45:37 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 30 Oct 2006 19:45:37 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: References: Message-ID: <20061030174537.GO1941@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: creating releases for the libraries you own > > > Well, with ~user/html I can easily cook up a perl script to generate a MD5 > > checksums or sign stuff and just stick them in the same directory as original files. > > But you can do that with any old hosting, can't you? Or am I missing > something? This depends on the level of paranoia :) If all files are on the same server, I only have to trust that server's integrity. > I just don't see much point in yet another place to stick > files on the web, unless it's tied into an "official" openfabrics > download page. It's clear the tie needs to be there. As a first step, we can put links to the relevant directories on the official download page.
-- MST From venkatesh.babu at 3leafnetworks.com Mon Oct 30 10:38:34 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Mon, 30 Oct 2006 10:38:34 -0800 Subject: [openib-general] APM support in openib stack In-Reply-To: <4544CA32.2070008@dev.mellanox.co.il> References: <453F9A12.4020500@ichips.intel.com> <454127B0.8080109@veritas.com> <45429B74.8090607@3leafnetworks.com> <4544CA32.2070008@dev.mellanox.co.il> Message-ID: <454646AA.5020509@3leafnetworks.com> Thanks for sending the patch. I will try this out and let you know the results. VBabu Dotan Barak wrote: > Hi. > > > Venkatesh Babu wrote: > >> I don't think there is any event which says "path1 is back again". >> It is the application which needs to load the alternate path. The HW >> just sends an event IB_EVENT_PORT_ACTIVE when port comes up. Upon >> receipt of this event the application has to see if there exists a >> path from this port to the remote node and then load this alternate >> path by sending the APR message. >> PS: In Gen1 implementation there was an event called >> IB_PATH_MIG_ARMED which was generated by HW/FW after loading the >> alternate path by the application. >> >> SA event notification is to just callback registered handlers when >> IB_EVENT_PORT_ACTIVE event occurs on any node in the subnet or on a >> specific node according to the registration parameters. >> >> VBabu >> >> somenath wrote: >> >> >> >>> Sean, >>> >>> will there be a new API for SA event notification? >>> today we already get this IB_EVENT_PATH_MIG (as defined below), >>> will "path1 is back again" event >>> be delivered the same way? >>> >>> thanks, som.
>>> >>> enum ib_event_type { >>> IB_EVENT_CQ_ERR, >>> IB_EVENT_QP_FATAL, >>> IB_EVENT_QP_REQ_ERR, >>> IB_EVENT_QP_ACCESS_ERR, >>> IB_EVENT_COMM_EST, >>> IB_EVENT_SQ_DRAINED, >>> IB_EVENT_PATH_MIG, >>> IB_EVENT_PATH_MIG_ERR, >>> IB_EVENT_DEVICE_FATAL, >>> IB_EVENT_PORT_ACTIVE, >>> IB_EVENT_PORT_ERR, >>> IB_EVENT_LID_CHANGE, >>> IB_EVENT_PKEY_CHANGE, >>> IB_EVENT_SM_CHANGE, >>> IB_EVENT_SRQ_ERR, >>> IB_EVENT_SRQ_LIMIT_REACHED, >>> IB_EVENT_QP_LAST_WQE_REACHED >>> }; >>> >> > > I checked the code of the file cm.c (of OFED 1.1) and the attribute > alt_timeout is not mentioned anywhere in this code. > I believe that the value of this attribute is set to zero, which means > that the QP will wait indefinitely for the answer (that will never come). > > Venkatesh, can you check this issue by querying the QP attributes > after the path was migrated? > I think that you will find that the value of the timeout attribute is > zero. > > Sean, I am not familiar with the cm.c code, but I believe that the > following patch will solve this issue: > > Index: last_stable/drivers/infiniband/core/cm.c > =================================================================== > --- last_stable.orig/drivers/infiniband/core/cm.c 2006-10-29 > 16:58:08.000000000 +0200 > +++ last_stable/drivers/infiniband/core/cm.c 2006-10-29 > 17:31:57.000000000 +0200 > @@ -3221,6 +3221,7 @@ static int cm_init_qp_rtr_attr(struct cm > if (cm_id_priv->alt_av.ah_attr.dlid) { > *qp_attr_mask |= IB_QP_ALT_PATH; > qp_attr->alt_port_num = > cm_id_priv->alt_av.port->port_num; > + qp_attr->alt_timeout = > cm_id_priv->alt_av.packet_life_time; > qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; > } > ret = 0; > > > thanks > Dotan From mshefty at ichips.intel.com Mon Oct 30 12:58:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 30 Oct 2006 12:58:24 -0800 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <45463535.2050302@ichips.intel.com>
References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> <4545F8E5.2000003@voltaire.com> <45463535.2050302@ichips.intel.com> Message-ID: <45466770.9050107@ichips.intel.com> Sean Hefty wrote: >>One of the systems kernel is actually 2.6.19-rc3 and patches 1-7 (ie not >>roland's tree) and i see there some issues also with ip multicast over >>ipoib. I will move to use the same kernel config (roland's tree and >>patches 1-7), then test ipoib and only then mckey, will let you know. >> >>Will you have the chance to test ipoib multicast and mckey over this >>config at your environment? This seemed to work fine for me in loopback mode. I'm updating another test system to check between systems now. - Sean From sashak at voltaire.com Mon Oct 30 13:18:28 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 30 Oct 2006 23:18:28 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061030114453.GA396@mellanox.co.il> References: <20061028200425.GF11988@sashak.voltaire.com> <20061030114453.GA396@mellanox.co.il> Message-ID: <20061030211828.GD12259@sashak.voltaire.com> On 13:44 Mon 30 Oct , Michael S. Tsirkin wrote: > Quoting r. Sasha Khapyorsky : > > Subject: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > node_desc.description buffer is received from the network and should > > not be NULL-terminated. In such cases using it as regular string in > > functions like strcmp() or printf() leads to segmentation faults. > > This patch fixes such usages. 
> > > > Signed-off-by: Sasha Khapyorsky > > --- > > diags/src/saquery.c | 22 ++++++++++++++++------ > > 1 files changed, 16 insertions(+), 6 deletions(-) > > > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > > index 5b4a85e..f5b23fd 100644 > > --- a/diags/src/saquery.c > > +++ b/diags/src/saquery.c > > @@ -90,17 +90,21 @@ static void > > print_node_desc(ib_node_record_t *node_record) > > { > > ib_node_info_t *p_ni = &(node_record->node_info); > > + ib_node_desc_t *p_nd = &(node_record->node_desc); > > if (p_ni->node_type == IB_NODE_TYPE_CA) > > { > > + char desc[sizeof(p_nd->description) + 1]; > > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > > + desc[sizeof(desc) - 1] = '\0'; > > printf("%6d \"%s\"\n", > > - cl_ntoh16(node_record->lid), > > - node_record->node_desc.description); > > + cl_ntoh16(node_record->lid), desc); > > } > > } > > Would it not be simpler, and cleaner, to limit the string width in printf: > printf("%6d \"%.*s\"\n", > cl_ntoh16(node_record->lid), > sizeof(desc), > node_record->node_desc.description); This would be simpler. However, some web searching shows that not all printf() implementations accept arrays that are not NUL-terminated, even when a precision is specified (some issues were reported even with glibc-2.3.2). OTOH I understand your concerns and hate this stupid copying. Originally I wanted just to terminate the node_desc.description array with '\0', but then the description could potentially be truncated.
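For illustration only, the copy-and-terminate approach discussed above could be factored into a small helper along these lines. This is a minimal sketch, not code from the patch: the name node_desc_to_cstr and the NODE_DESC_SIZE value of 64 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed size of the fixed-width description field (illustrative only). */
#define NODE_DESC_SIZE 64

/*
 * Hypothetical helper: copy a fixed-size description that may lack a
 * terminating NUL into dst and always terminate the result.
 * At most dst_len - 1 bytes are copied, so a dst smaller than
 * desc_len + 1 truncates the text instead of overflowing.
 * Returns dst so the call can be used directly as a printf() argument.
 */
char *node_desc_to_cstr(char *dst, size_t dst_len,
                        const unsigned char *desc, size_t desc_len)
{
    size_t n;

    assert(dst != NULL && dst_len > 0);
    n = desc_len < dst_len - 1 ? desc_len : dst_len - 1;
    memcpy(dst, desc, n);
    dst[n] = '\0';
    return dst;
}
```

With a destination of NODE_DESC_SIZE + 1 bytes nothing is truncated, and the source buffer is left untouched. If the "%.*s" form is used instead, note that the precision argument has to be an int, so a sizeof value would need a cast.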
Sasha From halr at voltaire.com Mon Oct 30 13:29:41 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 16:29:41 -0500 Subject: [openib-general] [PATCH] OpenSM: Update opensm man page and modular-routing.txt for LID matrix handling Message-ID: <1162243772.15895.85885.camel@hal.voltaire.com> OpenSM: Update opensm man page and modular-routing.txt for LID matrix handling Signed-off-by: Hal Rosenstock Index: man/opensm.8 =================================================================== --- man/opensm.8 (revision 10004) +++ man/opensm.8 (working copy) @@ -5,7 +5,7 @@ opensm \- InfiniBand subnet manager and .SH SYNOPSIS .B opensm -[\-c(ache-options)] [\-g(uid)[=]] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] [\-R | \-routing_engine ] [\-U | \-ucast_file ] [\-a(dd_guid_file) ] [\-o(nce)] [\-s(weep) ] [\-t(imeout) ] [\-maxsmps ] [\-console] [\-i(gnore-guids) ] [\-f | \-\-log_file] [\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-no_qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] +[\-c(ache-options)] [\-g(uid)[=]] [\-l(mc) ] [\-p(riority) ] [\-smkey ] [\-r(eassign_lids)] [\-R | \-routing_engine ] [\-M | \-lid_matrix_file ] [\-U | \-ucast_file ] [\-a(dd_guid_file) ] [\-o(nce)] [\-s(weep) ] [\-t(imeout) ] [\-maxsmps ] [\-console] [\-i(gnore-guids) ] [\-f | \-\-log_file] [\-L | \-\-log_limit ] [\-e(rase_log_file)] [\-P(config)] [\-Q | \-no_qos] [\-N | \-no_part_enforce] [\-y | \-stay_on_fatal] [\-v(erbose)] [\-V] [\-D ] [\-d(ebug) ] [\-h(elp)] [\-?] .SH DESCRIPTION .PP @@ -85,6 +85,11 @@ LID assignments resolving multiple use o This option chooses routing engine instead of Min Hop algorithm (default). Supported engines: updn, file .TP +\fB\-M\fR, \fB\-\-lid_matrix_file\fR +This option specifies name of the lid matrix dump file +from where switch lid matrices (min hops tables will be +loaded. 
+.TP \fB\-U\fR, \fB\-\-ucast_file\fR This option specifies name of the unicast dump file from where switch forwarding tables will be loaded. @@ -566,6 +571,27 @@ To activate file based routing module, u If the dump_file is not found or is in error, the default routing algorithm is utilized. +The ability to dump switch lid matrices (aka min hops tables) to file and +later to load these is also supported. + +The usage is similar to unicast forwarding tables loading from dump +file (introduced by 'file' routing engine), but new lid matrix file +name should be specified by -M or --lid_matrix_file option. For example: + + opensm -R file -M ./opensm-lid-matrix.dump + +The dump file is named 'opensm-lid-matrix.dump' and will be generated in +standard opensm dump directory (/var/log by default) when +OSM_LOG_ROUTING logging flag is set. + +When routing engine 'file' is activated, but dump file is not specified +or not cannot be open default lid matrix algorithm will be used. + +There is also a switch forwarding tables dumper which generates +a file compatible with dump_lfts.sh output. This file can be used +as input for forwarding tables loading by 'file' routing engine. +Both or one of options -U and -M can be specified together with '-R file'. + .SH AUTHORS .TP Index: doc/modular-routing.txt =================================================================== --- doc/modular-routing.txt (revision 10004) +++ doc/modular-routing.txt (working copy) @@ -51,6 +51,27 @@ In order to activate new module use: If the dump_file is not found or is in error, the default routing algorithm is utilized. +The ability to dump switch lid matrices (aka min hops tables) to file and +later to load these is also supported. + +The usage is similar to unicast forwarding tables loading from dump +file (introduced by 'file' routing engine), but new lid matrix file +name should be specified by -M or --lid_matrix_file option. 
For example: + + opensm -R file -M ./opensm-lid-matrix.dump + +The dump file is named 'opensm-lid-matrix.dump' and will be generated in +standard opensm dump directory (/var/log by default) when +OSM_LOG_ROUTING logging flag is set. + +When routing engine 'file' is activated, but dump file is not specified +or not cannot be open default lid matrix algorithm will be used. + +There is also a switch forwarding tables dumper which generates +a file compatible with dump_lfts.sh output. This file can be used +as input for forwarding tables loading by 'file' routing engine. +Both or one of options -U and -M can be specified together with '-R file'. + NOTE: ibroute has been updated (for switch management ports) to support this. Also, lmc was added to switch management ports. ibroute needs to be r7855 or later from the trunk. From halr at voltaire.com Mon Oct 30 13:48:30 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 16:48:30 -0500 Subject: [openib-general] ibstatus support for speed Message-ID: <1162244902.15895.86684.camel@hal.voltaire.com> Should support for speed be added to ibstatus ? Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0008:f104:0396:0559 base lid: 0xa sm lid: 0xa state: 1: DOWN phys state: 2: Polling rate: 2.5 Gb/sec (1X) Infiniband device 'mthca0' port 2 status: default gid: fe80:0000:0000:0000:0008:f104:0396:055a base lid: 0xe sm lid: 0xa state: 1: DOWN phys state: 2: Polling rate: 10 Gb/sec (4X) -- Hal From halr at voltaire.com Mon Oct 30 13:49:01 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 16:49:01 -0500 Subject: [openib-general] Support for optional 64-bit port counters in sysfs Message-ID: <1162244938.15895.86686.camel@hal.voltaire.com> Should the optional 64-bit port counters be supported in sysfs ? 
-- Hal From rdreier at cisco.com Mon Oct 30 13:55:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 13:55:47 -0800 Subject: [openib-general] ibstatus support for speed In-Reply-To: <1162244902.15895.86684.camel@hal.voltaire.com> (Hal Rosenstock's message of "30 Oct 2006 16:48:30 -0500") References: <1162244902.15895.86684.camel@hal.voltaire.com> Message-ID: > Should support for speed be added to ibstatus ? What is "speed"? Is that something beyond the existing "rate" line: > rate: 10 Gb/sec (4X) - R. From rdreier at cisco.com Mon Oct 30 13:56:43 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 13:56:43 -0800 Subject: [openib-general] Support for optional 64-bit port counters in sysfs In-Reply-To: <1162244938.15895.86686.camel@hal.voltaire.com> (Hal Rosenstock's message of "30 Oct 2006 16:49:01 -0500") References: <1162244938.15895.86686.camel@hal.voltaire.com> Message-ID: > Should the optional 64-bit port counters be supported in sysfs ? I guess so. If someone sends a patch I would be inclined to merge it. But are there any devices that support them? - R. From halr at voltaire.com Mon Oct 30 14:01:41 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 17:01:41 -0500 Subject: [openib-general] ibstatus support for speed In-Reply-To: References: <1162244902.15895.86684.camel@hal.voltaire.com> Message-ID: <1162245696.15895.87198.camel@hal.voltaire.com> On Mon, 2006-10-30 at 16:55, Roland Dreier wrote: > > Should support for speed be added to ibstatus ? > > What is "speed"? Is that something beyond the existing "rate" line: > > > rate: 10 Gb/sec (4X) So rate = speed * width ? -- Hal > - R. 
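The relationship discussed above, rate = speed * width, can be sketched in a few lines of C. This is only an illustration, not the ibstatus or sysfs code. The names lane_speed, link_rate_tenths, data_rate_tenths and format_rate are made up for this example, speeds are held in tenths of Gb/s so that 2.5 Gb/s stays an integer, and the 8b/10b figure (8 data bits per 10 signalled bits on SDR and DDR links) is an assumption that the thread itself does not state.

```c
#include <stdio.h>

/* Per-lane signalling speed in tenths of Gb/s (hypothetical encoding). */
enum lane_speed { LANE_SDR = 25, LANE_DDR = 50 };

/* Raw link rate in tenths of Gb/s: per-lane speed times number of lanes. */
static int link_rate_tenths(enum lane_speed speed, int width)
{
	return (int)speed * width;
}

/* Data rate after 8b/10b encoding (assumed): 8 of every 10 bits carry data. */
static int data_rate_tenths(enum lane_speed speed, int width)
{
	return link_rate_tenths(speed, width) * 8 / 10;
}

/* Format the raw rate in the style of the "rate:" lines quoted above. */
static int format_rate(char *buf, size_t len, enum lane_speed speed, int width)
{
	int rate = link_rate_tenths(speed, width);

	if (rate % 10)
		return snprintf(buf, len, "%d.%d Gb/sec (%dX)",
				rate / 10, rate % 10, width);
	return snprintf(buf, len, "%d Gb/sec (%dX)", rate / 10, width);
}
```

With these definitions a 1X SDR port formats as "2.5 Gb/sec (1X)", a 4X SDR port as "10 Gb/sec (4X)", and a 4X DDR port as "20 Gb/sec (4X)". Under the 8b/10b assumption, data_rate_tenths() gives 8 Gb/s for 4X SDR, which is still a raw figure before any protocol overhead.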
From rdreier at cisco.com Mon Oct 30 14:05:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 14:05:34 -0800 Subject: [openib-general] ibstatus support for speed In-Reply-To: <1162245696.15895.87198.camel@hal.voltaire.com> (Hal Rosenstock's message of "30 Oct 2006 17:01:41 -0500") References: <1162244902.15895.86684.camel@hal.voltaire.com> <1162245696.15895.87198.camel@hal.voltaire.com> Message-ID: Hal> So rate = speed * width ? Yes, you should see the right thing on DDR systems etc. - R. From halr at voltaire.com Mon Oct 30 14:05:08 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 17:05:08 -0500 Subject: [openib-general] Support for optional 64-bit port counters in sysfs In-Reply-To: References: <1162244938.15895.86686.camel@hal.voltaire.com> Message-ID: <1162245902.15895.87340.camel@hal.voltaire.com> On Mon, 2006-10-30 at 16:56, Roland Dreier wrote: > > Should the optional 64-bit port counters be supported in sysfs ? > > I guess so. If someone sends a patch I would be inclined to merge it. OK; How should devices that don't support these be handled ? Should there be missing files or the files have values of 0 or something else ? > But are there any devices that support them? Yes. -- Hal > - R. From rdreier at cisco.com Mon Oct 30 14:11:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 14:11:26 -0800 Subject: [openib-general] Support for optional 64-bit port counters in sysfs In-Reply-To: <1162245902.15895.87340.camel@hal.voltaire.com> (Hal Rosenstock's message of "30 Oct 2006 17:05:08 -0500") References: <1162244938.15895.86686.camel@hal.voltaire.com> <1162245902.15895.87340.camel@hal.voltaire.com> Message-ID: > > I guess so. If someone sends a patch I would be inclined to merge it. > > OK; How should devices that didn't support these be handled ? Should > there be missing files or the files have values of 0 or something else ? I would only create the files if the counters are supported.
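If the files are created only when the counters are supported, a userspace reader has to treat a missing file as "not supported" rather than as a failure. A minimal sketch of that consumer side follows. It assumes the counter is exposed as a single unsigned decimal number in a text file. The function name read_optional_counter is made up for this example, and no particular sysfs file name is implied, since the thread does not define one.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned 64-bit decimal value from a sysfs-style text file.
 * Returns 0 on success, -ENOENT if the file does not exist (taken here to
 * mean the device does not support the counter), -EINVAL if the contents
 * cannot be parsed, or another negative errno if the open fails.
 */
static int read_optional_counter(const char *path, uint64_t *value)
{
	FILE *f = fopen(path, "r");
	int ret = 0;

	if (!f)
		return -errno;
	if (fscanf(f, "%" SCNu64, value) != 1)
		ret = -EINVAL;
	fclose(f);
	return ret;
}
```

A caller would print the counter when the return value is 0, skip it quietly on -ENOENT, and report anything else as an error. The alternative raised in the thread, always creating the file and returning 0, would leave such a reader unable to tell an idle port from an unsupported counter.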
> > But are there any devices that support them? > > Yes. Can you give a hint about what the devices are? - R. From venkatesh.babu at 3leafnetworks.com Mon Oct 30 14:44:43 2006 From: venkatesh.babu at 3leafnetworks.com (Venkatesh Babu) Date: Mon, 30 Oct 2006 14:44:43 -0800 Subject: [openib-general] APM support in openib stack In-Reply-To: <4544CA32.2070008@dev.mellanox.co.il> References: <453F9A12.4020500@ichips.intel.com> <454127B0.8080109@veritas.com> <45429B74.8090607@3leafnetworks.com> <4544CA32.2070008@dev.mellanox.co.il> Message-ID: <4546805B.8060707@3leafnetworks.com> I tried this patch and it is working fine. Now if I remove the both the cables connected to the destination, the IB_WC_RETRY_EXC_ERR on the first outstanding WR on a CQ as expected. With this patch I think all my APM related isses were resolved. Dotan, you can check this fix into the OFED svn. Thanks for providing the fix. VBabu > > I checked the code of the file cm.c (if OFED 1.1) and the attribute > alt_timeout is not mentioned anywhere in this code. > I believe that the value of this attribute is set to zero, which means > that the QP will wait infinite time to the answer (that will never come). > > Venkatesh, can you check this issue by querying the QP attributes > after the path was migrated? > I think that you will find that the value of the timeout attribute is > zero. 
> > Sean, i don't familiar with the cm.c code, but i believe that the > following patch will solve this issue: > > Index: last_stable/drivers/infiniband/core/cm.c > =================================================================== > --- last_stable.orig/drivers/infiniband/core/cm.c 2006-10-29 > 16:58:08.000000000 +0200 > +++ last_stable/drivers/infiniband/core/cm.c 2006-10-29 > 17:31:57.000000000 +0200 > @@ -3221,6 +3221,7 @@ static int cm_init_qp_rtr_attr(struct cm > if (cm_id_priv->alt_av.ah_attr.dlid) { > *qp_attr_mask |= IB_QP_ALT_PATH; > qp_attr->alt_port_num = > cm_id_priv->alt_av.port->port_num; > + qp_attr->alt_timeout = > cm_id_priv->alt_av.packet_life_time; > qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; > } > ret = 0; > > > thanks > Dotan From halr at voltaire.com Mon Oct 30 14:21:18 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 17:21:18 -0500 Subject: [openib-general] Support for optional 64-bit port counters in sysfs In-Reply-To: References: <1162244938.15895.86686.camel@hal.voltaire.com> <1162245902.15895.87340.camel@hal.voltaire.com> Message-ID: <1162246873.15895.87996.camel@hal.voltaire.com> On Mon, 2006-10-30 at 17:11, Roland Dreier wrote: > > > I guess so. If someone sends a patch I would be inclined to merge it. > > > > OK; How should devices that didn't support these be handled ? Should > > there be missing files or the files have values of 0 or something else ? > > I would only create the files if the counters are supported. > > > > But are there any devices that support them? > > > > Yes. > > Can you give a hint about what the devices are? iPath for one. I think I also ran across some support for this in some Mellanox devices but my memory may be failing me on this. -- Hal > - R. 
From krause at cup.hp.com Mon Oct 30 14:29:31 2006 From: krause at cup.hp.com (Michael Krause) Date: Mon, 30 Oct 2006 14:29:31 -0800 Subject: [openib-general] ibstatus support for speed In-Reply-To: References: <1162244902.15895.86684.camel@hal.voltaire.com> <1162245696.15895.87198.camel@hal.voltaire.com> Message-ID: <6.2.0.14.2.20061030142220.09c150b8@esmail.cup.hp.com> At 02:05 PM 10/30/2006, Roland Dreier wrote: > Hal> So rate = speed * width ? > >Yes, you should see the right thing on DDR systems etc. Strange. Bandwidth = signaling rate * width. This of course is raw bandwidth prior to encoding, protocol, etc. overheads which will derate the effective application bandwidth minimally by 20-25%. If the goal is to provide a true indication of the maximum peak bandwidth that an application might see, then stating 10 Gbps for an IB x4 SDR is clearly a misrepresentation and out of alignment with other networking links such as Ethernet which customers understand its bandwidth to be minimally after the encoding, etc. is removed from the equation. The perpetual trend by marketing to use 10 Gbps IB as equivalent to 10 Gbps of application data is actually detrimental, not beneficial, when it comes to customers. It inevitably leads to the question of why the application is not achieving the stated bandwidth, i.e. why it is, say, 700-800 MB/s theoretical peak for a x4 while a 10 GbE is 1 GB/s peak. So much marketing hype has gone forward already. I realize I'm tilting at windmills but if you are to provide a tool that is supposed to project the maximum bandwidth possible and given the goal of OFA is to provide as much conceptual commonality with existing network stacks / links, then it would be beneficial to have this move towards a much more apple-to-apple communication of information. I know it would certainly help with having to repeatedly explain why IB 10 Gbps is not the same as 10 GbE to customers and analysts.
Mike From halr at voltaire.com Mon Oct 30 14:43:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Oct 2006 17:43:03 -0500 Subject: [openib-general] ibstatus support for speed In-Reply-To: <6.2.0.14.2.20061030142220.09c150b8@esmail.cup.hp.com> References: <1162244902.15895.86684.camel@hal.voltaire.com> <1162245696.15895.87198.camel@hal.voltaire.com> <6.2.0.14.2.20061030142220.09c150b8@esmail.cup.hp.com> Message-ID: <1162248170.15895.88897.camel@hal.voltaire.com> On Mon, 2006-10-30 at 17:29, Michael Krause wrote: > At 02:05 PM 10/30/2006, Roland Dreier wrote: > > Hal> So rate = speed * width ? > > > >Yes, you should see the right thing on DDR systems etc. > > Strange. Bandwidth = signaling rate * width. This of course is raw > bandwidth prior to encoding, protocol, etc. overheads which will derate the > effective application bandwidth minimally by 20-25%. Yes, of course. It's just a simple diagnostic to display the width and speed simply. > If the goal is to > provide a true indication of the maximum peak bandwidth that an application > might see, That's not the goal of this simplistic tool. > then stating 10 Gbps for an IB x4 SDR is clearly a > misrepresentation and out of alignment with other networking links such as > Ethernet which customers understand its bandwidth to be minimally after the > encoding, etc. is removed from the equation. The perpetual trend by > marketing to use 10 Gbps IB as equivalent to 10 Gbps of application data is > actually detrimental, not beneficial, when it comes to customers. It > inevitably leads to the question of why the application is not achieving > the stated bandwidth, i.e. why it is, say, 700-800 MB/s theoretical peak for a > x4 while a 10 GbE is 1 GB/s peak. So much marketing hype has gone forward > already.
I realize I'm tilting at windmills but if you are to provide a > tool that is supposed to project the maximum bandwidth possible and given > the goal of OFA is to provide as much conceptual commonality with existing > network stacks / links, then it would be beneficial to have this move > towards a much more apple-to-apple communication of information. I know it > would certainly help with having to repeatedly explain why IB 10 Gbps is > not the same as 10 GbE to customers and analysts. Agreed but this is a different issue from what the tool is for. IMO this issue largely started when IB decided to use the signalling rate rather than the data rate like most other networks. -- Hal > Mike > > From sean.hefty at intel.com Mon Oct 30 15:52:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 30 Oct 2006 15:52:04 -0800 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race Message-ID: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> Require registration with ib_addr module to prevent caller from unloading while a callback is in progress. 
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 60d3fbd..894d856 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -47,6 +47,7 @@ struct addr_req { struct sockaddr src_addr; struct sockaddr dst_addr; struct rdma_dev_addr *addr; + struct rdma_addr_client *client; void *context; void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context); @@ -61,6 +62,26 @@ static LIST_HEAD(req_list); static DECLARE_WORK(work, process_req, NULL); static struct workqueue_struct *addr_wq; +void rdma_addr_register_client(struct rdma_addr_client *client) +{ + atomic_set(&client->refcount, 1); + init_completion(&client->comp); +} +EXPORT_SYMBOL(rdma_addr_register_client); + +static inline void deref_client(struct rdma_addr_client *client) +{ + if (atomic_dec_and_test(&client->refcount)) + complete(&client->comp); +} + +void rdma_addr_unregister_client(struct rdma_addr_client *client) +{ + deref_client(client); + wait_for_completion(&client->comp); +} +EXPORT_SYMBOL(rdma_addr_unregister_client); + int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, const unsigned char *dst_dev_addr) { @@ -229,6 +250,7 @@ static void process_req(void *data) list_del(&req->list); req->callback(req->status, &req->src_addr, req->addr, req->context); + deref_client(req->client); kfree(req); } } @@ -264,7 +286,8 @@ static int addr_resolve_local(struct soc return ret; } -int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr, +int rdma_resolve_ip(struct rdma_addr_client *client, + struct sockaddr *src_addr, struct sockaddr *dst_addr, struct rdma_dev_addr *addr, int timeout_ms, void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context), @@ -285,6 +308,8 @@ int rdma_resolve_ip(struct sockaddr *src req->addr = addr; req->callback = callback; req->context = context; + req->client = client; + 
atomic_inc(&client->refcount); src_in = (struct sockaddr_in *) &req->src_addr; dst_in = (struct sockaddr_in *) &req->dst_addr; @@ -305,6 +330,7 @@ int rdma_resolve_ip(struct sockaddr *src break; default: ret = req->status; + atomic_dec(&client->refcount); kfree(req); break; } diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 2b4748e..d804a4d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -63,6 +63,7 @@ static struct ib_client cma_client = { }; static struct ib_sa_client sa_client; +static struct rdma_addr_client addr_client; static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DEFINE_MUTEX(lock); @@ -1625,8 +1626,8 @@ int rdma_resolve_addr(struct rdma_cm_id if (cma_any_addr(dst_addr)) ret = cma_resolve_loopback(id_priv); else - ret = rdma_resolve_ip(&id->route.addr.src_addr, dst_addr, - &id->route.addr.dev_addr, + ret = rdma_resolve_ip(&addr_client, &id->route.addr.src_addr, + dst_addr, &id->route.addr.dev_addr, timeout_ms, addr_handler, id_priv); if (ret) goto err; @@ -2217,6 +2218,7 @@ static int cma_init(void) return -ENOMEM; ib_sa_register_client(&sa_client); + rdma_addr_register_client(&addr_client); ret = ib_register_client(&cma_client); if (ret) @@ -2224,6 +2226,7 @@ static int cma_init(void) return 0; err: + rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(cma_wq); return ret; @@ -2232,6 +2235,7 @@ err: static void cma_cleanup(void) { ib_unregister_client(&cma_client); + rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 81b6230..c094e50 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -36,6 +36,22 @@ #include #include #include +struct rdma_addr_client { + atomic_t refcount; + struct completion comp; +}; + +/** + * rdma_addr_register_client - Register an address 
client. + */ +void rdma_addr_register_client(struct rdma_addr_client *client); + +/** + * rdma_addr_unregister_client - Deregister an address client. + * @client: Client object to deregister. + */ +void rdma_addr_unregister_client(struct rdma_addr_client *client); + struct rdma_dev_addr { unsigned char src_dev_addr[MAX_ADDR_LEN]; unsigned char dst_dev_addr[MAX_ADDR_LEN]; @@ -52,6 +68,7 @@ int rdma_translate_ip(struct sockaddr *a /** * rdma_resolve_ip - Resolve source and destination IP addresses to * RDMA hardware addresses. + * @client: Address client associated with request. * @src_addr: An optional source address to use in the resolution. If a * source address is not provided, a usable address will be returned via * the callback. @@ -64,7 +81,8 @@ int rdma_translate_ip(struct sockaddr *a * or been canceled. A status of 0 indicates success. * @context: User-specified context associated with the call. */ -int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr, +int rdma_resolve_ip(struct rdma_addr_client *client, + struct sockaddr *src_addr, struct sockaddr *dst_addr, struct rdma_dev_addr *addr, int timeout_ms, void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context), From mshefty at ichips.intel.com Mon Oct 30 16:47:59 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 30 Oct 2006 16:47:59 -0800 Subject: [openib-general] APM support in openib stack In-Reply-To: <4544CA32.2070008@dev.mellanox.co.il> References: <453F9A12.4020500@ichips.intel.com> <454127B0.8080109@veritas.com> <45429B74.8090607@3leafnetworks.com> <4544CA32.2070008@dev.mellanox.co.il> Message-ID: <45469D3F.4000002@ichips.intel.com> > Sean, i don't familiar with the cm.c code, but i believe that the > following patch will solve this issue: > > Index: last_stable/drivers/infiniband/core/cm.c > =================================================================== > --- last_stable.orig/drivers/infiniband/core/cm.c 2006-10-29 
16:58:08.000000000 +0200 > +++ last_stable/drivers/infiniband/core/cm.c 2006-10-29 17:31:57.000000000 +0200 > @@ -3221,6 +3221,7 @@ static int cm_init_qp_rtr_attr(struct cm > if (cm_id_priv->alt_av.ah_attr.dlid) { > *qp_attr_mask |= IB_QP_ALT_PATH; > qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; > + qp_attr->alt_timeout = cm_id_priv->alt_av.packet_life_time; > qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; > } > ret = 0; Thanks - something like this does appear to be needed. To mirror the primary path local ack time, we want something like packet_life_time + 1. We're also missing setting the alt_pkey_index here as well. I've switched around my schedule to work on the path failover code now, so I'll work on some patches and a test program for this. - Sean From rdreier at cisco.com Mon Oct 30 16:49:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 16:49:02 -0800 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: <20061030173416.GN1941@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 30 Oct 2006 19:34:16 +0200") References: <20061030173416.GN1941@mellanox.co.il> Message-ID: > OK, good idea. Same for OFED I guess - we can have e.g. ~ofed/ if > several people need to share the same ofed trees. > > So it seems it is best to stick with ~user/ convention - this allows > creating trees without admin permission. Makes sense? That's OK, although /pub/scm may be easier in that it makes it easier to share trees without having to create a whole new user and try to manage permissions in a sane way. On kernel.org, the stable trees are owned by group kstable, and new people can be added to the kstable group as needed. - R. From rdreier at cisco.com Mon Oct 30 16:50:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 16:50:26 -0800 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: <20061030174537.GO1941@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 30 Oct 2006 19:45:37 +0200") References: <20061030174537.GO1941@mellanox.co.il> Message-ID: > > > Well, with ~user/html I can easily cook up a perl script to generate a MD5 > > > checksums or sign stuff and just stick them in the same directory as original files. > > > > But you can do that with any old hosting, can't you? Or am I missing > > something? > > This depends on the level of paranoia :) > If all files are on the same server, I only have to trust that server's integrity. But we're talking about signed releases, right? Surely you're not going to put your private key on some web server -- you're going to sign the packages before you upload them anyway. So I still don't see why I care about web hosting, given how many other places already give it to me. - R. From mst at mellanox.co.il Mon Oct 30 20:30:53 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 06:30:53 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061030211828.GD12259@sashak.voltaire.com> References: <20061030211828.GD12259@sashak.voltaire.com> Message-ID: <20061031043053.GA4622@mellanox.co.il> Quoting r. Sasha Khapyorsky : > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > On 13:44 Mon 30 Oct , Michael S. Tsirkin wrote: > > Quoting r. Sasha Khapyorsky : > > > Subject: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > > > > node_desc.description buffer is received from the network and should > > > not be NULL-terminated. In such cases using it as regular string in > > > functions like strcmp() or printf() leads to segmentation faults. > > > This patch fixes such usages. 
> > > > > > Signed-off-by: Sasha Khapyorsky > > > --- > > > diags/src/saquery.c | 22 ++++++++++++++++------ > > > 1 files changed, 16 insertions(+), 6 deletions(-) > > > > > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > > > index 5b4a85e..f5b23fd 100644 > > > --- a/diags/src/saquery.c > > > +++ b/diags/src/saquery.c > > > @@ -90,17 +90,21 @@ static void > > > print_node_desc(ib_node_record_t *node_record) > > > { > > > ib_node_info_t *p_ni = &(node_record->node_info); > > > + ib_node_desc_t *p_nd = &(node_record->node_desc); > > > if (p_ni->node_type == IB_NODE_TYPE_CA) > > > { > > > + char desc[sizeof(p_nd->description) + 1]; > > > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > > > + desc[sizeof(desc) - 1] = '\0'; > > > printf("%6d \"%s\"\n", > > > - cl_ntoh16(node_record->lid), > > > - node_record->node_desc.description); > > > + cl_ntoh16(node_record->lid), desc); > > > } > > > } > > > > Would it not be simpler, and cleaner, to limit the string width in printf: > > printf("%6d \"%.*s\"\n", > > cl_ntoh16(node_record->lid), > > sizeof(desc), > > node_record->node_desc.description); > > This would be simpler. However some web searching shows that not all > printf() implementation permits not null terminated arrays even when > precision is specified (some issues were reported even with glibc-2.3.2). Hmm, couldn't find it. Warrants a comment then? > OTOH I understand your concerns and hate this stupid copying. Originally > wanted just to terminate node_desc.description array by '\0', but then > potentially this array can be truncated. > > Sasha > -- MST From mst at mellanox.co.il Mon Oct 30 20:41:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 06:41:17 +0200 Subject: [openib-general] creating releases for the libraries you own In-Reply-To: References: Message-ID: <20061031044117.GB4622@mellanox.co.il> Quoting r. 
Roland Dreier : > Subject: Re: creating releases for the libraries you own > > > > > Well, with ~user/html I can easily cook up a perl script to generate a MD5 > > > > checksums or sign stuff and just stick them in the same directory as original files. > > > > > > But you can do that with any old hosting, can't you? Or am I missing > > > something? > > > > This depends on the level of paranoia :) If all files are on the same > > server, I only have to trust that server's integrity. > > But we're talking about signed releases, right? Surely you're not > going to put your private key on some web server -- you're going to > sign the packages before you upload them anyway. So I still don't see > why I care about web hosting, given how many other places already give > it to me. Not me as a developer - me as a user :). The user already has to trust the openfabrics server's integrity since that's where he got the download link from. So at least the signatures should be on the openfabrics server too - otherwise it's an extra server to trust, for the user. And since this means we need web hosting on the openfabrics server already, let's put the packages themselves there, too. -- MST From mst at mellanox.co.il Mon Oct 30 20:43:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 06:43:41 +0200 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: References: Message-ID: <20061031044341.GC4622@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: convention for git directories on the new git OFA server > > > OK, good idea. Same for OFED I guess - we can have e.g. ~ofed/ if > > several people need to share the same ofed trees. > > > > So it seems it is best to stick with ~user/ convention - this allows > > creating trees without admin permission. Makes sense?
> > That's OK, although /pub/scm may be easier in that it makes it easier > to share trees without having to create a whole new user and try to > manage permissions in a sane way. On kernel.org, the stable trees are > owned by group kstable, and new people can be added to the kstable > group as needed. So, what do you suggest for openfabrics? Could you write it up? -- MST From sashak at voltaire.com Mon Oct 30 20:59:02 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 31 Oct 2006 06:59:02 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031043053.GA4622@mellanox.co.il> References: <20061030211828.GD12259@sashak.voltaire.com> <20061031043053.GA4622@mellanox.co.il> Message-ID: <20061031045902.GB17784@sashak.voltaire.com> On 06:30 Tue 31 Oct , Michael S. Tsirkin wrote: > Quoting r. Sasha Khapyorsky : > > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > On 13:44 Mon 30 Oct , Michael S. Tsirkin wrote: > > > Quoting r. Sasha Khapyorsky : > > > > Subject: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > > > > > > > node_desc.description buffer is received from the network and should > > > > not be NULL-terminated. In such cases using it as regular string in > > > > functions like strcmp() or printf() leads to segmentation faults. > > > > This patch fixes such usages. 
> > > > > > > > Signed-off-by: Sasha Khapyorsky > > > > --- > > > > diags/src/saquery.c | 22 ++++++++++++++++------ > > > > 1 files changed, 16 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > > > > index 5b4a85e..f5b23fd 100644 > > > > --- a/diags/src/saquery.c > > > > +++ b/diags/src/saquery.c > > > > @@ -90,17 +90,21 @@ static void > > > > print_node_desc(ib_node_record_t *node_record) > > > > { > > > > ib_node_info_t *p_ni = &(node_record->node_info); > > > > + ib_node_desc_t *p_nd = &(node_record->node_desc); > > > > if (p_ni->node_type == IB_NODE_TYPE_CA) > > > > { > > > > + char desc[sizeof(p_nd->description) + 1]; > > > > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > > > > + desc[sizeof(desc) - 1] = '\0'; > > > > printf("%6d \"%s\"\n", > > > > - cl_ntoh16(node_record->lid), > > > > - node_record->node_desc.description); > > > > + cl_ntoh16(node_record->lid), desc); > > > > } > > > > } > > > > > > Would it not be simpler, and cleaner, to limit the string width in printf: > > > printf("%6d \"%.*s\"\n", > > > cl_ntoh16(node_record->lid), > > > sizeof(desc), > > > node_record->node_desc.description); > > > > This would be simpler. However some web searching shows that not all > > printf() implementation permits not null terminated arrays even when > > precision is specified (some issues were reported even with glibc-2.3.2). > > Hmm, couldn't find it. Look at this for example: http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html Sasha > Warrants a comment then? > > > OTOH I understand your concerns and hate this stupid copying. Originally > > wanted just to terminate node_desc.description array by '\0', but then > > potentially this array can be truncated. > > > > Sasha > > > > -- > MST From mst at mellanox.co.il Mon Oct 30 21:11:13 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 31 Oct 2006 07:11:13 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031045902.GB17784@sashak.voltaire.com> References: <20061030211828.GD12259@sashak.voltaire.com> <20061031043053.GA4622@mellanox.co.il> <20061031045902.GB17784@sashak.voltaire.com> Message-ID: <20061031051113.GE4622@mellanox.co.il> Quoting r. Sasha Khapyorsky : > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > On 06:30 Tue 31 Oct , Michael S. Tsirkin wrote: > > Quoting r. Sasha Khapyorsky : > > > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > On 13:44 Mon 30 Oct , Michael S. Tsirkin wrote: > > > > Quoting r. Sasha Khapyorsky : > > > > > Subject: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > > > > > > > > > > node_desc.description buffer is received from the network and should > > > > > not be NULL-terminated. In such cases using it as regular string in > > > > > functions like strcmp() or printf() leads to segmentation faults. > > > > > This patch fixes such usages. 
> > > > > > > > > > Signed-off-by: Sasha Khapyorsky > > > > > --- > > > > > diags/src/saquery.c | 22 ++++++++++++++++------ > > > > > 1 files changed, 16 insertions(+), 6 deletions(-) > > > > > > > > > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > > > > > index 5b4a85e..f5b23fd 100644 > > > > > --- a/diags/src/saquery.c > > > > > +++ b/diags/src/saquery.c > > > > > @@ -90,17 +90,21 @@ static void > > > > > print_node_desc(ib_node_record_t *node_record) > > > > > { > > > > > ib_node_info_t *p_ni = &(node_record->node_info); > > > > > + ib_node_desc_t *p_nd = &(node_record->node_desc); > > > > > if (p_ni->node_type == IB_NODE_TYPE_CA) > > > > > { > > > > > + char desc[sizeof(p_nd->description) + 1]; > > > > > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > > > > > + desc[sizeof(desc) - 1] = '\0'; > > > > > printf("%6d \"%s\"\n", > > > > > - cl_ntoh16(node_record->lid), > > > > > - node_record->node_desc.description); > > > > > + cl_ntoh16(node_record->lid), desc); > > > > > } > > > > > } > > > > > > > > Would it not be simpler, and cleaner, to limit the string width in printf: > > > > printf("%6d \"%.*s\"\n", > > > > cl_ntoh16(node_record->lid), > > > > sizeof(desc), > > > > node_record->node_desc.description); > > > > > > This would be simpler. However some web searching shows that not all > > > printf() implementation permits not null terminated arrays even when > > > precision is specified (some issues were reported even with glibc-2.3.2). > > > > Hmm, couldn't find it. > > Look at this for example: > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html Hmm, yea. Do you understand the answer there? Does not make sense to me ... 
-- MST From jgunthorpe at obsidianresearch.com Mon Oct 30 21:13:22 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Mon, 30 Oct 2006 22:13:22 -0700 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031045902.GB17784@sashak.voltaire.com> References: <20061030211828.GD12259@sashak.voltaire.com> <20061031043053.GA4622@mellanox.co.il> <20061031045902.GB17784@sashak.voltaire.com> Message-ID: <20061031051322.GC21576@obsidianresearch.com> On Tue, Oct 31, 2006 at 06:59:02AM +0200, Sasha Khapyorsky wrote: > > > This would be simpler. However some web searching shows that not all > > > printf() implementation permits not null terminated arrays even when > > > precision is specified (some issues were reported even with glibc-2.3.2). > > > > Hmm, couldn't find it. > > Look at this for example: > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html Interestingly SUSv3 (aka POSIX 2001) only requires the terminating null if the precision is longer than the array size. s The argument shall be a pointer to an array of char. Bytes from the array shall be written up to (but not including) any terminating null byte. If the precision is specified, no more than that many bytes shall be written. If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte. Regards, Jason From rdreier at cisco.com Mon Oct 30 21:20:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 21:20:39 -0800 Subject: [openib-general] [PATCH 1/2] ib/core/uverbs: return sq_draining value in query_qp response In-Reply-To: <200610251254.20996.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 25 Oct 2006 12:54:20 +0200") References: <200610251254.20996.jackm@dev.mellanox.co.il> Message-ID: Ok, I thought about this probably way more than it was worth. And I think we might as well just go ahead and make this change. 
The only sensible situation is with both a new kernel and new userspace -- everything else is broken anyway. So I just applied this as-is. - R. From rdreier at cisco.com Mon Oct 30 21:26:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 21:26:39 -0800 Subject: [openib-general] [PATCH 2/2] libibverbs-1.0: return sq_draining value in query_qp response In-Reply-To: <200610251254.24827.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 25 Oct 2006 12:54:24 +0200") References: <200610251254.24827.jackm@dev.mellanox.co.il> Message-ID: Thanks, applied to libibverbs and libibverbs-1.0 From rdreier at cisco.com Mon Oct 30 21:29:11 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 30 Oct 2006 21:29:11 -0800 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: <20061031044341.GC4622@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 31 Oct 2006 06:43:41 +0200") References: <20061031044341.GC4622@mellanox.co.il> Message-ID: > So, what do you suggest for openfabrics? Could you write it up? Actually, ~user is probably OK. If we want to do ~ofed then we just have to create an ofed user, and a group to put everyone with write permission to the ofed tree. Creating directories in /pub/scm just saves creating the fake user and fake home directory. - R. From mst at mellanox.co.il Mon Oct 30 21:43:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 07:43:08 +0200 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: References: <20061031044341.GC4622@mellanox.co.il> Message-ID: <20061031054308.GA705@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: convention for git directories on the new git OFA server > > > So, what do you suggest for openfabrics? Could you write it up? > > Actually, ~user is probably OK. 
If we want to do ~ofed then we just > have to create an ofed user, and a group to put everyone with write > permission to the ofed tree. We don't need the group in this case - just add relevant people's keys to ~ofed/.ssh/authorized_keys, and they'll be able to push. The main advantage of that approach is that anyone who already has permission for ofed tree can add other people, all this without admin permissions. > Creating directories in /pub/scm just > saves creating the fake user and fake home directory. -- MST From mst at mellanox.co.il Mon Oct 30 21:46:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 07:46:47 +0200 Subject: [openib-general] convention for git directories on the new git OFA server In-Reply-To: <20061031054308.GA705@mellanox.co.il> References: <20061031044341.GC4622@mellanox.co.il> <20061031054308.GA705@mellanox.co.il> Message-ID: <20061031054647.GB705@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: convention for git directories on the new git OFA server > > Quoting r. Roland Dreier : > > Subject: Re: convention for git directories on the new git OFA server > > > > > So, what do you suggest for openfabrics? Could you write it up? > > > > Actually, ~user is probably OK. If we want to do ~ofed then we just > > have to create an ofed user, and a group to put everyone with write > > permission to the ofed tree. > > We don't need the group in this case - just add relevant people's keys > to ~ofed/.ssh/authorized_keys, and they'll be able to push. > The main advantage of that approach is that anyone who already has > permission for ofed tree can add other people, all this without admin > permissions. Oh, and a separate user to git-shell will help prevent stupid mistakes like working inside the ofed directory. > > Creating directories in /pub/scm just > > saves creating the fake user and fake home directory.
-- MST From vlad at dev.mellanox.co.il Tue Oct 31 00:46:17 2006 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 31 Oct 2006 10:46:17 +0200 Subject: [openib-general] OFED 1.1 Build Issue In-Reply-To: References: Message-ID: <45470D59.7020705@dev.mellanox.co.il> The alternative way to resolve this issue is the following: Save the Modules.symvers file generated by the OFED kernel modules compilation (drivers/infiniband/Modules.symvers). It can be added to the kernel-ib-devel RPM in the next OFED release. Then, in order to compile an external module, copy this Modules.symvers to the directory where the external module is built. Regards, Vladimir Moni Shoua wrote: > We managed to avoid rebuilding the kernel to solve this issue. > > Before building any IB dependent modules (out of OFED) it is required to > update the Module.symvers. > The new values for the symbol CRCs can be taken from the modules > themselves ( nm IB_MODULE |grep __crc_) > When Module.symvers is up-to-date, there should be no problem building > and installing the IB dependent modules. > > The solution step-by-step > 1. The procedure should run after installing the kernel-ib-devel RPM. It > is possible to run it in %pre section of the spec file. > 2. For each IB module (ko) which is listed in $(rpm -ql kernel-ib) - > 2.1 take out the __crc_ symbols > 2.2 extract the symbol name and its CRC value (simple parsing) > 2.3 add it (or replace the existing) to Module.symvers (usually > under /lib/modules/$(uname -r)/build/ or /lib/modules/$(uname > -r)/source/ ) > 3. Save the diff of the current Module.symvers from the original (for > future restore) > 4. When kernel-ib-devel RPM is uninstalled use the patch from (3) to > restore Module.symvers. This can be done in the %postun of the spec > file) > > I'd be glad to get comments about this.
> > > > > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Tom Tucker > Sent: Friday, October 27, 2006 5:30 PM > To: openib-general > Subject: [openib-general] OFED 1.1 Build Issue > > > I've been testing some code against the OFED 1.1 release and noticed > that if you build anything that depends on IB (RNFS in this case) into > the kernel, that the OFED kit doesn't work correctly. This is because > the dependent modules (ib_core, etc...) get sucked into the kernel > automagically and will cause the subsequent modprobe of the OFED module > to fail. > > I don't think you can fix this without rebuilding the kernel so it > should probably be listed in the OFED_release_notes as a known issue. > Providing a mechanism to rebuild the kernel as part of the OFED install > would be great too, sorry if it's already there and I missed it. > > Tom > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Tue Oct 31 01:36:49 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 31 Oct 2006 11:36:49 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> Message-ID: <45471931.4070203@voltaire.com> Sean Hefty wrote: > Require registration with ib_addr module to prevent caller from unloading > while a callback is in progress. 
Sean, Is there a conceptual/practical difference between the ib_addr and ib_cm modules which enforces this client registration of the cma being an addr consumer but not against the cm? Does it happen to be the non existence of ib_addr id vs the existence of an ib_cm id? if yes, wouldn't it better/easier to solve both in the same method? Or. From mst at mellanox.co.il Tue Oct 31 01:52:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 11:52:17 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <45471931.4070203@voltaire.com> References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> <45471931.4070203@voltaire.com> Message-ID: <20061031095217.GA2387@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race > > Sean Hefty wrote: > > Require registration with ib_addr module to prevent caller from unloading > > while a callback is in progress. > > Sean, > > Is there a conceptual/practical difference between the ib_addr and ib_cm > modules which enforces this client registration of the cma being an addr > consumer but not against the cm? > > Does it happen to be the non existence of ib_addr id vs the existence of > an ib_cm id? if yes, wouldn't it better/easier to solve both in the same > method? > > Or. There's no difference - ib_cm also has this issue and needs a fix. -- MST From mst at mellanox.co.il Tue Oct 31 01:53:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 11:53:03 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> Message-ID: <20061031095303.GB2387@mellanox.co.il> Quoting r. 
Sean Hefty : > Subject: [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race > > Require registration with ib_addr module to prevent caller from unloading > while a callback is in progress. > > Signed-off-by: Sean Hefty ib_cm has this bug as well. Shouldn't we patch it for 2.6.19 too? -- MST From monis at voltaire.com Tue Oct 31 02:14:26 2006 From: monis at voltaire.com (Moni Shoua) Date: Tue, 31 Oct 2006 12:14:26 +0200 Subject: [openib-general] OFED 1.1 Build Issue In-Reply-To: <45470D59.7020705@dev.mellanox.co.il> References: <45470D59.7020705@dev.mellanox.co.il> Message-ID: <45472202.9080104@voltaire.com> Vladimir Sokolovsky wrote: > The alternative way to resolve this issue is the following: > Save the Modules.symvers file generated by the OFED kernel modules compilation > (drivers/infiniband/Modules.symvers). > It can be added to the kernel-ib-devel RPM in the next OFED release. > Then, in order to compile an external module, copy this Modules.symvers to > the directory where the external module is built. > > Regards, > Vladimir > > > Moni Shoua wrote: > >> We managed to avoid rebuilding the kernel to solve this issue. >> >> Before building any IB dependent modules (out of OFED) it is required to >> update the Module.symvers. >> The new values for the symbol CRCs can be taken from the modules >> themselves ( nm IB_MODULE |grep __crc_) >> When Module.symvers is up-to-date, there should be no problem building >> and installing the IB dependent modules. >> >> The solution step-by-step >> 1. The procedure should run after installing the kernel-ib-devel RPM. It >> is possible to run it in %pre section of the spec file. >> 2. For each IB module (ko) which is listed in $(rpm -ql kernel-ib) - >> 2.1 take out the __crc_ symbols >> 2.2 extract the symbol name and its CRC value (simple parsing) >> 2.3 add it (or replace the existing) to Module.symvers (usually >> under /lib/modules/$(uname -r)/build/ or /lib/modules/$(uname >> -r)/source/ ) >> 3.
Save the diff of the current Module.symvers from the original (for >> future restore) >> 4. When kernel-ib-devel RPM is uninstalled use the patch from (3) to >> restore Module.symvers. This can be done in the %postun of the spec >> file) >> >> I'd be glad to get comments about this. >> >> >> >> >> -----Original Message----- >> From: openib-general-bounces at openib.org >> [mailto:openib-general-bounces at openib.org] On Behalf Of Tom Tucker >> Sent: Friday, October 27, 2006 5:30 PM >> To: openib-general >> Subject: [openib-general] OFED 1.1 Build Issue >> >> >> I've been testing some code against the OFED 1.1 release and noticed >> that if you build anything that depends on IB (RNFS in this case) into >> the kernel, that the OFED kit doesn't work correctly. This is because >> the dependent modules (ib_core, etc...) get sucked into the kernel >> automagically and will cause the subsequent modprobe of the OFED module >> to fail. >> >> I don't think you can fix this without rebuilding the kernel so it >> should probably be listed in the OFED_release_notes as a known issue. >> Providing a mechanism to rebuild the kernel as part of the OFED install >> would be great too, sorry if it's already there and I missed it. >> >> Tom >> >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > > > We already tried to go this way and found that a local Module.symvers is not always generated (but we might have missed something though). 
I suggest that you check that this alternative way works under all OSs compilation (SuSE and RedHat to be precise)... BTW, Why not updating the kernel Module.symvers when kernel-ib-devel is installed? This will free the developer from copying it to his/hers private directory. From ogerlitz at voltaire.com Tue Oct 31 02:26:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 31 Oct 2006 12:26:22 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <20061031095303.GB2387@mellanox.co.il> References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> <20061031095303.GB2387@mellanox.co.il> Message-ID: <454724CE.1040508@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Sean Hefty : >> Subject: [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race >> Require registration with ib_addr module to prevent caller from unloading >> while a callback is in progress. > ib_cm has this bug as well. Shouldn't we patch it for 2.6.19 too? I know there was a similar discussion which i was not tracking... re registrations with the ib_sa module, however, please tell me if and why i am wrong: The kernel is a trusted environment, and hence a kernel consumer module willing to unload itself while holding references such as id for which a callback into this module is associated, breaks this assumption so it is buggy, and need to be fixed. For example what happens if a module opens a socket in the kernel and overrides the net stack callbacks for this socket (data-in, ready-to-send) with callbacks of its own (iscsi_tcp does that) and then unloads with this socket being open? my understanding is that the system would experience a crash. Same for scsi callbacks etc. 
My thinking is that the only place where you must follow references and clean up is when working with user space, so the rdma_ucm must clean addr/sa/cm IDs of processes that exit without cleaning (and uverbs must clean hca resources such as pd/mr/qp/ah etc). Or. From mst at mellanox.co.il Tue Oct 31 02:43:52 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 12:43:52 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <454724CE.1040508@voltaire.com> References: <454724CE.1040508@voltaire.com> Message-ID: <20061031104352.GD2387@mellanox.co.il> Quoting r. Or Gerlitz : > >> Require registration with ib_addr module to prevent caller from unloading > >> while a callback is in progress. > > > ib_cm has this bug as well. Shouldn't we patch it for 2.6.19 too? > > I know there was a similar discussion which i was not tracking... re > registrations with the ib_sa module, however, please tell me if and why > i am wrong: Look it up in the archives. Summary: The race happens on module unload - you might be inside the cm callback when the module is unloaded. Nothing the module itself does can help here - you must synchronize with the cm before unloading. -- MST From ogerlitz at voltaire.com Tue Oct 31 03:16:31 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 31 Oct 2006 13:16:31 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <20061031104352.GD2387@mellanox.co.il> References: <454724CE.1040508@voltaire.com> <20061031104352.GD2387@mellanox.co.il> Message-ID: <4547308F.2030708@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >>>> Require registration with ib_addr module to prevent caller from unloading >>>> while a callback is in progress. >>> ib_cm has this bug as well. Shouldn't we patch it for 2.6.19 too? >> I know there was a similar discussion which i was not tracking...
re >> registrations with the ib_sa module, however, please tell me if and why >> i am wrong: > > Look it up in the archives. Summary: > > The race happens on module unload - you might be inside the cm callback when > the module is unloaded. Nothing the module itself does can help here - you must > synchronize with the cm before unloading. I think to understand: you say that the CM can call the callback while the module unloads. However, my point is that the cm consumer module must destroy its cm id before unloading and that the cm id destroy code would block till all inflight callbacks on this id are done. Similarly to destroy_timer_sync or whatever it is called. Am i still missing something? Or. From mst at mellanox.co.il Tue Oct 31 03:50:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 13:50:17 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <4547308F.2030708@voltaire.com> References: <4547308F.2030708@voltaire.com> Message-ID: <20061031115017.GF2387@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race > > Michael S. Tsirkin wrote: > > Quoting r. Or Gerlitz : > >>>> Require registration with ib_addr module to prevent caller from unloading > >>>> while a callback is in progress. > >>> ib_cm has this bug as well. Shouldn't we patch it for 2.6.19 too? > >> I know there was a similar discussion which i was not tracking... re > >> registrations with the ib_sa module, however, please tell me if and why > >> i am wrong: > > > > Look it up in the archives. Summary: > > > > The race happens on module unload - you might be inside the cm callback when > > the module is unloaded. Nothing the module itself does can help here - you must > > synchronize with the cm before unloading. 
> > I think to understand: you say that the CM can call the callback while > the module unloads. However, my point is that the cm consumer module > must destroy its cm id before unloading and that the cm id destroy code > would block till all inflight callbacks on this id are done. Similarly > to destroy_timer_sync or whatever it is called. > > Am i still missing something? Yes, you miss the case where you do not destroy cm id explicitly, but rather return error code from callback instead of destroying the cm_id. -- MST From rkuchimanchi at silverstorm.com Tue Oct 31 04:34:10 2006 From: rkuchimanchi at silverstorm.com (Ramachandra K) Date: Tue, 31 Oct 2006 18:04:10 +0530 Subject: [openib-general] OFED 1.1 Build Issue In-Reply-To: <45472202.9080104@voltaire.com> References: <45470D59.7020705@dev.mellanox.co.il> <45472202.9080104@voltaire.com> Message-ID: <454742C2.2050900@silverstorm.com> Moni Shoua wrote: >We already tried to go this way and found that a local Module.symvers is >not always generated (but we might have missed something though). >I suggest that you check that this alternative way works under all OSs >compilation (SuSE and RedHat to be precise)... > > I think Module.symvers generation for external modules was added sometime around 2.6.16, so its not generated on the older kernels (for eg 2.6.9 kernels on RHEL) In this scenario, when there is no Module.symvers file, I guess the other option is to use a single Kbuild file to build both modules, as explained in section 7.3 of Documentation/kbuild/modules.txt. But this may not be feasible always. Come to think of it, why does the OFED installation procedure not update the kernel Module.symvers file when it replaces the old kernel modules present in /lib/modules/ with the new ones ? >BTW, Why not updating the kernel Module.symvers when kernel-ib-devel is >installed? This will free the developer from copying it to his/hers >private directory. 
> > It might be a good idea to update the Module.symvers file as part of the normal installation and not only kernel-ib-devel. Because if the kernel modules are being replaced (or new modules are being added), shouldn't the Module.symvers file also be updated ? Regards, Ram From kliteyn at dev.mellanox.co.il Tue Oct 31 04:42:00 2006 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 31 Oct 2006 14:42:00 +0200 Subject: [openib-general] [PATCH] osm: fixing broken compilation on Windows Message-ID: <45474498.8070302@dev.mellanox.co.il> Fixing a broken compilation on windows (problems with data types). Signed-off-by: Yevgeny Kliteynik Index: opensm/osm_ucast_file.c =================================================================== --- opensm/osm_ucast_file.c (revision 10009) +++ opensm/osm_ucast_file.c (working copy) @@ -113,7 +113,7 @@ static void add_lid_hops(osm_opensm_t *p uint8_t hops[], unsigned len) { uint16_t new_lid; - unsigned i; + uint8_t i; new_lid = guid ? remap_lid(p_osm, lid, guid) : lid; if (len > osm_switch_get_num_ports(p_sw)) @@ -370,7 +370,8 @@ static int do_lid_matrix_file_load(void file_name, lineno, p); return -1; } - lid = num; + /* We have just checked the range, so casting is safe */ + lid = (uint16_t)num; p = q; while (isspace(*p) || *p == ':') p++; @@ -383,7 +384,8 @@ static int do_lid_matrix_file_load(void file_name, lineno, p); return -1; } - hops[len++] = num; + /* We have just checked the range, so casting is safe */ + hops[len++] = (uint8_t)num; p = q; while (isspace(*p)) p++; From ogerlitz at voltaire.com Tue Oct 31 04:45:38 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 31 Oct 2006 14:45:38 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <45466770.9050107@ichips.intel.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> <4545F8E5.2000003@voltaire.com> 
<45463535.2050302@ichips.intel.com> <45466770.9050107@ichips.intel.com> Message-ID: <45474572.8080109@voltaire.com> Sean Hefty wrote: > Sean Hefty wrote: >>> One of the systems kernel is actually 2.6.19-rc3 and patches 1-7 (ie >>> not roland's tree) and i see there some issues also with ip multicast >>> over ipoib. I will move to use the same kernel config (roland's tree >>> and patches 1-7), then test ipoib and only then mckey, will let you >>> know. >>> >>> Will you have the chance to test ipoib multicast and mckey over this >>> config at your environment? > > This seemed to work fine for me in loopback mode. I'm updating another > test system to check between systems now. OK, ip multicast through ipoib works fine in my config, rping also works fine. However, mckey does not work. The receiver never polls anything from its cq. Adding some debug prints at librdmacm and mckey everything seems to be fine (both sender and receiver use same/correct mgid/mlid, the sender when creating its ah and the receiver when joining etc). The only thing i found was that the remote_qpn used for post sending was not 0xffffff but fixing this did not help. Below are some prints from the receiver, the sender and the SM, where i can see that both ports joined the same/correct mgid. Please let me know if you manage to get mckey working in non loopback mode and if yes, if you have an idea how can i further debug my config. thanks, Or. 
> root at excell01 librdmacm]# /home/ogerlitz/ib1.1/bin/mckey -m 224.5.5.5 > mckey: starting server > mckey: joining > rdma_create_qp: created qp 0x505b00 > ucma_process_join attaching to mlid c008 mgid ff12:4001:ffff:0000:0000:0000:0005:0505 > ucma_process_join: calling ibv_attach_mcast qp 0x505b00 > rdma_get_cm_event: ucma_process_join returned 0 > join handler, params: dlid c008 > join handler, params: qp_num ffffff > receiving data transfers > > [root at excell02 src]# /home/ogerlitz/ib1.1/bin/mckey -m 224.5.5.5 -s -C 10240 -S 1024 > mckey: starting client > mckey: joining > rdma_create_qp: created qp 0x503c80 > ucma_process_join attaching to mlid c008 mgid ff12:4001:ffff:0000:0000:0000:0005:0505 > ucma_process_join: calling ibv_attach_mcast qp 0x503c80 > rdma_get_cm_event: ucma_process_join returned 0 > join handler, params: dlid c008 > join handler, params: qp_num ffffff > initiating data transfers > data transfers complete > test complete > return status 0 SM prints > MCG MLID 0xC008 MGID 0xff124001ffff00000000000000050505 > 0x0008f104039708dd 1 0x0003 0x0008f104039708dc 0008f104039708dc 0.0.0.0 > 0x0008f104039708fd 1 0x0007 0x0008f104039708fc 0008f104039708fc 0.0.0.0 From ogerlitz at voltaire.com Tue Oct 31 04:50:48 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 31 Oct 2006 14:50:48 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <20061031115017.GF2387@mellanox.co.il> References: <4547308F.2030708@voltaire.com> <20061031115017.GF2387@mellanox.co.il> Message-ID: <454746A8.1040604@voltaire.com> Michael S. Tsirkin wrote: >> Michael S. Tsirkin wrote: >>> Quoting r. Or Gerlitz : >>> The race happens on module unload - you might be inside the cm callback when >>> the module is unloaded. Nothing the module itself does can help here - you must >>> synchronize with the cm before unloading. 
>> I think to understand: you say that the CM can call the callback while >> the module unloads. However, my point is that the cm consumer module >> must destroy its cm id before unloading and that the cm id destroy code >> would block till all inflight callbacks on this id are done. Similarly >> to destroy_timer_sync or whatever it is called. >> Am i still missing something? > Yes, you miss the case where you do not destroy cm id explicitly, but rather > return error code from callback instead of destroying the cm_id. So the only case which all this registration api/code in ib_sa, ib_cm and ib_addr (is it also in ib_mad?) protects against is where the consumer wants to destroy its ID by returning non-zero from a callback and not by an explicit call to XXX_destroy_id() ??? If yes, this seems to me like one big overkill. Assuming the consumer always either calls XXX_destroy_id() OR returns non-zero from a callback on this ID, there must be a way to avoid the race within the ID provider module, so at least the api can be saved... Or. From mst at mellanox.co.il Tue Oct 31 05:02:21 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 15:02:21 +0200 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <454746A8.1040604@voltaire.com> References: <4547308F.2030708@voltaire.com> <20061031115017.GF2387@mellanox.co.il> <454746A8.1040604@voltaire.com> Message-ID: <20061031130221.GH2387@mellanox.co.il> Quoting r. Or Gerlitz : > there must be a way to avoid the race within the ID provider module, There's no other way - go over the archives. > so at least the api can be saved...
See Documentation/stable_api_nonsense.txt -- MST From sashak at voltaire.com Tue Oct 31 05:31:30 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 31 Oct 2006 15:31:30 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031051113.GE4622@mellanox.co.il> References: <20061030211828.GD12259@sashak.voltaire.com> <20061031043053.GA4622@mellanox.co.il> <20061031045902.GB17784@sashak.voltaire.com> <20061031051113.GE4622@mellanox.co.il> Message-ID: <20061031133130.GA18776@sashak.voltaire.com> On 07:11 Tue 31 Oct , Michael S. Tsirkin wrote: > Quoting r. Sasha Khapyorsky : > > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > On 06:30 Tue 31 Oct , Michael S. Tsirkin wrote: > > > Quoting r. Sasha Khapyorsky : > > > > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > > > On 13:44 Mon 30 Oct , Michael S. Tsirkin wrote: > > > > > Quoting r. Sasha Khapyorsky : > > > > > > Subject: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > > > > > > > > > > > > > > > node_desc.description buffer is received from the network and should > > > > > > not be NULL-terminated. In such cases using it as regular string in > > > > > > functions like strcmp() or printf() leads to segmentation faults. > > > > > > This patch fixes such usages. 
> > > > > > > > > > > > Signed-off-by: Sasha Khapyorsky > > > > > > --- > > > > > > diags/src/saquery.c | 22 ++++++++++++++++------ > > > > > > 1 files changed, 16 insertions(+), 6 deletions(-) > > > > > > > > > > > > diff --git a/diags/src/saquery.c b/diags/src/saquery.c > > > > > > index 5b4a85e..f5b23fd 100644 > > > > > > --- a/diags/src/saquery.c > > > > > > +++ b/diags/src/saquery.c > > > > > > @@ -90,17 +90,21 @@ static void > > > > > > print_node_desc(ib_node_record_t *node_record) > > > > > > { > > > > > > ib_node_info_t *p_ni = &(node_record->node_info); > > > > > > + ib_node_desc_t *p_nd = &(node_record->node_desc); > > > > > > if (p_ni->node_type == IB_NODE_TYPE_CA) > > > > > > { > > > > > > + char desc[sizeof(p_nd->description) + 1]; > > > > > > + memcpy(desc, p_nd->description, sizeof(p_nd->description)); > > > > > > + desc[sizeof(desc) - 1] = '\0'; > > > > > > printf("%6d \"%s\"\n", > > > > > > - cl_ntoh16(node_record->lid), > > > > > > - node_record->node_desc.description); > > > > > > + cl_ntoh16(node_record->lid), desc); > > > > > > } > > > > > > } > > > > > > > > > > Would it not be simpler, and cleaner, to limit the string width in printf: > > > > > printf("%6d \"%.*s\"\n", > > > > > cl_ntoh16(node_record->lid), > > > > > sizeof(desc), > > > > > node_record->node_desc.description); > > > > > > > > This would be simpler. However some web searching shows that not all > > > > printf() implementation permits not null terminated arrays even when > > > > precision is specified (some issues were reported even with glibc-2.3.2). > > > > > > Hmm, couldn't find it. > > > > Look at this for example: > > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html > > Hmm, yea. Do you understand the answer there? > Does not make sense to me ... I less care about the answer, but more about valgrind output (btw cannot see this with glibc-2.5) and similar "corner case" issues. 
Sasha From halr at voltaire.com Tue Oct 31 05:28:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 31 Oct 2006 08:28:15 -0500 Subject: [openib-general] [PATCH] osm: fixing broken compilation on Windows In-Reply-To: <45474498.8070302@dev.mellanox.co.il> References: <45474498.8070302@dev.mellanox.co.il> Message-ID: <1162301205.15895.124768.camel@hal.voltaire.com> On Tue, 2006-10-31 at 07:42, Yevgeny Kliteynik wrote: > Fixing a broken compilation on windows (problems with data types). > > Signed-off-by: Yevgeny Kliteynik Thanks. Applied. -- Hal From sashak at voltaire.com Tue Oct 31 05:50:06 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 31 Oct 2006 15:50:06 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031051322.GC21576@obsidianresearch.com> References: <20061030211828.GD12259@sashak.voltaire.com> <20061031043053.GA4622@mellanox.co.il> <20061031045902.GB17784@sashak.voltaire.com> <20061031051322.GC21576@obsidianresearch.com> Message-ID: <20061031135006.GB18776@sashak.voltaire.com> On 22:13 Mon 30 Oct , Jason Gunthorpe wrote: > On Tue, Oct 31, 2006 at 06:59:02AM +0200, Sasha Khapyorsky wrote: > > > > > This would be simpler. However some web searching shows that not all > > > > printf() implementation permits not null terminated arrays even when > > > > precision is specified (some issues were reported even with glibc-2.3.2). > > > > > > Hmm, couldn't find it. > > > > Look at this for example: > > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html > > Interestingly SUSv3 (aka POSIX 2001) only requires the terminating > null if the precision is longer than the array size. > > s > The argument shall be a pointer to an array of char. Bytes from > the array shall be written up to (but not including) any > terminating null byte. If the precision is specified, no more than > that many bytes shall be written. 
If the precision is not specified or > is greater than the size of the array, the application shall ensure > that the array contains a null byte. The man page printf.3 from man-pages-2.41 states this even more explicitly: "if a precision is specified, no more than the number specified are written. If a precision is given, no null byte need be present;" Sasha From mst at mellanox.co.il Tue Oct 31 05:46:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 15:46:45 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031133130.GA18776@sashak.voltaire.com> References: <20061031133130.GA18776@sashak.voltaire.com> Message-ID: <20061031134645.GA4536@mellanox.co.il> Quoting r. Sasha Khapyorsky : > > > Look at this for example: > > > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html > > > > Hmm, yea. Do you understand the answer there? > > Does not make sense to me ... > > I less care about the answer, but more about valgrind output (btw > cannot see this with glibc-2.5) and similar "corner case" issues. Well, I understand what you are saying, but whether a work-around is worth it depends on whether the issue is still relevant in distros in use today - the message you quote is from 2005. After all, you never know whether some other piece of code triggers a compiler bug in some rare case on some outdated distro. Can you run valgrind on the simple test and check? In any case, I think we need a comment so we can clean this up sometime in the future.
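The two approaches debated in this thread can be compared side by side. The sketch below is illustrative only: `node_desc_t`, `DESC_LEN`, and both helper names are hypothetical stand-ins, not the saquery or OpenSM definitions. It formats a fixed-size description that may lack a terminating NUL, once through a NUL-terminated local copy (as in the patch) and once with a `%.*s` precision (as suggested in review). One detail worth noting: the `*` precision argument must be an `int`, so a `sizeof` expression needs a cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DESC_LEN 64  /* assumed size, for illustration only */

/* Hypothetical stand-in for a wire-format node description: a fixed-size
 * byte array that is not guaranteed to contain a terminating NUL. */
typedef struct {
    unsigned char description[DESC_LEN];
} node_desc_t;

/* Approach 1: copy into a local buffer one byte larger and terminate it. */
static int format_with_copy(char *out, size_t out_len,
                            unsigned lid, const node_desc_t *nd)
{
    char desc[sizeof(nd->description) + 1];

    assert(out != NULL && nd != NULL);
    memcpy(desc, nd->description, sizeof(nd->description));
    desc[sizeof(desc) - 1] = '\0';
    return snprintf(out, out_len, "%6u \"%s\"", lid, desc);
}

/* Approach 2: bound the read with a precision; no copy is made.
 * The precision passed for '*' has to be an int, hence the cast. */
static int format_with_precision(char *out, size_t out_len,
                                 unsigned lid, const node_desc_t *nd)
{
    assert(out != NULL && nd != NULL);
    return snprintf(out, out_len, "%6u \"%.*s\"", lid,
                    (int)sizeof(nd->description),
                    (const char *)nd->description);
}
```

On a C library that follows the wording quoted above, both helpers should produce the same text; the copy variant simply avoids depending on that behaviour.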
-- MST From ogerlitz at voltaire.com Tue Oct 31 05:52:03 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 31 Oct 2006 15:52:03 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <45474572.8080109@voltaire.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> <4545F8E5.2000003@voltaire.com> <45463535.2050302@ichips.intel.com> <45466770.9050107@ichips.intel.com> <45474572.8080109@voltaire.com> Message-ID: <45475503.7050707@voltaire.com> Or Gerlitz wrote: > Sean Hefty wrote: >> Sean Hefty wrote: >>>> One of the systems kernel is actually 2.6.19-rc3 and patches 1-7 (ie >>>> not roland's tree) and i see there some issues also with ip multicast >>>> over ipoib. I will move to use the same kernel config (roland's tree >>>> and patches 1-7), then test ipoib and only then mckey, will let you >>>> know. >>>> >>>> Will you have the chance to test ipoib multicast and mckey over this >>>> config at your environment? >> This seemed to work fine for me in loopback mode. I'm updating another >> test system to check between systems now. > > OK, ip multicast through ipoib works fine in my config, rping also works > fine. However, mckey does not work. The receiver never polls anything > from its cq. > > Adding some debug prints at librdmacm and mckey everything seems to be > fine (both sender and receiver use same/correct mgid/mlid, the sender > when creating its ah and the receiver when joining etc). > > The only thing i found was that the remote_qpn used for post sending was > not 0xffffff but fixing this did not help. > > Below are some prints from the receiver, the sender and the SM, where i > can see that both ports joined the same/correct mgid. > > Please let me know if you manage to get mckey working in non loopback > mode and if yes, if you have an idea how can i further debug my config. 
I see now that if I make the mckey sender also poll the cq, it never gets any completion! I have verified now that udaddy works fine, so basically the libibverbs IB UD (udaddy) and RC (rping) support works well for the librdmacm examples. Or. From krause at cup.hp.com Tue Oct 31 06:32:04 2006 From: krause at cup.hp.com (Michael Krause) Date: Tue, 31 Oct 2006 06:32:04 -0800 Subject: [openib-general] ibstatus support for speed In-Reply-To: <1162248170.15895.88897.camel@hal.voltaire.com> References: <1162244902.15895.86684.camel@hal.voltaire.com> <1162245696.15895.87198.camel@hal.voltaire.com> <6.2.0.14.2.20061030142220.09c150b8@esmail.cup.hp.com> <1162248170.15895.88897.camel@hal.voltaire.com> Message-ID: <6.2.0.14.2.20061031062800.09d58088@esmail.cup.hp.com> At 02:43 PM 10/30/2006, Hal Rosenstock wrote: >On Mon, 2006-10-30 at 17:29, Michael Krause wrote: > > At 02:05 PM 10/30/2006, Roland Dreier wrote: > > > Hal> So rate = speed * width ? > > > > > >Yes, you should see the right thing on DDR systems etc. > > > > Strange. Bandwidth = signaling rate * width. This of course is raw > > bandwidth prior to encoding, protocol, etc. overheads which will derate > the > > effective application bandwidth minimally by 20-25%. > >Yes of course. It's just a simple diagnostic to display the width and >speed simply. > > > If the goal is > > provide a true indication of the maximum peak bandwidth that an > application > > might see, > >That's not the goal of this simplistic tool. > > > then stating 10 Gbps for an IB x4 SDR is clearly a > > misrepresentation and out of alignment with other networking links such as > > Ethernet which customers understand its bandwidth to be minimally after > the > > encoding, etc. is removed from the equation. The perpetual trend by > > marketing to use 10 Gbps IB as equivalent to 10 Gbps of application > data is > > actually detrimental not beneficial when it comes to customers.
It > > inevitably leads to the question of why the application is not achieving > > the stated bandwidth, i.e. why it is say 700-800MB/s theoretical peak > for a > > x4 while a 10 GbE is 1 GB/s peak. So much marketing hype has gone forward > > already. I realize I'm tilting at windmills but if you are to provide a > > tool that is supposed to project the maximum bandwidth possible and given > > the goal of OFA is to provide as much conceptual commonality with existing > > network stacks / links, then it would be beneficial to have this move > > towards a much more apple-to-apple communication of information. I > know it > > would certainly help with having to repeatedly explain why IB 10 Gbps is > > not the same as 10 GbE to customers and analysts. > >Agreed but this is a different issue from what the tool is for. Understood. >IMO this issue largely started when IB decided to use the signalling >rate rather than the data rate like most other networks. Blame it on marketroids who were more concerned about their naive attempt to look better than other technology and not about customers or the people who have to continually explain how their drivel is simply wrong. Unfortunately, these same marketroids continue to perpetuate this message even now with their apple-to-orange comparisons. This annoys customers, who, once educated, end up with a slightly less favorable opinion of the technology. Mike From sashak at voltaire.com Tue Oct 31 06:54:37 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 31 Oct 2006 16:54:37 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031134645.GA4536@mellanox.co.il> References: <20061031133130.GA18776@sashak.voltaire.com> <20061031134645.GA4536@mellanox.co.il> Message-ID: <20061031145437.GE18776@sashak.voltaire.com> On 15:46 Tue 31 Oct , Michael S. Tsirkin wrote: > Quoting r.
Sasha Khapyorsky : > > > > Look at this for example: > > > > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html > > > > > > Hmm, yea. Do you understand the answer there? > > > Does not make sense to me ... > > > > I less care about the answer, but more about valgrind output (btw > > cannot see this with glibc-2.5) and similar "corner case" issues. > > Well, I understand what you are saying, but whether a work-around is worth it > depends on whether the issue is still relevant in distros in use today as well as with VC++/unknown DDK version? > - the > message you quote is from 2005. After all, you never know whether some other > piece of code triggers a compiler bug in some rare case on some outdated distro. Right, but this one is "known". > Can you run valgrind on the simple test and check? Yes, it was fine with glibc-2.5. > In any case, I think we need a commment so we can clean this up > sometime in the future. IMO it is better to just truncate original buffer by putting '\0' at the end and don't think too much about "possible impacts", but feel free to submit the patch. Sasha From mst at mellanox.co.il Tue Oct 31 07:06:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 17:06:50 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031145437.GE18776@sashak.voltaire.com> References: <20061031145437.GE18776@sashak.voltaire.com> Message-ID: <20061031150650.GB4536@mellanox.co.il> Quoting r. Sasha Khapyorsky : > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > On 15:46 Tue 31 Oct , Michael S. Tsirkin wrote: > > Quoting r. Sasha Khapyorsky : > > > > > Look at this for example: > > > > > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html > > > > > > > > Hmm, yea. Do you understand the answer there? > > > > Does not make sense to me ... 
> > > > > > I less care about the answer, but more about valgrind output (btw > > > cannot see this with glibc-2.5) and similar "corner case" issues. > > > > Well, I understand what you are saying, but whether a work-around is worth it > > depends on whether the issue is still relevant in distros in use today > > as well as with VC++/unknown DDK version? Again, it should be fine according to the VC++ docs. Whether there's a bug in this function, or some other function, is an open question but I don't really see why we should make an exception here. > > > - the > > message you quote is from 2005. After all, you never know whether some other > > piece of code triggers a compiler bug in some rare case on some outdated distro. > > Right, but this one is "known". > > > Can you run valgrind on the simple test and check? > > Yes, it was fine with glibc-2.5. > > > In any case, I think we need a comment so we can clean this up > > sometime in the future. > > IMO it is better to just truncate original buffer by putting '\0' at the > end and don't think too much about "possible impacts", but feel free to > submit the patch. I would just use the %.*s precision and not worry - it's probably fixed in all distros already. -- MST From vlad at dev.mellanox.co.il Tue Oct 31 07:31:33 2006 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 31 Oct 2006 17:31:33 +0200 Subject: [openib-general] OFED 1.1 Build Issue In-Reply-To: <454742C2.2050900@silverstorm.com> References: <45470D59.7020705@dev.mellanox.co.il> <45472202.9080104@voltaire.com> <454742C2.2050900@silverstorm.com> Message-ID: <45476C55.1080300@dev.mellanox.co.il> Ramachandra K wrote: > Moni Shoua wrote: > >> We already tried to go this way and found that a local Module.symvers >> is not always generated (but we might have missed something though). >> I suggest that you check that this alternative way works under all >> OSs compilation (SuSE and RedHat to be precise)...
> >> >> > I think Module.symvers generation for external modules was added sometime > around 2.6.16, so it's not generated on the older kernels (e.g. 2.6.9 > kernels > on RHEL) > > In this scenario, when there is no Module.symvers file, I guess the other > option is to use a single Kbuild file to build both modules, > as explained in section 7.3 of Documentation/kbuild/modules.txt. > > But this may not be feasible always. Come to think of it, why does the > OFED installation procedure not update the kernel Module.symvers file > when it replaces the old kernel modules present in /lib/modules/ > with the new ones ? > >> BTW, Why not updating the kernel Module.symvers when kernel-ib-devel >> is installed? This will free the developer from copying it to >> his/hers private directory. >> >> > It might be a good idea to update the Module.symvers file as part of the > normal installation and not only kernel-ib-devel. Because if the kernel > modules are being replaced (or new modules are being added), shouldn't > the Module.symvers file also be updated ? > Regards, > Ram Agreed, Module.symvers should be updated by the kernel-ib RPM. So we need to implement Moni's suggestion with slight changes: update the kernel-ib RPM %post and %preun sections instead of the kernel-ib-devel RPM %pre and %postun. Regards, Vladimir From tziporet at mellanox.co.il Tue Oct 31 08:19:11 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 31 Oct 2006 18:19:11 +0200 Subject: [openib-general] OFED support page on Wiki Message-ID: <6C2C79E72C305246B504CBA17B5500C92ACFC7@mtlexch01.mtl.com> Hi, I have added an OFED support page on the Wiki: https://openib.org/tiki/tiki-index.php?page=OFED+Support and added a link to this page from the Wiki home. I am going to add a few more items in the coming days. Feel free to add/update with more issues/howtos/etc.
Tziporet From mshefty at ichips.intel.com Tue Oct 31 08:28:47 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Oct 2006 08:28:47 -0800 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <45474572.8080109@voltaire.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> <4545F8E5.2000003@voltaire.com> <45463535.2050302@ichips.intel.com> <45466770.9050107@ichips.intel.com> <45474572.8080109@voltaire.com> Message-ID: <454779BF.2080703@ichips.intel.com> Or Gerlitz wrote: > Please let me know if you manage to get mckey working in non loopback > mode and if yes, if you have an idea how can i further debug my config. I did get mckey working fine in non loopback mode. >> root at excell01 librdmacm]# /home/ogerlitz/ib1.1/bin/mckey -m 224.5.5.5 >> [root at excell02 src]# /home/ogerlitz/ib1.1/bin/mckey -m 224.5.5.5 -s -C >> 10240 -S 1024 You need to use the same message parameters (count and size) for both sender and receiver. - Sean From mshefty at ichips.intel.com Tue Oct 31 08:49:05 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Oct 2006 08:49:05 -0800 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <454746A8.1040604@voltaire.com> References: <4547308F.2030708@voltaire.com> <20061031115017.GF2387@mellanox.co.il> <454746A8.1040604@voltaire.com> Message-ID: <45477E81.3040205@ichips.intel.com> Or Gerlitz wrote: > So the only case for which all this registration api/code at the ib_sa > ib_cm and ib_addr (is it also in the ib_mad) protects against is where > the consumer wants to destroy its ID by returning non zero from a > callback and not by an explicit call to XXX_destory_id() ib_sa and ib_addr are similar. Both simply callback the user, and once the callback completes, the module can unload. 
Users of those modules cannot protect against a thread running in their callback. ib_mad already requires registration, so does not have this issue. > If yes, this seems to me as one big over-doing, assuming the consumer > always either call XXX_destory_id() OR returns non zero from a callback > on this ID, there must be away to avoid the race within the ID provider > module, so at least the api can be saved... As long as the user can destroy a cm_id from their callback, the ib_cm and rdma_cm have this issue. This feature ends up being fairly useful, so I'm hesitant to remove it. The alternative is that a user must always call xxx_destroy_id(), but that cannot be done from within the callback thread itself. This would require a user to schedule a thread to call destroy, which may not always be possible. (Consider the case where the cm creates a new id as part of a connection request. For the user to schedule the destruction, it would need to queue the new cm_id somewhere, which may not be possible.) - Sean From sashak at voltaire.com Tue Oct 31 08:56:53 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 31 Oct 2006 18:56:53 +0200 Subject: [openib-general] [PATCH] diags/saquery: fix node_desc.description as string usages In-Reply-To: <20061031150650.GB4536@mellanox.co.il> References: <20061031145437.GE18776@sashak.voltaire.com> <20061031150650.GB4536@mellanox.co.il> Message-ID: <20061031165653.GB19213@sashak.voltaire.com> On 17:06 Tue 31 Oct , Michael S. Tsirkin wrote: > Quoting r. Sasha Khapyorsky : > > Subject: Re: [PATCH] diags/saquery: fix node_desc.description as string usages > > > > On 15:46 Tue 31 Oct , Michael S. Tsirkin wrote: > > > Quoting r. Sasha Khapyorsky : > > > > > > Look at this for example: > > > > > > http://sourceware.org/ml/bug-glibc/2005-02/msg00123.html > > > > > > > > > > Hmm, yea. Do you understand the answer there? > > > > > Does not make sense to me ... 
> > > > > > > > I less care about the answer, but more about valgrind output (btw > > > > cannot see this with glibc-2.5) and similar "corner case" issues. > > > > > > Well, I understand what you are saying, but whether a work-around is worth it > > > depends on whether the issue is still relevant in distros in use today > > > > as well as with VC++/unknown DDK version? > > Again, should be fine according to VC++ docs. Maybe, never seen. > Whether there's a bug in this function, or some other function, is > an open question but I don't really see why make an exception here. > > > > > > - the > > > message you quote is from 2005. After all, you never know whether some other > > > piece of code triggers a compiler bug in some rare case on some outdated distro. > > > > Right, but this one is "known". > > > > > Can you run valgrind on the simple test and check? > > > > Yes, it was fine with glibc-2.5. > > > > > In any case, I think we need a commment so we can clean this up > > > sometime in the future. > > > > IMO it is better to just truncate original buffer by putting '\0' at the > > end and don't think too much about "possible impacts", but feel free to > > submit the patch. > > I would just use the %.*s precision and not worry - its probably fixed in all > distros already. Feel free to submit the patch. I don't want to spend time for this now and for possible "compatibility issues" later. Sasha From mshefty at ichips.intel.com Tue Oct 31 08:51:07 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Oct 2006 08:51:07 -0800 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <20061031095303.GB2387@mellanox.co.il> References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> <20061031095303.GB2387@mellanox.co.il> Message-ID: <45477EFB.5020807@ichips.intel.com> Michael S. Tsirkin wrote: > ib_cm has this bug as well. Shouldn't we patch it for 2.6.19 too? 
Yes - I've started on patches for the ib_cm and rdma_cm as well. They just aren't quite so straightforward to fix. - Sean From chris_youb at yahoo.ca Tue Oct 31 09:13:27 2006 From: chris_youb at yahoo.ca (Chris Youb) Date: Tue, 31 Oct 2006 12:13:27 -0500 (EST) Subject: [openib-general] OFED SRP initiator always sends CM REJ in response to CM REP Message-ID: <20061031171327.21998.qmail@web52106.mail.yahoo.com> Abstract: We are developing SRP target code and testing it with the OFED 1.1 SRP initiator. The OFED SRP initiator sends us a CM REQ (IB 12.6.5) and we respond with CM REP (12.6.8). However, instead of the expected CM RTU (12.6.9) we ALWAYS receive a CM REJ (12.6.7) with status 0x1C == Reason 28 (12.6.7.2). Software Setup: - SUSE 10.0 - OFED 1.1 - Mellanox card with 3.5.00 firmware Details: Initially we suspected our response values in the CM REP packet. There is nothing obvious to us, and anything we weren't sure about we tried a number of combinations. This applies to the SRP private data as well. We also took a look at ./openib-1.1/drivers/infiniband/core/ cm.c and cma.c. The function cma.c:cma_rep_recv looked like a possibility but there's little debug output. Aside from putting in printk's and recompiling and installing is there an easier way to debug? Included below is the output from the initiator and packet dumps of the 3 packets. 
-------------------------------------- ./ibsrpdm -cv id_ext=200601045300030B,ioc_guid=00045381300030b2,dgid=fe8000000000000000045381300030b2,pkey=ffff,service_id=200601045300030b id_ext=2000020453000011,ioc_guid=00045381300030b2,dgid=fe8000000000000000045381300030b2,pkey=ffff,service_id=2000020453000011 cycl-247:/usr/local/ofed/sbin # echo id_ext=2000020453000011,ioc_guid=00045381300030b2,dgid=fe8000000000000000045381300030b2,pkey=ffff,service_id=2000020453000011 >/sys/class/infiniband_srp/srp-mthca0-1/add_target -------------------------------------- *** CM REQ *** received MAD [QP1]: struct HdrLRH (8 bytes) - Local Route Header (section 7.7) { VL: 0x0 (4 bit uint) LVer: 0x0 (4 bit uint) SL: 0x0 (4 bit uint) rsv0: 0x0 (2 bit uint) LNH: 0x2 (2 bit uint) DLID: 0x0004 (16 bit uint) rsv1: 0x00 (5 bit uint) pktLen: 0x048 (11 bit uint) SLID: 0x0104 (16 bit uint) } MAD: struct CMFormat (256 bytes) - Request for Communication (section 16.7.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x07 (8 bit uint) classVersion: 0x02 (8 bit uint) method: 0x03 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x0000003BE64D1C3E (64 bit uint) attributeID: 0x0010 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } data: struct CMREQ (232 bytes) - Request for Communication (section 12.6.5) { LCID: 0x3E1C4DE6 (32 bit uint) rsv0: 0x00000000 (32 bit uint) serviceID: 0x2000020453000011 (64 bit uint) LGUID: 0x00066A0098005B37 (64 bit uint) localCMQKey: 0x00000000 (32 bit uint) localQKey: 0x00000000 (32 bit uint) localQPN: 0x180014 (24 bit uint) responderResources: 0x04 (8 bit uint) localEECN: 0x000000 (24 bit uint) initiatorDepth: 0x00 (8 bit uint) remoteEECN: 0x000000 (24 bit uint) remoteResponseTimeout: 0x14 (5 bit uint) transportService: 0x0 (2 bit uint) flowControl: 0x1 (1 bit uint) startingPSN: 0x1D4319 (24 bit uint) 
localResponseTimeout: 0x14 (5 bit uint) retryCount: 0x7 (3 bit uint) PKey: 0xFFFF (16 bit uint) pathPacketMTU: 0x4 (4 bit uint) RDCExists: 0x0 (1 bit uint) RNRRetryCount: 0x7 (3 bit uint) maxCMRetries: 0xF (4 bit uint) SRQ: 0x0 (1 bit uint) rsv1: 0x0 (3 bit uint) primaryPath: struct CMPath (44 bytes) - Path Information (section 12.6) { SLID: 0x0104 (16 bit uint) DLID: 0x0004 (16 bit uint) SGID: FE80:0:0:0:6:6A00:A000:5B37 (HdrIPv6Addr) DGID: FE80:0:0:0:4:5381:3000:30B2 (HdrIPv6Addr) flowLabel: 0x00000 (20 bit uint) rsv0: 0x0 (4 bit uint) rsv1: 0x0 (2 bit uint) packetRate: 0x02 (6 bit uint) TClass: 0x00 (8 bit uint) hopLimit: 0x00 (8 bit uint) SL: 0x0 (4 bit uint) subnetLocal: 0x1 (1 bit uint) rsv2: 0x0 (3 bit uint) localACKTimeout: 0x13 (5 bit uint) rsv3: 0x0 (3 bit uint) } alternatePath: struct CMPath (44 bytes) - Path Information (section 12.6) { SLID: 0x0000 (16 bit uint) DLID: 0x0000 (16 bit uint) SGID: 0:0:0:0:0:0:0:0 (HdrIPv6Addr) DGID: 0:0:0:0:0:0:0:0 (HdrIPv6Addr) flowLabel: 0x00000 (20 bit uint) rsv0: 0x0 (4 bit uint) rsv1: 0x0 (2 bit uint) packetRate: 0x00 (6 bit uint) TClass: 0x00 (8 bit uint) hopLimit: 0x00 (8 bit uint) SL: 0x0 (4 bit uint) subnetLocal: 0x0 (1 bit uint) rsv2: 0x0 (3 bit uint) localACKTimeout: 0x00 (5 bit uint) rsv3: 0x0 (3 bit uint) } privateData: raw data in hex (92 bytes) { 00000000 00000000 00000000 00000000 00000104 00000000 00060000 00000000 00000000 00000000 00066A00 A0005B37 20000204 53000011 00045381 300030B2 00000000 00000000 00000000 00000000 00000000 00000000 00000000 } } } 0302070100000000 3b0000003e1c4de6 0000100000000000 e64d1c3e00000000 0402002011000053 006a0600375b0098 0000000000000000 0414001800000000 a1000000a719431d f047ffff04000401 000080fe00000000 006a0600375b00a0 000080fe00000000 81530400b2300030 0200000098080000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000004010000 0000000000000600 0000000000000000 00000000006a0600 375b00a004020020 
1100005381530400 b230003000000000 0000000000000000 0000000000000000 0000000000000000 -------------------------------------- *** CM REP *** sending MAD [QP1]: MAD: struct CMFormat (256 bytes) - Request for Communication (section 16.7.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x07 (8 bit uint) classVersion: 0x02 (8 bit uint) method: 0x03 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x0000003BE64D1C3E (64 bit uint) attributeID: 0x0013 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } data: struct CMREP (232 bytes) - Reply To Request For Communication (section 12.6.8) { LCID: 0xE118A61B (32 bit uint) RCID: 0x3E1C4DE6 (32 bit uint) localQKey: 0x00000000 (32 bit uint) localQPN: 0x000408 (24 bit uint) rsv0: 0x00 (8 bit uint) localEEContext: 0x000000 (24 bit uint) rsv1: 0x00 (8 bit uint) startingPSN: 0x3B3163 (24 bit uint) rsv2: 0x00 (8 bit uint) responderResources: 0x04 (8 bit uint) initiatorDepth: 0x07 (8 bit uint) targetACKDelay: 0x1F (5 bit uint) failoverAccepted: 0x0 (2 bit uint) flowControl: 0x1 (1 bit uint) RNRRetryCount: 0x7 (3 bit uint) rsv3: 0x00 (5 bit uint) LGUID: 0x00045381300030B0 (64 bit uint) privateData: raw data in hex (196 bytes) { C0000000 00000064 00000000 00000000 00000104 00000104 00060000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 } } } 0302070100000000 3b0000003e1c4de6 0000130000000000 1ba618e1e64d1c3e 0000000000080400 000000000063313b e0f9070481530400 b0300030000000c0 6400000000000000 0000000004010000 0401000000000600 0000000000000000 0000000000000000 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 -------------------------------------- *** CM REJ *** received MAD [QP1]: struct HdrLRH (8 bytes) - Local Route Header (section 7.7) { VL: 0x0 (4 bit uint) LVer: 0x0 (4 bit uint) SL: 0x0 (4 bit uint) rsv0: 0x0 (2 bit uint) LNH: 0x2 (2 bit uint) DLID: 0x0004 (16 bit uint) rsv1: 0x00 (5 bit uint) pktLen: 0x048 (11 bit uint) SLID: 0x0104 (16 bit uint) } MAD: struct CMFormat (256 bytes) - Request for Communication (section 16.7.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x07 (8 bit uint) classVersion: 0x02 (8 bit uint) method: 0x03 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x0000003BE64D1C3E (64 bit uint) attributeID: 0x0012 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } data: struct CMREJ (232 bytes) - Reject (section 12.6.7) { LCID: 0x3E1C4DE6 (32 bit uint) RCID: 0xE118A61B (32 bit uint) messageRejected: 0x1 (2 bit uint) rsv0: 0x00 (6 bit uint) rejectInfoLength: 0x00 (7 bit uint) rsv1: 0x0 (1 bit uint) reason: 0x001C (16 bit uint) ARI: raw data in hex (72 bytes) { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 } privateData: raw data in hex (148 bytes) { 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 } } } 0302070100000000 3b0000003e1c4de6 0000120000000000 e64d1c3e1ba618e1 1c00004000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 -------------------------------------- --------------------------------- Make free worldwide PC-to-PC calls. Try the new Yahoo! Canada Messenger with Voice -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahomike at us.ibm.com Tue Oct 31 09:22:34 2006 From: ahomike at us.ibm.com (Mike Aho) Date: Tue, 31 Oct 2006 11:22:34 -0600 Subject: [openib-general] psm.h not found Message-ID: I cannot find psm.h which header file mtl_psm.h calls out in ompi v1.2 12372. Any hints on where I would get that? Thanks. --Mike Michael E. Aho Roadrunner Communications Stack Interconnect Lead MS: 45E/015-2 (Office D116) Rochester, MN 55901-7829 Phone (507) 253-6222, TL 553-6222 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsquyres at cisco.com Tue Oct 31 09:31:51 2006 From: jsquyres at cisco.com (Jeff Squyres) Date: Tue, 31 Oct 2006 12:31:51 -0500 Subject: [openib-general] psm.h not found In-Reply-To: References: Message-ID: <518D6ED3-F5C4-42B6-8C99-687E81F48CA7@cisco.com> This sounds like a question for the Open MPI mailing list; this list is for OpenIB / OpenFabrics issues. MTL and PSM issues are Open MPI-specific -- they do not have anything to do with OpenIB / OpenFabrics. So I'll reply separately and move your thread over to that list... 
On Oct 31, 2006, at 12:22 PM, Mike Aho wrote: > > I cannot find psm.h which header file mtl_psm.h calls out in ompi > v1.2 12372. Any hints on where I would get that? Thanks. > > --Mike > Michael E. Aho > Roadrunner Communications Stack Interconnect Lead > MS: 45E/015-2 (Office D116) > Rochester, MN 55901-7829 > Phone (507) 253-6222, TL 553-6222 > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/ > openib-general -- Jeff Squyres Server Virtualization Business Unit Cisco Systems From gshipman at lanl.gov Tue Oct 31 09:56:28 2006 From: gshipman at lanl.gov (Galen M. Shipman) Date: Tue, 31 Oct 2006 10:56:28 -0700 Subject: [openib-general] psm.h not found In-Reply-To: References: Message-ID: <45478E4C.8090909@lanl.gov> Hi Mike, I have copied this to the Open MPI devel list as this is an Open MPI specific question. The PSM MTL in Open MPI does not use the OpenIB verbs api at all. Instead it makes use of the PSM library from QLogic. If you are using the InfiniPath adapter you should be able to use PSM with Open MPI. I would point you toward QLogic support to obtain this library. Thanks, Galen M. Shipman Los Alamos National Labs Mike Aho wrote: > > I cannot find psm.h which header file mtl_psm.h calls out in ompi v1.2 > 12372. Any hints on where I would get that? Thanks. > > --Mike > Michael E. 
Aho > Roadrunner Communications Stack Interconnect Lead > MS: 45E/015-2 (Office D116) > Rochester, MN 55901-7829 > Phone (507) 253-6222, TL 553-6222 > >------------------------------------------------------------------------ > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From trimmer at silverstorm.com Tue Oct 31 10:18:46 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 31 Oct 2006 13:18:46 -0500 Subject: [openib-general] OFED SRP initiator always sends CM REJ in response to CM REP In-Reply-To: <20061031171327.21998.qmail@web52106.mail.yahoo.com> Message-ID: From: Chris Youb Sent: Tuesday, October 31, 2006 12:13 PM To: openib-general at openib.org Subject: [openib-general] OFED SRP initiator always sends CM REJ in response to CM REP Abstract: We are developing SRP target code and testing it with the OFED 1.1 SRP initiator. The OFED SRP initiator sends us a CM REQ (IB 12.6.5) and we respond with CM REP (12.6.8). However, instead of the expected CM RTU (12.6.9) we ALWAYS receive a CM REJ (12.6.7) with status 0x1C == Reason 28 (12.6.7.2). Software Setup: - SUSE 10.0 - OFED 1.1 - Mellanox card with 3.5.00 firmware Details: Initially we suspected our response values in the CM REP packet. There is nothing obvious to us, and anything we weren't sure about we tried a number of combinations. This applies to the SRP private data as well. We also took a look at ./openib-1.1/drivers/infiniband/core/ cm.c and cma.c. The function cma.c:cma_rep_recv looked like a possibility but there's little debug output. Aside from putting in printk's and recompiling and installing is there an easier way to debug? Chris, I reviewed the packets and found at least 1 problem. The REQ has responder resources=0x4, yet the REP has initiator depth=7. 
The REP must provide an initiator depth <= the responder resources in the REQ. Some other non-fatal issues: Target Ack Delay is a bit high (0x1f -> 2.4 hours). This will basically cause initiator to compute QP Ack timeouts of 2.4 hours. This value should represent the time internal to the CA from receipt of a message to sending of the ACK. See IBTA 12.7.33 for more info. Todd Rimmer -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Tue Oct 31 10:31:09 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Oct 2006 10:31:09 -0800 Subject: [openib-general] OFED SRP initiator always sends CM REJ in response to CM REP In-Reply-To: <20061031171327.21998.qmail@web52106.mail.yahoo.com> References: <20061031171327.21998.qmail@web52106.mail.yahoo.com> Message-ID: <4547966D.9090403@ichips.intel.com> Chris Youb wrote: > We are developing SRP target code and testing it with the OFED 1.1 SRP > initiator. The OFED SRP initiator sends us a CM REQ (IB 12.6.5) and we > respond with CM REP (12.6.8). However, instead of the expected CM RTU > (12.6.9) we ALWAYS receive a CM REJ (12.6.7) with status 0x1C == Reason > 28 (12.6.7.2). This is IB_CM_REJ_CONSUMER_DEFINED reject reason. It is only sent by the ib_cm in response to an action taken by the user - most likely because the SRP initiator is rejecting the response for some reason. > Initially we suspected our response values in the CM REP packet. There > is nothing obvious to us, and anything we weren't sure about we tried a > number of combinations. This applies to the SRP private data as well. > We also took a look at ./openib-1.1/drivers/infiniband/core/ cm.c and > cma.c. The function cma.c:cma_rep_recv looked like a possibility but > there's little debug output. I don't believe that the OFED SRP initiator uses the CMA. 
- Sean From halr at voltaire.com Tue Oct 31 10:35:44 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 31 Oct 2006 13:35:44 -0500 Subject: [openib-general] OpenSM unneeded/no longer used header files Message-ID: <1162319741.29957.8140.camel@hal.voltaire.com> The following OpenSM header files appear to be unused: 183 osm_errors.h 230 osm_ft_config_ctrl.h 291 osm_mcast_config_ctrl.h 289 osm_pi_config_ctrl.h 289 osm_pkey_config_ctrl.h 297 osm_sm_info_get_ctrl.h 290 osm_subnet_config_ctrl.h Any objections if they disappear ? -- Hal From rdreier at cisco.com Tue Oct 31 10:43:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 31 Oct 2006 10:43:07 -0800 Subject: [openib-general] OFED SRP initiator always sends CM REJ in response to CM REP In-Reply-To: <20061031171327.21998.qmail@web52106.mail.yahoo.com> (Chris Youb's message of "Tue, 31 Oct 2006 12:13:27 -0500 (EST)") References: <20061031171327.21998.qmail@web52106.mail.yahoo.com> Message-ID: > We are developing SRP target code and testing it with the OFED > 1.1 SRP initiator. The OFED SRP initiator sends us a CM REQ (IB > 12.6.5) and we respond with CM REP (12.6.8). However, instead of > the expected CM RTU (12.6.9) we ALWAYS receive a CM REJ (12.6.7) > with status 0x1C == Reason 28 (12.6.7.2). Seems like it should be pretty easy to debug. You could start by adding some printks to srp_cm_handler() to see if you get a IB_CM_REP_RECEIVED event. If you do get the event, add some printks to the handling of IB_CM_REP_RECEIVED in the case statement to see if any of the things there fail. If the SRP initiator is happy, then you need to add some tracing to the handling of REPs in cm.c to see if something is being rejected there. However, based on the reject code I suspect the SRP initiator is what is rejecting it. - R. 
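For anyone following the SRP CM REJ thread above, the two REP field checks Todd Rimmer pointed out can be sketched as plain C. This is only an illustration, not code from ib_cm or ib_srp: the helper names and the CM_REJ_CONSUMER_DEFINED macro are made up for this note, and the 4.096 us * 2^n encoding of Target ACK Delay is my reading of the IBTA definition (it does reproduce the "0x1f -> 2.4 hours" figure quoted in the thread).

```c
#include <stdint.h>

/* Reject reason 28 (0x1C), reported in the thread as consumer-defined.
 * The macro name is local to this sketch. */
#define CM_REJ_CONSUMER_DEFINED 28

/* Hypothetical helper: the initiator depth in the REP must not exceed
 * the responder resources offered in the REQ. */
static int rep_initiator_depth_ok(uint8_t req_responder_resources,
                                  uint8_t rep_initiator_depth)
{
	return rep_initiator_depth <= req_responder_resources;
}

/* Hypothetical helper: decode the 5-bit Target ACK Delay field into
 * seconds, assuming the 4.096 us * 2^field encoding. */
static double target_ack_delay_seconds(uint8_t field)
{
	return 4.096e-6 * (double)(1ULL << (field & 0x1f));
}
```

With the values from the posted trace, rep_initiator_depth_ok(4, 7) returns 0 (the REP asks for more than the REQ offered), and target_ack_delay_seconds(0x1f) comes out at roughly 8796 seconds, i.e. about 2.4 hours.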
From rdreier at cisco.com Tue Oct 31 10:49:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 31 Oct 2006 10:49:02 -0800 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> (Sean Hefty's message of "Mon, 30 Oct 2006 15:52:04 -0800") References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> Message-ID: > @@ -305,6 +330,7 @@ int rdma_resolve_ip(struct sockaddr *src > break; > default: > ret = req->status; > + atomic_dec(&client->refcount); > kfree(req); > break; > } Doesn't this need to be deref_client() here too? Or is there some reason why this can't be the last reference to the client? (BTW, I really find the "deref" name confusing -- it makes me think of dereferencing a pointer, rather than putting a reference. I would use put_client() instead) - R. From rdreier at cisco.com Tue Oct 31 10:52:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 31 Oct 2006 10:52:01 -0800 Subject: [openib-general] [PATCH] IB/mthca: fix MAD extended header format In-Reply-To: <20061030143152.GB1941@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 30 Oct 2006 16:31:52 +0200") References: <20061030143152.GB1941@mellanox.co.il> Message-ID: thanks, applied. 
I decided to leave the RLID -- it's not hurting anything so let's be conservative for now From mshefty at ichips.intel.com Tue Oct 31 10:57:27 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 31 Oct 2006 10:57:27 -0800 Subject: [openib-general] [PATCH] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: References: <000101c6fc7e$66eb3fd0$ff0da8c0@amr.corp.intel.com> Message-ID: <45479C97.6050402@ichips.intel.com> Roland Dreier wrote: > > @@ -305,6 +330,7 @@ int rdma_resolve_ip(struct sockaddr *src > > break; > > default: > > ret = req->status; > > + atomic_dec(&client->refcount); > > kfree(req); > > break; > > } > > Doesn't this need to be deref_client() here too? Or is there some > reason why this can't be the last reference to the client? This can't be the last reference. For this to be the last reference on the client, the user would have to call rdma_addr_unregister_client() at the same time they call rdma_resolve_ip(), which can't work anyway. > (BTW, I really find the "deref" name confusing -- it makes me think of > dereferencing a pointer, rather than putting a reference. I would use > put_client() instead) That's easy to change. - Sean From rdreier at cisco.com Tue Oct 31 11:02:49 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 31 Oct 2006 11:02:49 -0800 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: (Roland Dreier's message of "Wed, 25 Oct 2006 07:15:16 -0700") References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> Message-ID: The discussion fizzled out without really reaching a definitive answer, so I'm going to apply the original patch (below), since I pretty much convinced myself that only the driver doing the config access has enough information to fix this reliably. - R. 
Author: John Partridge Date: Tue Oct 31 11:00:04 2006 -0800 IB/mthca: Make sure all PCI config writes reach device before doing MMIO During initialization, mthca writes some PCI config space registers and then does an MMIO read from one of the BARs it just enabled. This MMIO read sometimes failed and caused a crash on SGI Altix machines, because the PCI-X host bridge (legitimately, according to the PCI spec) allowed the MMIO read to start before the config write completed. To fix this, add a config read after all config writes to make sure they are all done before starting the MMIO read. Signed-off-by: John Partridge Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c index 91934f2..578dc7c 100644 --- a/drivers/infiniband/hw/mthca/mthca_reset.c +++ b/drivers/infiniband/hw/mthca/mthca_reset.c @@ -281,6 +281,20 @@ good: goto out; } + /* + * Perform a "flush" of the PCI config writes here by reading + * the PCI_COMMAND register. This is needed to make sure that + * we don't try to touch other PCI BARs before the config + * writes are done -- otherwise an MMIO cycle could start + * before the config writes are done and reach the HCA before + * the BAR is actually enabled. + */ + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { + err = -ENODEV; + mthca_err(mdev, "Couldn't access HCA memory after restoring, " + "aborting.\n"); + } + out: if (bridge) pci_dev_put(bridge); From sean.hefty at intel.com Tue Oct 31 11:12:59 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 31 Oct 2006 11:12:59 -0800 Subject: [openib-general] [PATCH v2] for 2-6-19 rdma/addr: use client registration to fix module unload race In-Reply-To: Message-ID: <000001c6fd20$950c7440$ff0da8c0@amr.corp.intel.com> Require registration with ib_addr module to prevent caller from unloading while a callback is in progress. Signed-off-by: Sean Hefty --- Changes from v1: Renamed deref_client to put_client. 
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c old mode 100644 new mode 100755 index 60d3fbd..e11187e --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -47,6 +47,7 @@ struct addr_req { struct sockaddr src_addr; struct sockaddr dst_addr; struct rdma_dev_addr *addr; + struct rdma_addr_client *client; void *context; void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context); @@ -61,6 +62,26 @@ static LIST_HEAD(req_list); static DECLARE_WORK(work, process_req, NULL); static struct workqueue_struct *addr_wq; +void rdma_addr_register_client(struct rdma_addr_client *client) +{ + atomic_set(&client->refcount, 1); + init_completion(&client->comp); +} +EXPORT_SYMBOL(rdma_addr_register_client); + +static inline void put_client(struct rdma_addr_client *client) +{ + if (atomic_dec_and_test(&client->refcount)) + complete(&client->comp); +} + +void rdma_addr_unregister_client(struct rdma_addr_client *client) +{ + put_client(client); + wait_for_completion(&client->comp); +} +EXPORT_SYMBOL(rdma_addr_unregister_client); + int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, const unsigned char *dst_dev_addr) { @@ -229,6 +250,7 @@ static void process_req(void *data) list_del(&req->list); req->callback(req->status, &req->src_addr, req->addr, req->context); + put_client(req->client); kfree(req); } } @@ -264,7 +286,8 @@ static int addr_resolve_local(struct soc return ret; } -int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr, +int rdma_resolve_ip(struct rdma_addr_client *client, + struct sockaddr *src_addr, struct sockaddr *dst_addr, struct rdma_dev_addr *addr, int timeout_ms, void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context), @@ -285,6 +308,8 @@ int rdma_resolve_ip(struct sockaddr *src req->addr = addr; req->callback = callback; req->context = context; + req->client = client; + 
atomic_inc(&client->refcount); src_in = (struct sockaddr_in *) &req->src_addr; dst_in = (struct sockaddr_in *) &req->dst_addr; @@ -305,6 +330,7 @@ int rdma_resolve_ip(struct sockaddr *src break; default: ret = req->status; + atomic_dec(&client->refcount); kfree(req); break; } diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 2b4748e..d804a4d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -63,6 +63,7 @@ static struct ib_client cma_client = { }; static struct ib_sa_client sa_client; +static struct rdma_addr_client addr_client; static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DEFINE_MUTEX(lock); @@ -1625,8 +1626,8 @@ int rdma_resolve_addr(struct rdma_cm_id if (cma_any_addr(dst_addr)) ret = cma_resolve_loopback(id_priv); else - ret = rdma_resolve_ip(&id->route.addr.src_addr, dst_addr, - &id->route.addr.dev_addr, + ret = rdma_resolve_ip(&addr_client, &id->route.addr.src_addr, + dst_addr, &id->route.addr.dev_addr, timeout_ms, addr_handler, id_priv); if (ret) goto err; @@ -2217,6 +2218,7 @@ static int cma_init(void) return -ENOMEM; ib_sa_register_client(&sa_client); + rdma_addr_register_client(&addr_client); ret = ib_register_client(&cma_client); if (ret) @@ -2224,6 +2226,7 @@ static int cma_init(void) return 0; err: + rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(cma_wq); return ret; @@ -2232,6 +2235,7 @@ err: static void cma_cleanup(void) { ib_unregister_client(&cma_client); + rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 81b6230..c094e50 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -36,6 +36,22 @@ #include #include #include +struct rdma_addr_client { + atomic_t refcount; + struct completion comp; +}; + +/** + * rdma_addr_register_client - Register an address 
client. + */ +void rdma_addr_register_client(struct rdma_addr_client *client); + +/** + * rdma_addr_unregister_client - Deregister an address client. + * @client: Client object to deregister. + */ +void rdma_addr_unregister_client(struct rdma_addr_client *client); + struct rdma_dev_addr { unsigned char src_dev_addr[MAX_ADDR_LEN]; unsigned char dst_dev_addr[MAX_ADDR_LEN]; @@ -52,6 +68,7 @@ int rdma_translate_ip(struct sockaddr *a /** * rdma_resolve_ip - Resolve source and destination IP addresses to * RDMA hardware addresses. + * @client: Address client associated with request. * @src_addr: An optional source address to use in the resolution. If a * source address is not provided, a usable address will be returned via * the callback. @@ -64,7 +81,8 @@ int rdma_translate_ip(struct sockaddr *a * or been canceled. A status of 0 indicates success. * @context: User-specified context associated with the call. */ -int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr, +int rdma_resolve_ip(struct rdma_addr_client *client, + struct sockaddr *src_addr, struct sockaddr *dst_addr, struct rdma_dev_addr *addr, int timeout_ms, void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context), From mst at mellanox.co.il Tue Oct 31 11:43:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 21:43:57 +0200 Subject: [openib-general] [PATCH repost] IB/srp: destroy/recreate qp/cq at reconnect In-Reply-To: <20061019195719.GB2674@mellanox.co.il> References: <20061019195719.GB2674@mellanox.co.il> Message-ID: <20061031194357.GC5950@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: [PATCH repost] IB/srp: destroy/recreate qp/cq at reconnect > > From: Ishai Rabinovitz > > This makes SRP more robust in presence of hardware errors > and is closer to behaviour suggested by IB spec, > reducing chance of stale packets. > > Signed-off-by: Ishai Rabinovitz > Signed-off-by: Michael S. 
Tsirkin > > --- > > Hello, Roland! > What do you think about this? Please consider for 2.6.19. > > For some reason (could be a firmware problem) I got a CQ overrun in SRP. > Because of that there was a QP FATAL. Since in srp_reconnect_target we are not > destroying the QP, the QP FATAL persists after the reconnect. > In order to be able to recover from such situation I suggest we > destroy the CQ and the QP in every reconnect. > > This also corrects a minor spec in-compliance - when srp_reconnect_target > is called, srp destroys the CM ID and resets the QP, the new connection > will be retried with the same QPN which could theoretically lead to > stale packets (for strict spec compliance I think QPN should not be reused > till all stale packets are flushed out of the network). Roland, what do you think about this patch? Seems like a good idea, to me. -- MST From mst at mellanox.co.il Tue Oct 31 11:53:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 21:53:12 +0200 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> Message-ID: <20061031195312.GD5950@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Ordering between PCI config space writes and MMIO reads? > > The discussion fizzled out without really reaching a definitive > answer, so I'm going to apply the original patch (below), since I > pretty much convinced myself that only the driver doing the config > access has enough information to fix this reliably. > > - R. > > Author: John Partridge > Date: Tue Oct 31 11:00:04 2006 -0800 > > IB/mthca: Make sure all PCI config writes reach device before doing MMIO > > During initialization, mthca writes some PCI config space registers > and then does an MMIO read from one of the BARs it just enabled. 
This > MMIO read sometimes failed and caused a crash on SGI Altix machines, > because the PCI-X host bridge (legitimately, according to the PCI > spec) allowed the MMIO read to start before the config write completed. > > To fix this, add a config read after all config writes to make sure > they are all done before starting the MMIO read. > > Signed-off-by: John Partridge > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c > index 91934f2..578dc7c 100644 > --- a/drivers/infiniband/hw/mthca/mthca_reset.c > +++ b/drivers/infiniband/hw/mthca/mthca_reset.c > @@ -281,6 +281,20 @@ good: > goto out; > } > > + /* > + * Perform a "flush" of the PCI config writes here by reading > + * the PCI_COMMAND register. This is needed to make sure that > + * we don't try to touch other PCI BARs before the config > + * writes are done -- otherwise an MMIO cycle could start > + * before the config writes are done and reach the HCA before > + * the BAR is actually enabled. > + */ > + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { > + err = -ENODEV; > + mthca_err(mdev, "Couldn't access HCA memory after restoring, " > + "aborting.\n"); > + } > + > out: > if (bridge) > pci_dev_put(bridge); Here's what I don't understand: according to PCI rules, pci config read can bypass pci config write (both are non-posted). So why does doing it help flush the writes as the comment claims? Isn't this more the case of /* pci_config_write seems to complete asynchronously on Altix systems. * This is probably broken but its not clear what's the best * thing to do is - for now, do pci_read_config_dword which seems to flush * everything out. */ -- MST From rdreier at cisco.com Tue Oct 31 11:53:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 31 Oct 2006 11:53:02 -0800 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? 
In-Reply-To: <20061031195312.GD5950@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 31 Oct 2006 21:53:12 +0200") References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> <20061031195312.GD5950@mellanox.co.il> Message-ID: > Here's what I don't understand: according to PCI rules, pci config read > can bypass pci config write (both are non-posted). > So why does doing it help flush the writes as the comment claims? No, I don't believe a read of a config register can pass a write of the same register. (Someone correct me if I'm wrong) - R. From matthew at wil.cx Tue Oct 31 11:58:11 2006 From: matthew at wil.cx (Matthew Wilcox) Date: Tue, 31 Oct 2006 12:58:11 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> <20061031195312.GD5950@mellanox.co.il> Message-ID: <20061031195811.GF26964@parisc-linux.org> On Tue, Oct 31, 2006 at 11:53:02AM -0800, Roland Dreier wrote: > > Here's what I don't understand: according to PCI rules, pci config read > > can bypass pci config write (both are non-posted). > > So why does doing it help flush the writes as the comment claims? > > No, I don't believe a read of a config register can pass a write of > the same register. (Someone correct me if I'm wrong) I don't see anything in the PCI spec which forbids it, but I would expect that hardware designers don't actually do that in practice. From arlin.r.davis at intel.com Tue Oct 31 12:04:48 2006 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 31 Oct 2006 12:04:48 -0800 Subject: [openib-general] Verbs QP create with RQ=0? Message-ID: Roland, We have an application that doesn't require a receive CQ or QP resources since it will never post receive messages. 
We are attempting to limit our resources and reduce our receive resources to zero but it appears that the create qp assumes a receive CQ is always provided. According to the specification this appears to be correct, but it is not clear what the minimum requirements are for the receive CQ and QP resources. What are your recommendations for creating a QP with no recv QP/CQ resources? Is it possible to create a receive queue of size 0? Thanks, -arlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Tue Oct 31 12:28:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 22:28:00 +0200 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: References: Message-ID: <20061031202800.GA6866@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Ordering between PCI config space writes and MMIO reads? > > > Here's what I don't understand: according to PCI rules, pci config read > > can bypass pci config write (both are non-posted). > > So why does doing it help flush the writes as the comment claims? > > No, I don't believe a read of a config register can pass a write of > the same register. (Someone correct me if I'm wrong) It can if PCI-X/PCI-Ex spec is anything to go by. For example, see table 2-23, transaction ordering rules, in the PCI-express spec: it is marked as "Y/N: there are no requirements. The second transaction may optionally pass the first transaction or be blocked by it." In typical systems the OS should take care not to start a new non-posted transaction before the previous one completed. In particular, all Intel and ppc systems I've seen simply block the CPU until the split completion arrives. I find it hard to believe that Altix does not supply a way to check that completion for config write transaction has arrived. -- MST From jmodem at AbominableFirebug.com Tue Oct 31 12:34:47 2006 From: jmodem at AbominableFirebug.com (Richard B. 
Johnson) Date: Tue, 31 Oct 2006 15:34:47 -0500 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> <20061031195312.GD5950@mellanox.co.il> Message-ID: <019301c6fd2c$044d7010$0732700a@djlaptop> ----- Original Message ----- From: "Michael S. Tsirkin" To: "Roland Dreier" Cc: ; ; ; ; ; ; "David Miller" Sent: Tuesday, October 31, 2006 2:53 PM Subject: Re: Ordering between PCI config space writes and MMIO reads? > Quoting r. Roland Dreier : >> Subject: Re: Ordering between PCI config space writes and MMIO reads? >> >> The discussion fizzled out without really reaching a definitive >> answer, so I'm going to apply the original patch (below), since I >> pretty much convinced myself that only the driver doing the config >> access has enough information to fix this reliably. >> >> - R. >> >> Author: John Partridge >> Date: Tue Oct 31 11:00:04 2006 -0800 >> >> IB/mthca: Make sure all PCI config writes reach device before doing >> MMIO >> >> During initialization, mthca writes some PCI config space registers >> and then does an MMIO read from one of the BARs it just enabled. >> This >> MMIO read sometimes failed and caused a crash on SGI Altix machines, >> because the PCI-X host bridge (legitimately, according to the PCI >> spec) allowed the MMIO read to start before the config write >> completed. >> >> To fix this, add a config read after all config writes to make sure >> they are all done before starting the MMIO read. 
>> >> Signed-off-by: John Partridge >> Signed-off-by: Roland Dreier >> >> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c >> b/drivers/infiniband/hw/mthca/mthca_reset.c >> index 91934f2..578dc7c 100644 >> --- a/drivers/infiniband/hw/mthca/mthca_reset.c >> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c >> @@ -281,6 +281,20 @@ good: >> goto out; >> } >> >> + /* >> + * Perform a "flush" of the PCI config writes here by reading >> + * the PCI_COMMAND register. This is needed to make sure that >> + * we don't try to touch other PCI BARs before the config >> + * writes are done -- otherwise an MMIO cycle could start >> + * before the config writes are done and reach the HCA before >> + * the BAR is actually enabled. >> + */ >> + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) { >> + err = -ENODEV; >> + mthca_err(mdev, "Couldn't access HCA memory after restoring, " >> + "aborting.\n"); >> + } >> + >> out: >> if (bridge) >> pci_dev_put(bridge); > > Here's what I don't understand: according to PCI rules, pci config read > can bypass pci config write (both are non-posted). > So why does doing it help flush the writes as the comment claims? > > Isn't this more the case of > /* pci_config_write seems to complete asynchronously on Altix systems. > * This is probably broken but its not clear what's the best > * thing to do is - for now, do pci_read_config_dword which seems to flush > * everything out. */ > If you write to the PCI bus and then you read the result, the read __might__ be the read that flushes any posted writes rather than the read of device registers that would occur after the BARs were configured (hardware may be slower than the CPU). So, it's best to do the required configuration cycles first, then after all is done, read something before you actually need to use data from subsequent read/write cycles. 
> -- > MST > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ Cheers, Dick Johnson Penguin : Linux version 2.6.16.24 (somewhere) New Book: http://www.AbominableFirebug.com From or.gerlitz at gmail.com Tue Oct 31 12:45:01 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 31 Oct 2006 22:45:01 +0200 Subject: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA In-Reply-To: <454779BF.2080703@ichips.intel.com> References: <000001c6f877$23298c80$52fc070a@amr.corp.intel.com> <4540CA0E.9020807@voltaire.com> <45447D71.40405@voltaire.com> <4545F8E5.2000003@voltaire.com> <45463535.2050302@ichips.intel.com> <45466770.9050107@ichips.intel.com> <45474572.8080109@voltaire.com> <454779BF.2080703@ichips.intel.com> Message-ID: <15ddcffd0610311245q614fee15g810d0438cbf965fa@mail.gmail.com> On 10/31/06, Sean Hefty wrote: > Or Gerlitz wrote: > > Please let me know if you manage to get mckey working in non loopback > > mode and if yes, if you have an idea how can i further debug my config. > > I did get mckey working fine in non loopback mode. OK, thanks for putting the time to do that. > >> root at excell01 librdmacm]# /home/ogerlitz/ib1.1/bin/mckey -m 224.5.5.5 > >> [root at excell02 src]# /home/ogerlitz/ib1.1/bin/mckey -m 224.5.5.5 -s -C > >> 10240 -S 1024 > You need to use the same message parameters (count and size) for both sender and > receiver. OK, i think i have tried this as well, but will make sure tomorrow, i do that. However, this does not explain why i don't see any completions on the sender's cq, correct? just for the sake of comparison, can you tell your config params (eg hca type/fw and the node arch etc). Or. 
From matthew at wil.cx Tue Oct 31 12:47:17 2006 From: matthew at wil.cx (Matthew Wilcox) Date: Tue, 31 Oct 2006 13:47:17 -0700 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <019301c6fd2c$044d7010$0732700a@djlaptop> References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> <20061031195312.GD5950@mellanox.co.il> <019301c6fd2c$044d7010$0732700a@djlaptop> Message-ID: <20061031204717.GG26964@parisc-linux.org> On Tue, Oct 31, 2006 at 03:34:47PM -0500, Richard B. Johnson wrote: > If you write to the PCI bus and then you read the result, the read > __might__ be the > read that flushes any posted writes rather than the read of device Config space writes aren't posted, they're delayed. So, for example, you can do the config write on the primary bus, then it hits a bridge on its way to the destination device. The bridge is entitled to drop it (though obviously it's unlikely to), and then the config read can pass by the config write. I'm beginning to think Michael Tsirkin has the only solution to this -- architectures need to check that their hardware blocks until the config write completion has occurred (and if not, simulate that it has in software). From mst at mellanox.co.il Tue Oct 31 12:50:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 31 Oct 2006 22:50:04 +0200 Subject: [openib-general] Ordering between PCI config space writes and MMIO reads? In-Reply-To: <019301c6fd2c$044d7010$0732700a@djlaptop> References: <019301c6fd2c$044d7010$0732700a@djlaptop> Message-ID: <20061031205004.GB6866@mellanox.co.il> Quoting r. Richard B. Johnson : > Subject: Re: Ordering between PCI config space writes and MMIO reads? > > > ----- Original Message ----- > From: "Michael S. 
Tsirkin"
> To: "Roland Dreier"
> Cc: "David Miller"
> Sent: Tuesday, October 31, 2006 2:53 PM
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
>
>
> > Quoting r. Roland Dreier :
> >> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> >>
> >> The discussion fizzled out without really reaching a definitive
> >> answer, so I'm going to apply the original patch (below), since I
> >> pretty much convinced myself that only the driver doing the config
> >> access has enough information to fix this reliably.
> >>
> >> - R.
> >>
> >> Author: John Partridge
> >> Date: Tue Oct 31 11:00:04 2006 -0800
> >>
> >> IB/mthca: Make sure all PCI config writes reach device before doing MMIO
> >>
> >> During initialization, mthca writes some PCI config space registers
> >> and then does an MMIO read from one of the BARs it just enabled. This
> >> MMIO read sometimes failed and caused a crash on SGI Altix machines,
> >> because the PCI-X host bridge (legitimately, according to the PCI
> >> spec) allowed the MMIO read to start before the config write
> >> completed.
> >>
> >> To fix this, add a config read after all config writes to make sure
> >> they are all done before starting the MMIO read.
> >>
> >> Signed-off-by: John Partridge
> >> Signed-off-by: Roland Dreier
> >>
> >> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c
> >> index 91934f2..578dc7c 100644
> >> --- a/drivers/infiniband/hw/mthca/mthca_reset.c
> >> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c
> >> @@ -281,6 +281,20 @@ good:
> >>  		goto out;
> >>  	}
> >>
> >> +	/*
> >> +	 * Perform a "flush" of the PCI config writes here by reading
> >> +	 * the PCI_COMMAND register.
This is needed to make sure that
> >> +	 * we don't try to touch other PCI BARs before the config
> >> +	 * writes are done -- otherwise an MMIO cycle could start
> >> +	 * before the config writes are done and reach the HCA before
> >> +	 * the BAR is actually enabled.
> >> +	 */
> >> +	if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
> >> +		err = -ENODEV;
> >> +		mthca_err(mdev, "Couldn't access HCA memory after restoring, "
> >> +			  "aborting.\n");
> >> +	}
> >> +
> >>  out:
> >>  	if (bridge)
> >>  		pci_dev_put(bridge);
> >
> > Here's what I don't understand: according to PCI rules, pci config read
> > can bypass pci config write (both are non-posted).
> > So why does doing it help flush the writes as the comment claims?
> >
> > Isn't this more the case of
> > /* pci_config_write seems to complete asynchronously on Altix systems.
> > * This is probably broken but its not clear what's the best
> > * thing to do is - for now, do pci_read_config_dword which seems to flush
> > * everything out. */
> >
>
> If you write to the PCI bus and then you read the result, the read __might__
> be the read that flushes any posted writes rather than the read of device
> registers that would occur after the BARs were configured (hardware may be
> slower than the CPU). So, it's best to do the required configuration cycles
> first, then after all is done, read something before you actually need to use
> data from subsequent read/write cycles.

But why should it help? According to the spec, a read does not flush
configuration writes (unlike regular writes).

--
MST

From sashak at voltaire.com Tue Oct 31 14:02:29 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 1 Nov 2006 00:02:29 +0200
Subject: [openib-general] OpenSM unneeded/no longer used header files
In-Reply-To: <1162319741.29957.8140.camel@hal.voltaire.com>
References: <1162319741.29957.8140.camel@hal.voltaire.com>
Message-ID: <20061031220229.GD19983@sashak.voltaire.com>

On 13:35 Tue 31 Oct , Hal Rosenstock wrote:
> The following OpenSM header files appear to be unused:
>
> 183 osm_errors.h
> 230 osm_ft_config_ctrl.h
> 291 osm_mcast_config_ctrl.h
> 289 osm_pi_config_ctrl.h
> 289 osm_pkey_config_ctrl.h
> 297 osm_sm_info_get_ctrl.h
> 290 osm_subnet_config_ctrl.h
>
> Any objections if they disappear ?

It is ok for me.

Sasha

From rdreier at cisco.com Tue Oct 31 14:30:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 31 Oct 2006 14:30:13 -0800
Subject: [openib-general] Ordering between PCI config space writes and MMIO reads?
In-Reply-To: <20061031204717.GG26964@parisc-linux.org> (Matthew Wilcox's message of "Tue, 31 Oct 2006 13:47:17 -0700")
References: <20061024214724.GS25210@parisc-linux.org> <20061024223631.GT25210@parisc-linux.org> <20061024.154347.77057163.davem@davemloft.net> <20061031195312.GD5950@mellanox.co.il> <019301c6fd2c$044d7010$0732700a@djlaptop> <20061031204717.GG26964@parisc-linux.org>
Message-ID:

 > I'm beginning to think Michael Tsirkin has the only solution to this
 > -- architectures need to check that their hardware blocks until the
 > config write completion has occurred (and if not, simulate that it has
 > in software).

OK, I guess I'm convinced. The vague language in the base PCI 3.0
spec about "dependencies" made me think that a read of a config
register had to wait until all previous writes to the same register
are done.

So I'll drop this patch for now. John, you'll need to try and come up
with a way to solve this in the Altix implementation of
pci_write_config_xxx().

 - R.

From mshefty at ichips.intel.com Tue Oct 31 14:59:26 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 31 Oct 2006 14:59:26 -0800
Subject: [openib-general] Verbs QP create with RQ=0?
In-Reply-To:
References:
Message-ID: <4547D54E.50605@ichips.intel.com>

Davis, Arlin R wrote:
> We have an application that doesn’t require a receive CQ or QP resources
> since it will never post receive messages. We are attempting to limit
> our resources and reduce our receive resources to zero but it appears
> that the create qp assumes a receive CQ is always provided. According to
> the specification this appears to be correct, but it is not clear what
> the minimum requirements are for the receive CQ and QP resources. What
> are your recommendations for creating a QP with no recv QP/CQ resources?
> Is it possible to create receive queue of 0?

Why not just set the receive CQ == the send CQ? Creating a QP with a
receive queue size of 0 should be possible. Is that failing?

- Sean

From sean.hefty at intel.com Tue Oct 31 16:25:48 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 31 Oct 2006 16:25:48 -0800
Subject: [openib-general] [RFC] [PATCH] rdma/ib_cm: fix APM support
Message-ID: <000101c6fd4c$47cc7000$ff0da8c0@amr.corp.intel.com>

The following patch attempts to fix issues in the ib_cm regarding support
for path migration. The fixes are mainly based on feedback from Venkatesh.

The patch has NOT been tested to verify that APM works correctly, but I did
check that it didn't break anything. I need to develop a test program to
verify that APM works.

I'd like to get feedback on this approach. For the most part, it makes use
of the existing interfaces where possible to limit changes to the userspace
library.

More specifically:

The ib_cm_establish() call is replaced with a more generic ib_cm_notify().
This routine is used to notify the CM that failover has occurred, so that
future CM messages (LAP, DREQ) reach the remote CM.
New alternate path information is captured when a LAP message is sent or received. This allows QP attributes to be initialized for the user when loading a new path after failover has occurred. Signed-off-by: Sean Hefty --- Venkatesh / anyone else: it would be helpful if someone could try porting their application to this interface, and let me know if it works. I'm working on a test program for this, but it will take a few days to create it. diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 1cf0d42..c4e9bb5 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -152,7 +152,6 @@ struct cm_id_private { u8 peer_to_peer; u8 responder_resources; u8 initiator_depth; - u8 local_ack_timeout; u8 retry_count; u8 rnr_retry_count; u8 service_timeout; @@ -691,7 +690,7 @@ static void cm_enter_timewait(struct cm_ * timewait before notifying the user that we've exited timewait. */ cm_id_priv->id.state = IB_CM_TIMEWAIT; - wait_time = cm_convert_to_ms(cm_id_priv->local_ack_timeout); + wait_time = cm_convert_to_ms(cm_id_priv->av.packet_life_time + 1); queue_delayed_work(cm.wq, &cm_id_priv->timewait_info->work.work, msecs_to_jiffies(wait_time)); cm_id_priv->timewait_info = NULL; @@ -1024,8 +1023,6 @@ int ib_send_cm_req(struct ib_cm_id *cm_i cm_id_priv->local_qpn = cm_req_get_local_qpn(req_msg); cm_id_priv->rq_psn = cm_req_get_starting_psn(req_msg); - cm_id_priv->local_ack_timeout = - cm_req_get_primary_local_ack_timeout(req_msg); spin_lock_irqsave(&cm_id_priv->lock, flags); ret = ib_post_send_mad(cm_id_priv->msg, NULL); @@ -1411,8 +1408,6 @@ static int cm_req_handler(struct cm_work cm_id_priv->responder_resources = cm_req_get_init_depth(req_msg); cm_id_priv->path_mtu = cm_req_get_path_mtu(req_msg); cm_id_priv->sq_psn = cm_req_get_starting_psn(req_msg); - cm_id_priv->local_ack_timeout = - cm_req_get_primary_local_ack_timeout(req_msg); cm_id_priv->retry_count = cm_req_get_retry_count(req_msg); cm_id_priv->rnr_retry_count = 
cm_req_get_rnr_retry_count(req_msg); cm_id_priv->qp_type = cm_req_get_qp_type(req_msg); @@ -1716,7 +1711,7 @@ static int cm_establish_handler(struct c unsigned long flags; int ret; - /* See comment in ib_cm_establish about lookup. */ + /* See comment in cm_establish about lookup. */ cm_id_priv = cm_acquire_id(work->local_id, work->remote_id); if (!cm_id_priv) return -EINVAL; @@ -2402,11 +2397,16 @@ int ib_send_cm_lap(struct ib_cm_id *cm_i cm_id_priv = container_of(cm_id, struct cm_id_private, id); spin_lock_irqsave(&cm_id_priv->lock, flags); if (cm_id->state != IB_CM_ESTABLISHED || - cm_id->lap_state != IB_CM_LAP_IDLE) { + (cm_id->lap_state != IB_CM_LAP_UNINIT && + cm_id->lap_state != IB_CM_LAP_IDLE)) { ret = -EINVAL; goto out; } + ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av); + if (ret) + goto out; + ret = cm_alloc_msg(cm_id_priv, &msg); if (ret) goto out; @@ -2480,6 +2480,7 @@ static int cm_lap_handler(struct cm_work goto unlock; switch (cm_id_priv->id.lap_state) { + case IB_CM_LAP_UNINIT: case IB_CM_LAP_IDLE: break; case IB_CM_MRA_LAP_SENT: @@ -2502,6 +2503,10 @@ static int cm_lap_handler(struct cm_work cm_id_priv->id.lap_state = IB_CM_LAP_RCVD; cm_id_priv->tid = lap_msg->hdr.tid; + cm_init_av_for_response(work->port, work->mad_recv_wc->wc, + work->mad_recv_wc->recv_buf.grh, + &cm_id_priv->av); + cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av); ret = atomic_inc_and_test(&cm_id_priv->work_count); if (!ret) list_add_tail(&work->list, &cm_id_priv->work_list); @@ -3040,7 +3045,7 @@ static void cm_work_handler(void *data) cm_free_work(work); } -int ib_cm_establish(struct ib_cm_id *cm_id) +static int cm_establish(struct ib_cm_id *cm_id) { struct cm_id_private *cm_id_priv; struct cm_work *work; @@ -3088,7 +3093,43 @@ int ib_cm_establish(struct ib_cm_id *cm_ out: return ret; } -EXPORT_SYMBOL(ib_cm_establish); + +static int cm_migrate(struct ib_cm_id *cm_id) +{ + struct cm_id_private *cm_id_priv; + unsigned long flags; + int ret = 0; + + 
cm_id_priv = container_of(cm_id, struct cm_id_private, id); + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id->state == IB_CM_ESTABLISHED && + (cm_id->lap_state == IB_CM_LAP_UNINIT || + cm_id->lap_state == IB_CM_LAP_IDLE)) + cm_id_priv->av = cm_id_priv->alt_av; + else + ret = -EINVAL; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + return ret; +} + +int ib_cm_notify(struct ib_cm_id *cm_id, enum ib_event_type event) +{ + int ret; + + switch (event) { + case IB_EVENT_COMM_EST: + ret = cm_establish(cm_id); + break; + case IB_EVENT_PATH_MIG: + ret = cm_migrate(cm_id); + break; + default: + ret = -EINVAL; + } + return ret; +} +EXPORT_SYMBOL(ib_cm_notify); static void cm_recv_handler(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc) @@ -3221,6 +3262,9 @@ static int cm_init_qp_rtr_attr(struct cm if (cm_id_priv->alt_av.ah_attr.dlid) { *qp_attr_mask |= IB_QP_ALT_PATH; qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; + qp_attr->alt_pkey_index = cm_id_priv->alt_av.pkey_index; + qp_attr->alt_timeout = + cm_id_priv->alt_av.packet_life_time + 1; qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; } ret = 0; @@ -3247,19 +3291,31 @@ static int cm_init_qp_rts_attr(struct cm case IB_CM_REP_SENT: case IB_CM_MRA_REP_RCVD: case IB_CM_ESTABLISHED: - *qp_attr_mask = IB_QP_STATE | IB_QP_SQ_PSN; - qp_attr->sq_psn = be32_to_cpu(cm_id_priv->sq_psn); - if (cm_id_priv->qp_type == IB_QPT_RC) { - *qp_attr_mask |= IB_QP_TIMEOUT | IB_QP_RETRY_CNT | - IB_QP_RNR_RETRY | - IB_QP_MAX_QP_RD_ATOMIC; - qp_attr->timeout = cm_id_priv->local_ack_timeout; - qp_attr->retry_cnt = cm_id_priv->retry_count; - qp_attr->rnr_retry = cm_id_priv->rnr_retry_count; - qp_attr->max_rd_atomic = cm_id_priv->initiator_depth; - } - if (cm_id_priv->alt_av.ah_attr.dlid) { - *qp_attr_mask |= IB_QP_PATH_MIG_STATE; + if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) { + *qp_attr_mask = IB_QP_STATE | IB_QP_SQ_PSN; + qp_attr->sq_psn = be32_to_cpu(cm_id_priv->sq_psn); + if 
(cm_id_priv->qp_type == IB_QPT_RC) { + *qp_attr_mask |= IB_QP_TIMEOUT | IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_MAX_QP_RD_ATOMIC; + qp_attr->timeout = + cm_id_priv->av.packet_life_time + 1; + qp_attr->retry_cnt = cm_id_priv->retry_count; + qp_attr->rnr_retry = cm_id_priv->rnr_retry_count; + qp_attr->max_rd_atomic = + cm_id_priv->initiator_depth; + } + if (cm_id_priv->alt_av.ah_attr.dlid) { + *qp_attr_mask |= IB_QP_PATH_MIG_STATE; + qp_attr->path_mig_state = IB_MIG_REARM; + } + } else { + *qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE; + qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; + qp_attr->alt_pkey_index = cm_id_priv->alt_av.pkey_index; + qp_attr->alt_timeout = + cm_id_priv->alt_av.packet_life_time + 1; + qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr; qp_attr->path_mig_state = IB_MIG_REARM; } ret = 0; diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index ad4f4d5..e04f662 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -685,11 +685,11 @@ out: return result; } -static ssize_t ib_ucm_establish(struct ib_ucm_file *file, - const char __user *inbuf, - int in_len, int out_len) +static ssize_t ib_ucm_notify(struct ib_ucm_file *file, + const char __user *inbuf, + int in_len, int out_len) { - struct ib_ucm_establish cmd; + struct ib_ucm_notify cmd; struct ib_ucm_context *ctx; int result; @@ -700,7 +700,7 @@ static ssize_t ib_ucm_establish(struct i if (IS_ERR(ctx)) return PTR_ERR(ctx); - result = ib_cm_establish(ctx->cm_id); + result = ib_cm_notify(ctx->cm_id, (enum ib_event_type) cmd.event); ib_ucm_ctx_put(ctx); return result; } @@ -1107,7 +1107,7 @@ static ssize_t (*ucm_cmd_table[])(struct [IB_USER_CM_CMD_DESTROY_ID] = ib_ucm_destroy_id, [IB_USER_CM_CMD_ATTR_ID] = ib_ucm_attr_id, [IB_USER_CM_CMD_LISTEN] = ib_ucm_listen, - [IB_USER_CM_CMD_ESTABLISH] = ib_ucm_establish, + [IB_USER_CM_CMD_NOTIFY] = ib_ucm_notify, [IB_USER_CM_CMD_SEND_REQ] = ib_ucm_send_req, [IB_USER_CM_CMD_SEND_REP] = 
ib_ucm_send_rep, [IB_USER_CM_CMD_SEND_RTU] = ib_ucm_send_rtu, diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h index c9b4738..5c07017 100644 --- a/include/rdma/ib_cm.h +++ b/include/rdma/ib_cm.h @@ -60,6 +60,7 @@ enum ib_cm_state { }; enum ib_cm_lap_state { + IB_CM_LAP_UNINIT, IB_CM_LAP_IDLE, IB_CM_LAP_SENT, IB_CM_LAP_RCVD, @@ -443,13 +444,20 @@ int ib_send_cm_drep(struct ib_cm_id *cm_ u8 private_data_len); /** - * ib_cm_establish - Forces a connection state to established. + * ib_cm_notify - Notifies the CM of an event reported to the consumer. * @cm_id: Connection identifier to transition to established. + * @event: Type of event. * - * This routine should be invoked by users who receive messages on a - * connected QP before an RTU has been received. + * This routine should be invoked by users to notify the CM of relevant + * communication events. Events that should be reported to the CM and + * when to report them are: + * + * IB_EVENT_COMM_EST - Used when a message is received on a connected + * QP before an RTU has been received. + * IB_EVENT_PATH_MIG - Notifies the CM that the connection has failed over + * to the alternate path. 
*/ -int ib_cm_establish(struct ib_cm_id *cm_id); +int ib_cm_notify(struct ib_cm_id *cm_id, enum ib_event_type event); /** * ib_send_cm_rej - Sends a connection rejection message to the diff --git a/include/rdma/ib_user_cm.h b/include/rdma/ib_user_cm.h old mode 100644 new mode 100755 index 066c20b..37650af --- a/include/rdma/ib_user_cm.h +++ b/include/rdma/ib_user_cm.h @@ -38,7 +38,7 @@ #define IB_USER_CM_H #include -#define IB_USER_CM_ABI_VERSION 4 +#define IB_USER_CM_ABI_VERSION 5 enum { IB_USER_CM_CMD_CREATE_ID, @@ -46,7 +46,7 @@ enum { IB_USER_CM_CMD_ATTR_ID, IB_USER_CM_CMD_LISTEN, - IB_USER_CM_CMD_ESTABLISH, + IB_USER_CM_CMD_NOTIFY, IB_USER_CM_CMD_SEND_REQ, IB_USER_CM_CMD_SEND_REP, @@ -117,8 +117,9 @@ struct ib_ucm_listen { __u32 reserved; }; -struct ib_ucm_establish { +struct ib_ucm_notify { __u32 id; + __u32 event; }; struct ib_ucm_private_data { From abmlist at gmail.com Tue Oct 31 17:18:00 2006 From: abmlist at gmail.com (Anand Bisen) Date: Tue, 31 Oct 2006 17:18:00 -0800 Subject: [openib-general] Kernel.org kernel Message-ID: What are the IB drivers that are in kernel.org (2.6.18.1) kernel. If they are not the same then should I remove them from the menuconfig and then build OFED drivers in order to avoid conflict. Is there any how-to procedure on how I can patch my kernel.org's vanilla kernel with the OFED 1.1 drivers or they have to be built seperately. Thanks Anand -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Oct 31 19:29:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 31 Oct 2006 19:29:27 -0800 Subject: [openib-general] Kernel.org kernel In-Reply-To: (Anand Bisen's message of "Tue, 31 Oct 2006 17:18:00 -0800") References: Message-ID: > What are the IB drivers that are in kernel.org (2.6.18.1) kernel. If they > are not the same then should I remove them from the menuconfig and then > build OFED drivers in order to avoid conflict. 
Is there any how-to procedure
 > on how I can patch my kernel.org's vanilla kernel with the OFED 1.1 drivers
 > or they have to be built seperately.

If you are able to build your own kernel and like to use up-to-date
kernels, I would recommend just using the drivers that are in the
mainline kernel and not worrying about OFED.

 - R.

From rdreier at cisco.com Tue Oct 31 19:38:09 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 31 Oct 2006 19:38:09 -0800
Subject: [openib-general] Verbs QP create with RQ=0?
In-Reply-To: (Arlin R. Davis's message of "Tue, 31 Oct 2006 12:04:48 -0800")
References:
Message-ID:

 > We have an application that doesn't require a receive CQ or QP resources
 > since it will never post receive messages. We are attempting to limit
 > our resources and reduce our receive resources to zero but it appears
 > that the create qp assumes a receive CQ is always provided. According to
 > the specification this appears to be correct, but it is not clear what
 > the minimum requirements are for the receive CQ and QP resources. What
 > are your recommendations for creating a QP with no recv QP/CQ resources?
 > Is it possible to create receive queue of 0?

As was already suggested, you should be able to use the same CQ for
receives and for sends. If you never post any receives on the QP, you
don't have to allocate any extra space on your send CQ.

And it should work to have 0 receive work queue entries. Have you
tried it?

I think there actually is a bug in mthca that prevents creating a QP
with both the send queue and receive queue of size 0 (which would be
useful as an RDMA target), but I don't think that's what you're
running into. That's on my list of things to chase down, but it
hasn't been a real issue for anyone (as far as I know) so the priority
is not high for me.

 - R.
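
As a rough illustration of the suggestion above (one CQ shared by both
directions, zero receive work requests), here is a minimal userspace sketch
that fills in the libibverbs struct ibv_qp_init_attr. The helper name
init_send_only_qp_attr and the send depth are made up for illustration. The
stand-in declarations under "#ifndef HAVE_VERBS_H" exist only so the sketch
compiles on a machine without the libibverbs headers; they are an assumption
and not the real header. Whether a particular HCA driver accepts
max_recv_wr = 0 is the open question in this thread, so treat this as a
starting point for trying it, not as a confirmed recipe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Use the real libibverbs header when the compiler can find it. */
#ifdef __has_include
#  if __has_include(<infiniband/verbs.h>)
#    include <infiniband/verbs.h>
#    define HAVE_VERBS_H 1
#  endif
#endif

#ifndef HAVE_VERBS_H
/* Stand-in declarations (assumption): only what the helper below touches. */
struct ibv_cq;
struct ibv_srq;
enum ibv_qp_type { IBV_QPT_RC = 2, IBV_QPT_UC, IBV_QPT_UD };
struct ibv_qp_cap {
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t max_recv_sge;
	uint32_t max_inline_data;
};
struct ibv_qp_init_attr {
	void *qp_context;
	struct ibv_cq *send_cq;
	struct ibv_cq *recv_cq;
	struct ibv_srq *srq;
	struct ibv_qp_cap cap;
	enum ibv_qp_type qp_type;
	int sq_sig_all;
};
#endif

/*
 * Hypothetical helper: describe an RC QP that only ever sends.
 * The same CQ is named for both directions and the receive queue
 * is requested with zero entries.
 */
void init_send_only_qp_attr(struct ibv_qp_init_attr *attr,
			    struct ibv_cq *cq, uint32_t send_depth)
{
	assert(attr != NULL && cq != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->send_cq = cq;
	attr->recv_cq = cq;		/* required field, but never used */
	attr->cap.max_send_wr = send_depth;
	attr->cap.max_send_sge = 1;
	attr->cap.max_recv_wr = 0;	/* no receive work requests */
	attr->cap.max_recv_sge = 0;
	attr->qp_type = IBV_QPT_RC;
}
```

The filled-in structure would then be passed to ibv_create_qp(pd, &attr); a
NULL return from that call means the request was refused.
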

From somenath at veritas.com Tue Oct 31 10:51:37 2006
From: somenath at veritas.com (somenath)
Date: Tue, 31 Oct 2006 10:51:37 -0800
Subject: [openib-general] remote node/port going down notification
Message-ID: <45479B39.1070404@veritas.com>

Is there a way to get a notification when the remote node/port (the other
end of a connected QP pair) goes down?

I can't associate any of these async events with a remote node/port going
down. I assume these are for the local port/node...

thanks, som.

enum ib_event_type {
	IB_EVENT_CQ_ERR,
	IB_EVENT_QP_FATAL,
	IB_EVENT_QP_REQ_ERR,
	IB_EVENT_QP_ACCESS_ERR,
	IB_EVENT_COMM_EST,
	IB_EVENT_SQ_DRAINED,
	IB_EVENT_PATH_MIG,
	IB_EVENT_PATH_MIG_ERR,
	IB_EVENT_DEVICE_FATAL,
	IB_EVENT_PORT_ACTIVE,
	IB_EVENT_PORT_ERR,
	IB_EVENT_LID_CHANGE,
	IB_EVENT_PKEY_CHANGE,
	IB_EVENT_SM_CHANGE,
	IB_EVENT_SRQ_ERR,
	IB_EVENT_SRQ_LIMIT_REACHED,
	IB_EVENT_QP_LAST_WQE_REACHED
};
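
One way to read the enum above is by the local object each event is reported
against. The sketch below is only a grouping, assuming the events map onto the
members of the element union in the kernel's struct ib_event (cq, qp, srq,
port_num), with the device itself as the remaining case. The enum event_scope
type and the event_scope_of() helper are hypothetical names, not part of any
verbs API, and the event enum is copied from the message so the block builds
in userspace.

```c
#include <assert.h>

/* Copied from the message above so the sketch is self-contained. */
enum ib_event_type {
	IB_EVENT_CQ_ERR,
	IB_EVENT_QP_FATAL,
	IB_EVENT_QP_REQ_ERR,
	IB_EVENT_QP_ACCESS_ERR,
	IB_EVENT_COMM_EST,
	IB_EVENT_SQ_DRAINED,
	IB_EVENT_PATH_MIG,
	IB_EVENT_PATH_MIG_ERR,
	IB_EVENT_DEVICE_FATAL,
	IB_EVENT_PORT_ACTIVE,
	IB_EVENT_PORT_ERR,
	IB_EVENT_LID_CHANGE,
	IB_EVENT_PKEY_CHANGE,
	IB_EVENT_SM_CHANGE,
	IB_EVENT_SRQ_ERR,
	IB_EVENT_SRQ_LIMIT_REACHED,
	IB_EVENT_QP_LAST_WQE_REACHED
};

/* Hypothetical: which local object an async event refers to. */
enum event_scope {
	SCOPE_CQ,
	SCOPE_QP,
	SCOPE_SRQ,
	SCOPE_PORT,
	SCOPE_DEVICE
};

enum event_scope event_scope_of(enum ib_event_type event)
{
	switch (event) {
	case IB_EVENT_CQ_ERR:
		return SCOPE_CQ;
	case IB_EVENT_QP_FATAL:
	case IB_EVENT_QP_REQ_ERR:
	case IB_EVENT_QP_ACCESS_ERR:
	case IB_EVENT_COMM_EST:
	case IB_EVENT_SQ_DRAINED:
	case IB_EVENT_PATH_MIG:
	case IB_EVENT_PATH_MIG_ERR:
	case IB_EVENT_QP_LAST_WQE_REACHED:
		return SCOPE_QP;
	case IB_EVENT_SRQ_ERR:
	case IB_EVENT_SRQ_LIMIT_REACHED:
		return SCOPE_SRQ;
	case IB_EVENT_PORT_ACTIVE:
	case IB_EVENT_PORT_ERR:
	case IB_EVENT_LID_CHANGE:
	case IB_EVENT_PKEY_CHANGE:
	case IB_EVENT_SM_CHANGE:
		return SCOPE_PORT;
	case IB_EVENT_DEVICE_FATAL:
		return SCOPE_DEVICE;
	}
	assert(!"unknown ib_event_type");
	return SCOPE_DEVICE;
}
```

Every scope in this grouping is an object on the local HCA, which is
consistent with the poster's reading that none of these events directly
names a remote port.
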